This page is no longer being updated as it has been relocated to http://www.mcs.vuw.ac.nz/~elvis/db/FishBrainWiki?Excel.

Definition

A description of what is possible in spreadsheets will go here.
Given this scope/latitude, how do people use the space?

Example Formulas

Grammer

The work on the Grammer here is mainly targeted at the structure of the Excel sheets (cells references etc).
To parse a formula string into a List of tokens in RPN order something like the following would be used. (Inspired by "Lets Build a Compiler", by Jack Crenshaw and extracted from source code of POI>HSSF.)
BNF [BNF] for the formula expression is :
<expression> ::= <term> [<addop> <term>]*
<term> ::= <factor> [ <mulop> <factor> ]*
<factor> ::= <number> | (<expression>) | <cellRef> | <function>
<function> ::= <functionName> ([expression [, expression]*])

Formula

Formula ::= "=" <a combination of references, operators, functions, numbers and other primitives.>
Formulas can refer to cells or ranges of cells, or to names or labels that represent cells or ranges

ReferencePrefix ::= (<Workbook> | <Sheet> | &epsilon ) <Reference>
Reference ::= <Cell> | <AreaReference>
AreaReference ::= <Range> | <Vector> | <Intersection> | <Union>
Workbook ::= "["workbook_path/filename.xls"]" <Sheet>
Sheet ::= <Sheetname>"!"
Sheetname ::= <The name of each worksheet.> e.g. Sheet1, Sheet2, Sheet3, ...
Cell ::= ["$"]<Column>["$"]<Row>
The $ symbol denotes an absolute reference to that dimension/axis.
Row ::= x | x ε Int, 0 < x ≤ 65536
Column ::= [A-Z] | [A-H][A-Z] | I[A-Z]
Here [A-Z] is interupted like a regular expression i.e. Any character between 'A' and 'Z'. Columns use letters (A through IV, for a total of 256 columns) for labels.

AreaReference operators

Range ::= Cell:Cell
Vector ::= <ColumnRange> | <RowRange>
It is possiblie to refer to a single column(B:B) or row(5:5) range (known as a vector) in Excel .
ColumnRange ::= Column:Column
The first column reference should be ≤ to the second column reference.
RowRange ::= Row:Row
The first row reference should be ≤ to the second row reference.
Intersection ::= AreaReference AreaReference
Two ranges seperated by a single space.
Union ::= AreaReference,AreaReference
Reference operators combine ranges of cells for calculations.

Operators

Operator ::= ComparrisonOperator | ArithmaticOperator | ConcatinationOperator | OtherOperator
ComparrisonOperator ::= "<=" | ">=" | "<" | ">" | "!=" | "="
ArithmaticOperator ::= "+" | "-" | "*" | "/" | "^"
ConcatinationOperator ::= "&"
OtherOperator ::= "%"

Functions

To get list of functions in catagory. Find in help for "about function". Cut and paste list of functions to Word and do replace using ^wworksheet function^p with | Then do replace using ^t with nothing
Function ::= (Function_LOGICAL | Function_FINANCIAL | Function_STATISTICAL | Function_MATHANDTRIG | Function_DATABASE | Function_DATEANDTIME | Function_ENGINEERING | Function_TEXT | Function_INFORMATION | Function_LOOKUPANDREFERENCE)
Function_LOOKUPANDREFERENCE ::= ("ADDRESS" | "AREAS" | "CHOOSE" | "COLUMN" | "COLUMNS" | "HLOOKUP" | "HYPERLINK" | "INDEX" | "INDIRECT" | "LOOKUP" | "MATCH" | "OFFSET" | "ROW" | "ROWS" | "TRANSPOSE" | "VLOOKUP")
Function_INFORMATION ::= ("CELL" | "COUNTBLANK" | "ERROR.TYPE" | "INFO" | "ISBLANK" | "ISERR" | "ISERROR" | "ISEVEN" | "ISLOGICAL" | "ISNA" | "ISNONTEXT" | "ISNUMBER" | "ISODD" | "ISREF" | "ISTEXT" | "N" | "NA" | "TYPE")
Function_FINANCIAL ::= ("ACCRINT" | "ACCRINTM" | "AMORDEGRC" | "AMORLINC" | "COUPDAYBS" | "COUPDAYS" | "COUPDAYSNC" | "COUPNCD" | "COUPNUM" | "COUPPCD" | "CUMIPMT" | "CUMPRINC" | "DB" | "DDB" | "DISC" | "DOLLARDE" | "DOLLARFR" | "DURATION" | "EFFECT" | "FV" | "FVSCHEDULE" | "INTRATE" | "IPMT" | "IRR" | "MDURATION" | "MIRR" | "NOMINAL" | "NPER" | "NPV" | "ODDFPRICE" | "ODDFYIELD" | "ODDLPRICE" | "ODDLYIELD" | "PMT" | "PPMT" | "PRICE" | "PRICEDISC" | "PRICEMAT" | "PV" | "RATE" | "RECEIVED" | "SLN" | "SYD" | "TBILLEQ" | "TBILLPRICE" | "TBILLYIELD" | "VDB" | "XIRR" | "XNPV" | "YIELD" | "YIELDDISC" | "YIELDMAT")
Function_ENGINEERING ::= ("BESSELI" | "BESSELJ" | "BESSELK" | "BESSELY" | "BIN2DEC" | "BIN2HEX" | "BIN2OCT" | "COMPLEX" | "CONVERT" | "DEC2BIN" | "DEC2HEX" | "DEC2OCT" | "DELTA" | "ERF" | "ERFC" | "GESTEP" | "HEX2BIN" | "HEX2DEC" | "HEX2OCT" | "IMABS" | "IMAGINARY" | "IMARGUMENT" | "IMCONJUGATE" | "IMCOS" | "IMDIV" | "IMEXP" | "IMLN" | "IMLOG10" | "IMLOG2" | "IMPOWER" | "IMPRODUCT" | "IMREAL" | "IMSIN" | "IMSQRT" | "IMSUB" | "IMSUM" | "OCT2BIN" | "OCT2DEC" | "OCT2HEX")
Function_DATEANDTIME ::= ("DATE" | "DATEVALUE" | "DAY" | "DAYS360" | "EDATE" | "EOMONTH" | "HOUR" | "MINUTE" | "MONTH" | "NETWORKDAYS" | "NOW" | "SECOND" | "TIME" | "TIMEVALUE" | "TODAY" | "WEEKDAY" | "WORKDAY" | "YEAR" | "YEARFRAC")
Function_DATABASE ::= ("DAVERAGE" | "DCOUNT" | "DCOUNTA" | "DGET" | "DMAX" | "DMIN" | "DPRODUCT" | "DSTDEV" | "DSTDEVP" | "DSUM" | "DVAR" | "DVARP" | "GETPIVOTDATA")
Function_MATHANDTRIG ::= ("ABS" | "ACOS" | "ACOSH" | "ASIN" | "ASINH" | "ATAN" | "ATAN2" | "ATANH" | "CEILING" | "COMBIN" | "COS" | "COSH" | "COUNTIF" | "DEGREES" | "EVEN" | "EXP" | "FACT" | "FACTDOUBLE" | "FLOOR" | "GCD" | "INT" | "LCM" | "LN" | "LOG" | "LOG10" | "MDETERM" | "MINVERSE" | "MMULT" | "MOD" | "MROUND" | "MULTINOMIAL" | "ODD" | "PI" | "POWER" | "PRODUCT" | "QUOTIENT" | "RADIANS" | "RAND" | "RANDBETWEEN" | "ROMAN" | "ROUND" | "ROUNDDOWN" | "ROUNDUP" | "SERIESSUM" | "SIGN" | "SIN" | "SINH" | "SQRT" | "SQRTPI" | "SUBTOTAL" | "SUMIF"+")" | "("+"SUM" | "SUMPRODUCT" | "SUMSQ" | "SUMX2MY2" | "SUMX2PY2" | "SUMXMY2" | "TAN" | "TANH" | "TRUN")
Function_STATISTICAL ::= ("AVEDEV" | "AVERAGE" | "AVERAGEA" | "BETADIST" | "BETAINV" | "BINOMDIST" | "CHIDIST" | "CHIINV" | "CHITEST" | "CONFIDENCE" | "CORREL" | "COUNT" | "COUNTA" | "COVAR" | "CRITBINOM" | "DEVSQ" | "EXPONDIST" | "FDIST" | "FINV" | "FISHER" | "FISHERINV" | "FORECAST" | "FREQUENCY" | "FTEST" | "GAMMADIST" | "GAMMAINV" | "GAMMALN" | "GEOMEAN" | "GROWTH" | "HARMEAN" | "HYPGEOMDIST" | "INTERCEPT" | "KURT" | "LARGE" | "LINEST" | "LOGEST" | "LOGINV" | "LOGNORMDIST" | "MAX" | "MAXA" | "MEDIAN" | "MIN" | "MINA" | "MODE" | "NEGBINOMDIST" | "NORMDIST" | "NORMINV" | "NORMSDIST" | "NORMSINV" | "PEARSON" | "PERCENTILE" | "PERCENTRANK" | "PERMUT" | "POISSON" | "PROB" | "QUARTILE" | "RANK" | "RSQ" | "SKEW" | "SLOPE" | "SMALL" | "STANDARDIZE" | "STDEV" | "STDEVA" | "STDEVP" | "STDEVPA" | "STEYX" | "TDIST" | "TINV" | "TREND" | "TRIMMEAN" | "TTEST" | "VAR" | "VARA" | "VARP" | "VARPA" | "WEIBULL" | "ZTEST")
Function_TEXT ::= ("CHAR" | "CLEAN" | "CODE" | "CONCATENATE" | "DOLLAR" | "EXACT" | "FIND" | "FIXED" | "LEFT" | "LEN" | "LOWER" | "MID" | "PROPER" | "REPLACE" | "REPT" | "RIGHT" | "SEARCH" | "SUBSTITUTE" | "T" | "TEXT" | "TRIM" | "UPPER" | "VALUE")
Function_LOGICAL ::= ("IF" | "AND" | "OR" | "NOT" | "TRUE" | "FALSE");


Markus Clermont's Grammer

Spreadsheet ::= Row, {col, Row}, col
Row ::=         (Formula|Number|string), {;,Row}
Number ::=      ['-'], digit, {digit}, [',', digit, {digit}]
Formula ::=     '=', FormulaExpression
FormulaExpression ::= Expression, {('+'|'-'|'*'|'/'|'^'), Expression}
CellAddress ::= 'R',('$' AbsCord)|RelCord, 'C',('$' AbsCord)|RelCord
RelCord ::=     ['-'],digit,{digit}
AbsCord ::=     digit, {digit}
LogOp ::=       '>' | '<' | '=' | '!=' | '>=' | '<='
LogicalExpression ::= FormulaExpression [LogOp FormulaExpression]
LogArgList ::=  LogicalExpression ';' LogicalExpression
FunctionCall ::= ('IF(',LogicalExpression, ';', FormulaExpression, [';',FormulaExpression],')' |
                 (GroupFkt'(',RArgList,')') |
                 ('NOT(', LogicalExpression, ')') |
                 (Function, '(', ArgList ')') |
                 (LogFunc'(', LogArgList ')') |
                 ('SVerweis(', FormulaExpression, ';', CellAddress, [':', CellAddress], ';', FormulaExpression, ';' FormulaExpression, ')')
GroupFkt ::=     'SUM'|'MAX'|'MIN'|...
Function ::=     ...
LogFunc ::=      'Or'|'And'|...
ArgList ::=      FormulaExpression, {';', FormulaExpression}
RArgList ::=     RArgument, [';' RArgument]
RArgument ::=    RangeReference | FormulaExpression
RangeRefernece ::= CellAddress, ':', CellAddress

Operator Precedence

Based on Excel Tutorial - SC 2000 - Basic Math (local). and Microsoft support.

When you combine several operators into a single formula, Microsoft Excel performs the operations in the following order:

In Excel 97 Help:
operators, evaluation order in formulas > "The order in which Microsoft Excel performs operations in formulas" topic
operators, precedence...

Note: (From http://www.faqs.org/faqs/spreadsheets/faq/)
10.13 Why is =-1^2 positive and =0-1^2 negative?

nonstandard operator precedence -- the `unary minus' has a high precedence, as normal, but the `exponentiation operator' has a higher one, which is NOT normal.

Excel treats -1^2 as (-1)^2, while most languages would as -(1^2). (you can ALWAYS use parentheses to force either interpretation.)

note that Excel formulas and VBA formulas disagree on the order.

Calculation operators in formulas

(Based on [WSFF] and help files from [EXCEL])
Operators specify the type of calculation that you want to perform on the elements of a formula. Microsoft Excel includes four different types of calculation operators: arithmetic, comparison, text, and reference.

"Although the multidensional spreadsheet has attracted a lot of interest since it empowers the end user to analyze business data, this has not replaced traditional analysis by means of a managed query environment." - [COMP442 ODWOT]

Links

Consider using a parser generator [JavaCC] or JLex to automatically create code directly from the definition.
[SMSSE] [WFE] [MRX] [UYCS] [ECS] [FD] [DSGPASL] [WSSF]

Metrics

FishBrain :: SpreadsheetMetrics For a formula:

Help Stuff from XP

You can use Auto Fill Options Auto Fill Options to choose options for how to fill the selection. For example, you can choose to Fill Formatting Only or Fill Without Formatting.

If a cell contains a formula that breaks one of the rules, a triangle appears in the top-left corner of the cell. You can prevent these indicators from being displayed. formula problem Clicking in these cells will display the trace error icon. trace error icon

Error detection rules

Formula omits cells in region
The formula may not include a correct reference. If a formula refers to a range of cells, and you add cells to the bottom or right of that range, the references may no longer be correct. The formula does not always automatically update its reference to include the new cells. This rule compares the reference in a formula against adjacent cells. If the adjacent cells contain more numbers (are not blank cells), then the problem is noted.
For example, the formula =SUM(A2:A4) would be noted with this rule, because A6, A7, and A8 are adjacent, and contain data.

Invoice
15,000
9,000
8,000
20,000
5,000
22,500
=SUM(A2:A4)

Errors, Checkers, Auditors

FishBrain::SpreadsheetAuditing
(From http://www.faqs.org/faqs/spreadsheets/faq/)

Auditor tools and books The Excel Auditor [XLAUD]
Spreadsheet Detective [SSD]
Professional's Guide to Robust Spreadsheets http://www.manning.com/Richardson/

Error value
Meaning
#DIV/0!
The formula or function is trying a division by 0. Which is mathematically undetermined. Check for values = 0. One of the references in the formula may be to a blank or empty cell. In this case, change the reference or enter a value other than zero.
#N/A
Refers to a value that is not available, or non existing ( a blank cell for instance)
#NAME?
Uses a name that EXCEL does not regognize.
#NULL!
Specifies an invalid intersection of two areas.
#NUM!
Uses a number incorrectly for instant in the argument of a function. It is also displayed if the numbers are too big or too small to be understood by EXCEL.
#REF!
Refers to a cell that is not valid (non existing cell for instance).
#VALUE!
Uses an incorrect argument or operant.
#####
Produces a result too long to be displayed in the cell. This is not really an error: it indicates the the cell need to be wider or of a different format.

Notes and stuff

Absolute Column, Absolute Row, $A$1, is an absolute cell reference
Relative Column, Absolute Row, A$1, is a mixed cell reference
Absolute Column, Relative Row, $A1, is a mixed cell reference
Relative Column, Relative Row, A1, is a relative cell reference

Pressing the F4 key in Excel, while typing the cell address, will cycle it through the four variations.

Google discussion group: http://groups.google.com/groups?group=comp.apps.spreadsheets

"The shortcut key to cell names is Ctrl + F3." [OOPS]

To make Excel calculate a circular reference, set iteration to on. Use Tools|Options|Calculation|Iteration. Also set the Maximum Iterations number to 50 or more.

Extract from http://groups.google.com
Subject:
 Formula- Grammar
Newsgroup:
 comp.apps.spreadsheets
Harlan Grove (hrlngrv@aol.com)
Subject: Re: Formula- Grammar
Newsgroups: comp.apps.spreadsheets
View this article only
Date: 2001-09-06 11:44:32 PST

Markus CLERMONT  wrote...
>I'm doing currently some research in debugging excel-spreadsheets.
>As we are currently developping a tool for detecting errors in
>spreadsheets, we have to parse Excel-Formulas. Therefore I'd
>need a formula-grammar for excel-formulas.

Meaning a BNF-style grammar? I don't think there is one. For the most part
it's very basic: 5 infix arithmetic operators (+, -, *, /, ^), 6 infix
comparison operators (=, <>, <, <=, >, >=), locale-varying argument
separator and decimal 'point' (usually , or ; and . or ,) taken from
Windows' Regional Settings (that alone would make a BNF grammar rather a
tricky thing to write), numeric constants, string constants, error constants
(all begin with #), function calls as function-name(argument-list), and then
there's the harder stuff.

- range references, including collections, intersections and 3D blocks
- array constants
- whitespace in limited instances
- no more than 7 levels of formula nesting
- no argument list longer than 29 arguments
- anything not interpreted as one of the items above that matches the
regular expression [_A-Za-z][_0-9A-Za-z]* but not 'R' or 'C' would be
interpreted as defined names

Not sure why you need this. Excel won't allow users to enter syntactically
invalid formulas, so you'll never find any cell formulas in Excel workbooks
for which you'd need a grammar to check them. If the tool you're developing
would locate argument type errors in function calls within cell formulas,
that's problematic. In may instances Excel can use numeric and text values
interchangeably, like awk or perl.

So likely all you really need is a means of identifying function calls
within formulas. The syntax isn't materially different from C. Function
names match the regular expression [A-Za-z][_0-9A-Za-z]* (no restrictions,
but several built-in functions), function names are immediately followed by
left parenthesis, then there's the argument list, and finally the closing
right parenthesis. So the regular expression [A-Za-z][_0-9A-Za-z]*[(]
(assuming you're not inside a double-quoted string) matches the beginning of
any function call.

About R1C1 references

Ref. To toggle between A1 and R1C1 style referencing Tools > Options > General tab > R1C1 reference style.
In R1C1 style, Microsoft Excel indicates the location of a cell with an "R" followed by a row number and a "C" followed by a column number. For example, the absolute cell reference R1C1 is equivalent to the absolute reference $A$1 in A1 reference style. If the active cell is A1, the relative cell reference R[1]C[1] refers to the cell one row down and one column to the right, or B2. The following are examples of references in R1C1 style.

Reference       Meaning
R[-2]C	        A relative reference to the cell two rows up and in the same column
R[2]C[2]        A relative reference to the cell two rows down and two columns to the right
R2C2	        An absolute reference to the cell in the second row and in the second column
R[-1]	        A relative reference to the entire row above the active cell
R	        An absolute reference to the current row


UML Spreadsheet Strucutre

Spreadsheet Structure

Internal Format