UPDATE: Dump of initial files

This commit is contained in:
Nicolás Hatcher
2023-11-18 21:26:18 +01:00
commit c5b8efd83d
279 changed files with 42654 additions and 0 deletions


@@ -0,0 +1,64 @@
Documentation
=============
An `xlsx` is a zip file containing a set of folders and `xml` files. The IronCalc json structure mimics the relevant parts of the Excel zip.
Although the xlsx structure is quite complicated, its essentials regarding the spreadsheet technology are easier to grasp.
The simplest workbook folder structure might look like this:
```
docProps
app.xml
core.xml
_rels
.rels
xl
_rels
workbook.xml.rels
theme
theme1.xml
worksheets
sheet1.xml
calcChain.xml
styles.xml
workbook.xml
sharedStrings.xml
[Content_Types].xml
```
Note that more complicated workbooks will have many more files and folders, for instance for charts, pivot tables, comments and tables.
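Because an `xlsx` is an ordinary zip archive, its parts can be listed with any zip library. The following is a minimal sketch in Python using only the standard library. The helper names `list_parts` and `build_demo_archive` are made up for this illustration, and the demo archive holds placeholder content rather than a valid workbook.

```python
import io
import zipfile


def list_parts(xlsx_bytes: bytes) -> list:
    """Return the sorted names of all files stored inside an xlsx archive."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        return sorted(archive.namelist())


def build_demo_archive() -> bytes:
    """Build a tiny stand-in archive in memory (placeholder content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("[Content_Types].xml", "xl/workbook.xml", "xl/worksheets/sheet1.xml"):
            archive.writestr(name, "<placeholder/>")
    return buffer.getvalue()


parts = list_parts(build_demo_archive())
print(parts)
```

For a real file, passing the bytes read from disk to `list_parts` should show the folder structure listed above.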
The relevant json structure in IronCalc will be:
```json
{
"name": "Workbook1",
"defined_names": [],
"shared_strings": [],
"worksheets": [],
"styles": {
"num_fmts": [],
"fonts": [],
"fills": [],
"borders": [],
"cell_style_xfs": [],
"cell_styles" : [],
"cell_xfs": []
}
}
```
Note that the correspondence is not one-to-one, but the two structures closely resemble each other.
SpreadsheetML
-------------
International standard (Fourth edition 2016-11-01): ECMA-376, ISO/IEC 29500-1
* [iso](https://standards.iso.org/ittf/PubliclyAvailableStandards/c071691_ISO_IEC_29500-1_2016.zip)
* [ecma](http://www.ecma-international.org/publications/standards/Ecma-376.htm)


@@ -0,0 +1,67 @@
Shared Strings
==============
In Excel a cell that contains a string can have one of three types
(see section 18.18.11 ST_CellType (Cell Type)):
* 's' (A shared string)
* 'str' (A formula string)
* 'inlineStr' (An inline string)
The file `sharedStrings.xml` holds the list of shared strings. The following example contains two strings:
* Cell A1
* Cell A2
The second contains internal formatting (several runs with different fonts and colors) that is lost in IronCalc.
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="6" uniqueCount="2">
<si>
<t>Cell A1</t>
</si>
<si>
<r>
<rPr>
<sz val="11"/>
<color rgb="FFFF0000"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>Cell</t>
</r>
<r>
<rPr>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t xml:space="preserve"> </t>
</r>
<r>
<rPr>
<b/>
<sz val="11"/>
<color theme="1"/>
<rFont val="Calibri"/>
<family val="2"/>
<scheme val="minor"/>
</rPr>
<t>A2</t>
</r>
</si>
</sst>
```
In IronCalc this results in `shared_strings: ["Cell A1", "Cell A2"]`.
Note that the formatting we are losing is formatting that varies within a cell. We can still format and style the full contents of a cell.
In this example there are two strings in the list (`uniqueCount="2"`), but those strings are used by 6 cells across the workbook (`count="6"`). These attributes are not kept in IronCalc.
Another issue (a corner case) is that IronCalc might end up with a repeated shared string in the list if the original Excel file has the same content in two cells with different internal formatting. In that case we use more memory than we need to, but it does not result in an error.
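The flattening described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not IronCalc's actual importer: the function name `read_shared_strings` is hypothetical, and the sample is a shortened version of the file above.

```python
import xml.etree.ElementTree as ET

# Namespace of the SpreadsheetML main schema.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SAMPLE = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'
    ' count="6" uniqueCount="2">'
    "<si><t>Cell A1</t></si>"
    "<si>"
    '<r><rPr><color rgb="FFFF0000"/></rPr><t>Cell</t></r>'
    '<r><t xml:space="preserve"> </t></r>'
    "<r><rPr><b/></rPr><t>A2</t></r>"
    "</si>"
    "</sst>"
)


def read_shared_strings(xml_text: str) -> list:
    """Return one plain string per <si>, dropping any per-run formatting."""
    strings = []
    for si in ET.fromstring(xml_text).findall("m:si", NS):
        # A plain string keeps its text in <si>/<t>;
        # a formatted string spreads it over several <si>/<r>/<t> runs.
        pieces = [t.text or "" for t in si.findall("m:t", NS)]
        pieces += [t.text or "" for t in si.findall("m:r/m:t", NS)]
        strings.append("".join(pieces))
    return strings


shared_strings = read_shared_strings(SAMPLE)
print(shared_strings)
```

Since the runs are simply concatenated, two `<si>` entries that differ only in their run formatting produce the same plain string twice, which is the corner case mentioned above.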


@@ -0,0 +1,68 @@
workbook.xml: worksheets, defined names and relationships
=========================================================
The most important things we will find in `workbook.xml` are a list of sheets and a list of defined names.
For example, the list of sheets might be something like:
```xml
<sheets>
<sheet name="Sheet1" sheetId="1" r:id="rId1"/>
<sheet name="Chart1" sheetId="6" r:id="rId2"/>
<sheet name="Second" sheetId="3" r:id="rId3"/>
<sheet name="Sheet4" sheetId="8" r:id="rId4"/>
<sheet name="shared" sheetId="9" r:id="rId5"/>
<sheet name="Table" sheetId="7" r:id="rId6"/>
<sheet name="Sheet2" sheetId="2" r:id="rId7"/>
<sheet name="Created fourth" sheetId="4" r:id="rId8"/>
<sheet name="Hidden" sheetId="5" state="hidden" r:id="rId9"/>
</sheets>
```
The order is the order they will appear in the workbook. `sheetId` identifies the sheet and does not change if we reorder the sheets.
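Defined names (see the `definedNames` example at the end of this section) that have a `localSheetId` attribute are scoped to a sheet; the example has three of those. Note that the `localSheetId` refers to the position in the sheet list (0-indexed) and not to the `sheetId`.
A sheet can have one of three states, given by the `state` attribute:
* visible (the default when the attribute is absent)
* hidden
* very hidden (`veryHidden` in the xml)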
To understand which file belongs to each sheet we have to do a bit of work. We will also find that the sheet "`Chart1`" is not a spreadsheet that we want to import but a "chart" sheet.
This is where the relationships file comes in (`xl/_rels/workbook.xml.rels`). In our case it is something like:
```xml
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet7.xml"/>
<Relationship Id="rId13" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet6.xml"/>
<Relationship Id="rId12" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/chartsheet" Target="chartsheets/sheet1.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet5.xml"/>
<Relationship Id="rId11" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme" Target="theme/theme1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet4.xml"/>
<Relationship Id="rId10" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/pivotCacheDefinition" Target="pivotCache/pivotCacheDefinition1.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/>
<Relationship Id="rId9" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet8.xml"/>
<Relationship Id="rId14" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/calcChain" Target="calcChain.xml"/>
</Relationships>
```
The `r:id` attribute in the sheet list links the sheet to this relationships file. For instance the sheet "shared" has the relationship id `rId5`, which links to the file "`worksheets/sheet4.xml`" of type "worksheet".
Note that the second sheet "Chart1" has id `rId2`, which links to the file "`chartsheets/sheet1.xml`" of type "chartsheet". In IronCalc we ignore those sheets.
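The lookup from sheet to file can be sketched as follows. This is a minimal Python illustration, assuming both files are already available as strings. The function name `worksheet_targets` and the shortened sample documents are made up for the example and are not part of IronCalc.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"

WORKBOOK = (
    f'<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}"><sheets>'
    '<sheet name="Sheet1" sheetId="1" r:id="rId1"/>'
    '<sheet name="Chart1" sheetId="6" r:id="rId2"/>'
    '<sheet name="Hidden" sheetId="5" state="hidden" r:id="rId9"/>'
    "</sheets></workbook>"
)
RELS = (
    f'<Relationships xmlns="{PKG_REL}">'
    f'<Relationship Id="rId9" Type="{DOC_REL}/worksheet" Target="worksheets/sheet8.xml"/>'
    f'<Relationship Id="rId2" Type="{DOC_REL}/chartsheet" Target="chartsheets/sheet1.xml"/>'
    f'<Relationship Id="rId1" Type="{DOC_REL}/worksheet" Target="worksheets/sheet1.xml"/>'
    "</Relationships>"
)


def worksheet_targets(workbook_xml: str, rels_xml: str) -> list:
    """Return (name, state, target) for every sheet of type "worksheet", in workbook order."""
    rels = {}
    for rel in ET.fromstring(rels_xml).findall(f"{{{PKG_REL}}}Relationship"):
        rels[rel.get("Id")] = (rel.get("Type"), rel.get("Target"))
    result = []
    for sheet in ET.fromstring(workbook_xml).iter(f"{{{MAIN}}}sheet"):
        rel_type, target = rels.get(sheet.get(f"{{{DOC_REL}}}id"), (None, None))
        # Chart sheets (and anything else that is not a worksheet) are skipped.
        if rel_type == f"{DOC_REL}/worksheet":
            result.append((sheet.get("name"), sheet.get("state", "visible"), target))
    return result


targets = worksheet_targets(WORKBOOK, RELS)
print(targets)
```

The targets are returned as written in the relationships file, that is, relative to the `xl` folder.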
The list of defined names in the same `workbook.xml` is:
```xml
<definedNames>
<definedName name="answer" localSheetId="4">shared!$G$5</definedName>
<definedName name="answer2" localSheetId="0">Sheet1!$I$6</definedName>
<definedName name="local_thing" localSheetId="2">Second!$B$1:$B$9</definedName>
<definedName name="numbers">Sheet1!$A$16:$A$18</definedName>
<definedName name="quantum">Sheet1!$C$14</definedName>
</definedNames>
```
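So `answer2` is scoped to `Sheet1`, `answer` is scoped to `shared` and `local_thing` is scoped to `Second`. The names `numbers` and `quantum` have no `localSheetId` and are therefore not scoped to a sheet.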
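The scope resolution can be sketched as below: `localSheetId` is used as an index into the sheet list, not compared against `sheetId`. The function name `defined_name_scopes` and the shortened sample are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

WORKBOOK = (
    f'<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}">'
    "<sheets>"
    '<sheet name="Sheet1" sheetId="1" r:id="rId1"/>'
    '<sheet name="Chart1" sheetId="6" r:id="rId2"/>'
    '<sheet name="Second" sheetId="3" r:id="rId3"/>'
    '<sheet name="Sheet4" sheetId="8" r:id="rId4"/>'
    '<sheet name="shared" sheetId="9" r:id="rId5"/>'
    "</sheets>"
    "<definedNames>"
    '<definedName name="answer" localSheetId="4">shared!$G$5</definedName>'
    '<definedName name="answer2" localSheetId="0">Sheet1!$I$6</definedName>'
    '<definedName name="numbers">Sheet1!$A$16:$A$18</definedName>'
    "</definedNames>"
    "</workbook>"
)


def defined_name_scopes(workbook_xml: str) -> list:
    """Return (name, scope, formula); scope is a sheet name, or None when not scoped to a sheet."""
    root = ET.fromstring(workbook_xml)
    sheet_names = [sheet.get("name") for sheet in root.iter(f"{{{MAIN}}}sheet")]
    result = []
    for defined_name in root.iter(f"{{{MAIN}}}definedName"):
        local = defined_name.get("localSheetId")
        # localSheetId is a 0-based position in <sheets>, not a sheetId.
        scope = sheet_names[int(local)] if local is not None else None
        result.append((defined_name.get("name"), scope, defined_name.text))
    return result


scopes = defined_name_scopes(WORKBOOK)
print(scopes)
```

A list of tuples is used rather than a dictionary keyed by name, because the same name may be defined in more than one scope.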


@@ -0,0 +1,61 @@
Worksheets
==========
All the sheets in the workbook are in `xl/worksheets/sheet*.xml`; these are the single most important files for us.
Here is an example, ignoring for now the most important part, `sheetData`:
```xml
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing" xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2" xmlns:xr3="http://schemas.microsoft.com/office/spreadsheetml/2016/revision3" mc:Ignorable="x14ac xr xr2 xr3" xr:uid="{65AA7E95-0880-433A-9B1F-8563DB0FF1B5}">
<dimension ref="A1:O33"/>
<sheetViews>
<sheetView workbookViewId="0">
<selection activeCell="I6" sqref="I6"/>
</sheetView>
</sheetViews>
<sheetFormatPr defaultRowHeight="14.5" x14ac:dyDescent="0.35"/>
<cols>
<col min="5" max="5" width="38.26953125" customWidth="1"/>
<col min="6" max="6" width="9.1796875" style="1"/>
<col min="8" max="8" width="4" customWidth="1"/>
</cols>
<sheetData>
...
</sheetData>
<mergeCells count="2">
<mergeCell ref="K7:L10"/>
<mergeCell ref="H18:J20"/>
</mergeCells>
<pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
<pageSetup orientation="portrait" r:id="rId1"/>
<legacyDrawing r:id="rId2"/>
</worksheet>
```
From this file we read the column information, the sheet data and the merged cells.
For now everything else is ignored and lost in IronCalc.
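As a rough sketch of reading the column information and the merged cells, consider the Python below. The function name `read_layout` and the keys of the returned dictionaries are hypothetical and do not reflect IronCalc's actual data model. The sample is a cut-down version of the worksheet above.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

SHEET = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<cols>"
    '<col min="5" max="5" width="38.26953125" customWidth="1"/>'
    '<col min="6" max="6" width="9.1796875" style="1"/>'
    "</cols>"
    "<sheetData/>"
    '<mergeCells count="2">'
    '<mergeCell ref="K7:L10"/>'
    '<mergeCell ref="H18:J20"/>'
    "</mergeCells>"
    "</worksheet>"
)


def read_layout(sheet_xml: str) -> tuple:
    """Return (columns, merged_ranges) found in a worksheet document."""
    root = ET.fromstring(sheet_xml)
    columns = []
    for col in root.findall("m:cols/m:col", NS):
        columns.append(
            {
                "min": int(col.get("min")),
                "max": int(col.get("max")),
                "width": float(col.get("width")),
                # Boolean attributes may be written as "1" or "true".
                "custom_width": col.get("customWidth") in ("1", "true"),
                "style": int(col.get("style", "0")),
            }
        )
    merged = [mc.get("ref") for mc in root.findall("m:mergeCells/m:mergeCell", NS)]
    return columns, merged


columns, merged = read_layout(SHEET)
print(columns)
print(merged)
```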
The sheetData is organized by rows:
```xml
<sheetData>
<row r="1" spans="1:2" x14ac:dyDescent="0.35">
<c r="A1" t="s">
<v>0</v>
</c>
<c r="C1">
<v>1</v>
</c>
</row>
<row r="2" spans="1:2" x14ac:dyDescent="0.35">
<c r="A2">
<v>222</v>
</c>
<c r="C2">
<v>2</v>
</c>
</row>
</sheetData>
```
In IronCalc the `spans` attribute (an Excel optimization) is not used. The `dyDescent` property is also ignored in IronCalc.
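Reading the cells, and resolving those of type `s` through the shared strings list, could look like the following Python sketch. The function name `read_cells` is hypothetical, formulas and styles are left out, and the `spans` attribute is simply never consulted.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

SHEET_DATA = (
    f'<sheetData xmlns="{MAIN}">'
    '<row r="1" spans="1:2">'
    '<c r="A1" t="s"><v>0</v></c>'
    '<c r="C1"><v>1</v></c>'
    "</row>"
    '<row r="2" spans="1:2">'
    '<c r="A2"><v>222</v></c>'
    '<c r="C2"><v>2</v></c>'
    "</row>"
    "</sheetData>"
)


def read_cells(sheet_data_xml: str, shared_strings: list) -> dict:
    """Map each cell reference (for example "A1") to its value."""
    cells = {}
    for cell in ET.fromstring(sheet_data_xml).iter(f"{{{MAIN}}}c"):
        value = cell.find(f"{{{MAIN}}}v")
        if value is None or value.text is None:
            continue  # empty cell
        cell_type = cell.get("t", "n")  # "n" (number) is the default type
        if cell_type == "s":
            # The value is an index into the shared strings list.
            cells[cell.get("r")] = shared_strings[int(value.text)]
        elif cell_type == "n":
            cells[cell.get("r")] = float(value.text)
        else:
            cells[cell.get("r")] = value.text
    return cells


cells = read_cells(SHEET_DATA, ["Cell A1", "Cell A2"])
print(cells)
```

With the shared strings from the earlier example, cell `A1` resolves to the first entry of the list, while the other three cells are plain numbers.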