Compactness

The OpenXML file format supports the creation of high-performance applications. In this subsection, we describe some of the design points that result in a compact file, thereby speeding handling and parsing. In the next subsection, we show how modular file structure enables an application to accomplish many tasks by parsing or modifying only a small subset of a document.

An OpenXML file is conventionally stored in a ZIP archive for purposes of packaging and compression, following the recommended implementation of the Open Packaging Conventions. Perhaps surprisingly, OpenXML files are on average 25% smaller, and at times up to 75% smaller, than their binary counterparts. For example, this white paper is 85% larger in the binary format!
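As a rough illustration of why ZIP packaging pays off for XML, the following sketch deflates a synthetic part in memory and compares sizes. The helper name `package_size` and the sample markup are assumptions made for this example; the markup is not a complete or valid WordprocessingML document, and the ratio it yields says nothing about the averages quoted above.

```python
import io
import zipfile


def package_size(parts):
    """Return (raw_bytes, zipped_bytes) for a dict of part name -> bytes.

    Hypothetical helper for illustration; it writes the parts into an
    in-memory ZIP archive using DEFLATE compression.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in parts.items():
            archive.writestr(name, content)
    raw = sum(len(content) for content in parts.values())
    return raw, len(buffer.getvalue())


# Illustrative content only: 500 identical paragraphs of simplified markup.
paragraph = "<w:p><w:r><w:t>Hello, world</w:t></w:r></w:p>"
document = (
    "<w:document><w:body>" + paragraph * 500 + "</w:body></w:document>"
).encode("utf-8")

raw_size, zipped_size = package_size({"word/document.xml": document})
print(f"raw: {raw_size} bytes, zipped: {zipped_size} bytes")
```

Markup is highly repetitive, so a general-purpose compressor recovers most of the overhead that verbose tags introduce.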

A second simple source of compactness, particularly where an uncompressed representation is required, is the length of identifiers in the XML. Frequently used tag names are short. Implementers are encouraged to use short namespace prefixes as well; for example, the conventional prefix for the WordprocessingML namespace is “w”.

Further compactness is achieved by avoiding repetition throughout the file format. One class of examples removes redundant storage of large objects.

  • In SpreadsheetML, repeated strings are stored in a string table in the workbook, and referenced by index (§3:3.3).

  • In SpreadsheetML, a formula that is filled down or across several cells is stored as a single “master” formula in the top left cell; the other cells in the fill range refer to it by a grouping index (§3:3.2.9.2).

  • In DrawingML, shape names (§4:5.1.12.56), text geometries (§4:5.1.12.76), and other presets (several throughout §3:5.8, §3:5.9, and §4:5.1.12) are represented by name or number rather than by an explicit definition. In these cases, the meanings of names and numbers reside in the Specification and not in the file. The chosen representation is the result of an explicit tradeoff made during the standards process. It is compact and allows editing at the correct level of abstraction: for example, a rectangle could be changed to an oval by changing one attribute (§4:5.1.11.18).
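The string-table idea in the first bullet above can be sketched in a few lines. This is a minimal illustration of the technique, not the SpreadsheetML serialization itself; the function name `build_string_table` and the sample cell values are assumptions made for the example.

```python
def build_string_table(cell_texts):
    """Deduplicate cell strings into a table plus per-cell indexes.

    Hypothetical helper: each distinct string is stored once, in order
    of first appearance, and every cell keeps only an integer index.
    """
    table = []       # distinct strings, stored once each
    index_of = {}    # string -> position in the table
    references = []  # one table index per input cell
    for text in cell_texts:
        if text not in index_of:
            index_of[text] = len(table)
            table.append(text)
        references.append(index_of[text])
    return table, references


table, references = build_string_table(["East", "West", "East", "East", "West"])
print(table)       # ['East', 'West']
print(references)  # [0, 1, 0, 0, 1]
```

The saving grows with the length of the strings and the number of times each one repeats, since every repeat costs only an index.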

In another class of examples, hierarchy is used to provide inheritance semantics. As a happy by-product, this increases performance by reducing file sizes.

  • In WordprocessingML, styles are hierarchical (§3:2.8.9).

  • In DrawingML, shapes are grouped hierarchically (§4:5.1.2.1.20).

  • In PresentationML, a default hierarchy relates slide masters, slide layouts, and slides (§3:4.2).
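To make the link between inheritance and file size concrete, the sketch below resolves a style by walking up a chain of parent styles, so that each style needs to store only the properties it overrides. The dictionary layout, the `based_on` key, and the style and property names are assumptions for illustration; they do not reproduce the WordprocessingML schema.

```python
def resolve_style(styles, style_id):
    """Return the effective properties of a style, applying inheritance.

    Hypothetical data model: `styles` maps a style id to a dict with an
    optional "based_on" parent id and a "props" dict of local overrides.
    """
    chain = []
    seen = set()
    current = style_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic style inheritance at {current!r}")
        seen.add(current)
        style = styles[current]
        chain.append(style)
        current = style.get("based_on")

    resolved = {}
    # Apply the root ancestor first so that nearer styles override it.
    for style in reversed(chain):
        resolved.update(style.get("props", {}))
    return resolved


# Illustrative styles only: the child stores just its two overrides.
styles = {
    "Base": {"based_on": None, "props": {"font": "Serif", "size": 11}},
    "Title": {"based_on": "Base", "props": {"size": 16, "bold": True}},
}
effective = resolve_style(styles, "Title")
print(effective)  # {'font': 'Serif', 'size': 16, 'bold': True}
```

A property shared by many styles is written once, at the ancestor, rather than once per style.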

Other aspects of OpenXML are also designed to enable efficient implementation. For instance, in SpreadsheetML, the cell table stores only non-empty cells and can represent merged cells as a unit. The economy afforded by this technique is significant for sparse spreadsheets.
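The economy of storing only non-empty cells can be seen in a small sketch. The dictionary keyed by (row, column) is an assumption chosen for illustration, as are the helper names `to_sparse` and `cell_value`; it is not the SpreadsheetML cell table format, and merged cells are not modeled here.

```python
def to_sparse(grid):
    """Keep only the non-empty cells of a dense grid.

    Hypothetical helper: returns a dict mapping (row, column) to value,
    omitting every cell whose value is None.
    """
    return {
        (row_index, col_index): value
        for row_index, row in enumerate(grid)
        for col_index, value in enumerate(row)
        if value is not None
    }


def cell_value(cells, row, col):
    """Look up a cell; absent entries are treated as empty (None)."""
    return cells.get((row, col))


# A 100 x 100 sheet with two populated cells.
grid = [[None] * 100 for _ in range(100)]
grid[0][0] = "Total"
grid[99][99] = 42

cells = to_sparse(grid)
print(len(cells))                 # 2
print(cell_value(cells, 99, 99))  # 42
```

Storage here scales with the number of populated cells rather than with the dimensions of the sheet.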