High Fidelity Migration

OpenXML is designed to support all of the features in the Microsoft Office 97-2003 binary formats.

It is difficult to overstate how hard this goal is to accomplish, and how unique OpenXML is in doing so. Some formats, such as PDF, are designed to deliver a visual facsimile of a finished document to an end user. In contrast, OpenXML is intended to permit future editing or manipulation at the same level of abstraction available to the original creator; for example, reducing a vector graphic to a bitmap would fall short of this intent, as would collapsing a style hierarchy into independent styles. Further, a document can contain computational semantics that the original creator expects to be preserved, such as formula logic that depends on intermediate calculation results (including error codes), or animation rules that produce dynamic behavior.
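To make the point about computational semantics concrete, here is a minimal Python sketch, under simplifying assumptions, of an error code behaving as an ordinary intermediate value: it propagates through arithmetic, and later logic branches on it. The names `ErrorValue`, `divide`, `add`, and `if_error` are hypothetical, and the model is far simpler than the SpreadsheetML evaluation rules; the point is only that a converter storing just each cell's final displayed result would lose this logic.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ErrorValue:
    """A spreadsheet-style error code carried as an ordinary value (hypothetical model)."""
    code: str  # e.g. "#DIV/0!"


Value = Union[float, ErrorValue]


def divide(a: Value, b: Value) -> Value:
    # An error in either operand propagates unchanged.
    if isinstance(a, ErrorValue):
        return a
    if isinstance(b, ErrorValue):
        return b
    if b == 0:
        return ErrorValue("#DIV/0!")
    return a / b


def add(a: Value, b: Value) -> Value:
    if isinstance(a, ErrorValue):
        return a
    if isinstance(b, ErrorValue):
        return b
    return a + b


def if_error(value: Value, fallback: Value) -> Value:
    # Downstream logic branches on the intermediate result, including its error code.
    return fallback if isinstance(value, ErrorValue) else value


ratio = divide(10.0, 0.0)   # the intermediate result is an error value
total = add(ratio, 1.0)     # the error propagates through arithmetic
safe = if_error(ratio, 0.0) # later logic inspects the error and recovers
print(ratio.code, total.code, safe)  # #DIV/0! #DIV/0! 0.0
```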

The following references to the Specification illustrate the ability of OpenXML to represent subtle aspects of the binary formats.

  • The SpreadsheetML description includes an extensive formula specification (§4:3.17.7).

  • The WordprocessingML specification documents the rules by which paragraph, character, numbering, and table properties are composed with direct formatting (§3:2.8, especially §3:2.8.10).

  • The PresentationML specification documents the animation features (§3:4.4).
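The composition rules cited in the WordprocessingML bullet can be pictured as layering. The following Python sketch is a simplified, hypothetical model, not the Specification's algorithm: each style stores only its own settings plus an optional based-on parent, and effective run properties are computed from document defaults, then the paragraph style, then the character style, then direct formatting. It omits table and numbering properties and special cases such as toggle properties. Keeping the hierarchy and resolving it on demand, rather than flattening each style into an independent copy, is what preserves the creator's level of abstraction.

```python
from typing import Dict, Optional

# Hypothetical in-memory model: each style records only the properties it sets
# itself, plus an optional "based on" parent style.
STYLES: Dict[str, dict] = {
    "Normal":     {"based_on": None,     "props": {"font": "Calibri", "size": 11}},
    "Heading1":   {"based_on": "Normal", "props": {"size": 16, "color": "2F5496"}},
    "AccentChar": {"based_on": None,     "props": {"color": "C00000"}},
}


def style_chain_props(style_id: Optional[str]) -> dict:
    """Resolve a style by walking its based-on chain, ancestors first."""
    if style_id is None:
        return {}
    style = STYLES[style_id]
    resolved = style_chain_props(style["based_on"])  # a fresh dict, safe to mutate
    resolved.update(style["props"])  # the child's own settings override inherited ones
    return resolved


def effective_run_props(doc_defaults: dict, para_style: Optional[str],
                        char_style: Optional[str], direct: dict) -> dict:
    """Layer properties from least to most specific (simplified order)."""
    props = dict(doc_defaults)
    for layer in (style_chain_props(para_style),
                  style_chain_props(char_style),
                  direct):
        props.update(layer)
    return props


# A run in a Heading1 paragraph, with the AccentChar character style applied
# and a direct size override.
print(effective_run_props({"font": "Times New Roman", "size": 10},
                          "Heading1", "AccentChar", {"size": 14}))
# {'font': 'Calibri', 'size': 14, 'color': 'C00000'}
```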

OpenXML allows multiple implementations to conform without matching one another in every inconsequential detail. This is particularly important where numerical computation is involved, as in layout, effect rendering, and formula evaluation. Requiring more consistency than is practical would create an unnecessarily high barrier to conformance. The following examples illustrate decisions the committee made in this regard.

  • OpenXML defines effects such as surface appearances (§5.1.12.50) without constraining a developer to match those effects pixel for pixel.

  • OpenXML defines parameters such as page margins (§4:2.6.11), font (§4:2.8), and justification (§4:2.3.1.13). It allows developers to implement different flow algorithms as long as they respect those parameters.

  • The SpreadsheetML formula specification (§4:3.17.7) does not attempt to eliminate variations in floating-point computation because, in general, doing so would require conforming applications to use slow software emulation instead of relying on native hardware. Instead, it specifies the minimum number of bits of precision for numerical calculations (§4:3.17.5).

  • The SpreadsheetML formula specification also leaves certain decisions implementation-defined in order to allow for future innovation. For example, it does not limit how many iterations a computation such as NORMINV (§4:3.17.7.227) should perform. (NORMINV computes the inverse of the normal cumulative distribution by means of an iterative search.)
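The two formula bullets can be illustrated together. This Python sketch inverts the normal cumulative distribution by bisection, which is just one possible iterative search; the Specification does not prescribe the method, and the function names and the `max_iter` parameter are hypothetical. Two calls that differ only in their iteration cap stand in for two implementations: their answers differ in the trailing digits but agree within a tolerance.

```python
import math


def norm_cdf(x: float, mean: float, sd: float) -> float:
    """Normal cumulative distribution; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


def norminv(p: float, mean: float, sd: float, max_iter: int) -> float:
    """Invert norm_cdf by bisection; the iteration cap is an implementation choice."""
    if not (0.0 < p < 1.0) or sd <= 0.0:
        raise ValueError("require 0 < p < 1 and sd > 0")
    # At -/+40 standard deviations the CDF evaluates to 0.0 and 1.0, so the
    # search starts with norm_cdf(lo) < p <= norm_cdf(hi) for every valid p.
    lo, hi = mean - 40.0 * sd, mean + 40.0 * sd
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid, mean, sd) < p:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Two hypothetical implementations that differ only in their iteration cap.
fast = norminv(0.975, 0.0, 1.0, max_iter=30)
precise = norminv(0.975, 0.0, 1.0, max_iter=60)
print(fast, precise, abs(fast - precise) < 1e-7)
```

With a starting bracket 80 standard deviations wide, 30 halvings leave an interval narrower than 1e-7 standard deviations, so both results are usable even though they are not bit-for-bit identical.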

A number of older features, such as VML (§3:6), are included primarily for backward compatibility. The use of newer standards already in OpenXML, such as DrawingML (§3:5), is encouraged when writing new documents.