Internationalization

OpenXML supports internationalization features required by such diverse languages as Arabic, Chinese (three variants), Hebrew, Hindi, Japanese, Korean, Russian, and Turkish.

OpenXML inherently supports Unicode because it is XML. In addition, OpenXML has a rich set of internationalization features that have been refined over the course of many years. This list is representative:

Text orientation: OpenXML supports left-to-right (LTR) and right-to-left (RTL) languages. It also supports bidirectional (“BiDi”) languages such as Arabic, Farsi, Urdu, Hebrew, and Yiddish, which run from right to left but can contain embedded segments of text that run left to right. In WordprocessingML, text direction can be controlled at both the paragraph level (§4:2.3.1.6) and the level of a run within a paragraph (§4:2.3.2.28). Similarly, in DrawingML text, it can be controlled at the body level (§4:5.1.5.1.1), at the paragraph level (§4:5.1.5.2.2), and within numbered bullets (§4:5.1.5.4).
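
As a rough illustration of the two WordprocessingML levels described above, the following Python sketch builds a paragraph fragment with a paragraph-level and a run-level direction setting. The element names w:bidi and w:rtl and the namespace URI are not given in this overview; they are stated here as assumptions about the markup and should be checked against the cited sections. The helper make_rtl_paragraph is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed WordprocessingML main namespace (not quoted in the text above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def make_rtl_paragraph(text):
    """Build a paragraph whose paragraph and run are both marked right-to-left."""
    p = ET.Element(w("p"))
    p_pr = ET.SubElement(p, w("pPr"))
    ET.SubElement(p_pr, w("bidi"))   # paragraph-level direction (assumed name)
    r = ET.SubElement(p, w("r"))
    r_pr = ET.SubElement(r, w("rPr"))
    ET.SubElement(r_pr, w("rtl"))    # run-level direction (assumed name)
    t = ET.SubElement(r, w("t"))
    t.text = text
    return p


paragraph = make_rtl_paragraph("\u05e9\u05dc\u05d5\u05dd")  # Hebrew sample text
paragraph_xml = ET.tostring(paragraph, encoding="unicode")
print(paragraph_xml)
```

The fragment is not a complete document; it only shows where the two settings would sit relative to the paragraph and the run.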

Text flow: In WordprocessingML, the direction of text flow can be controlled at the level of a section or a table (§4:2.3.1.41) or at the level of a paragraph (§4:2.3.2.28). At the section and table levels, text flow can be controlled in the vertical and horizontal directions. This allows OpenXML to support all potential text layouts (e.g., vertical lines flowing top to bottom and stacked left to right, to support Mongolian). The text-flow setting also affects the layout of lists, tables, and other presentation elements. DrawingML also utilizes Kumimoji settings at the paragraph and run levels to flow text horizontally and numbers vertically (§4:5.1.5.2.3, §4:5.1.5.3.9). In WordprocessingML (§4:2.3.1.16) and PresentationML (§4:4.3.1.15), character flow can also be constrained using Kinsoku settings, which determine which characters are allowed to begin and end a line of text.
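
To make the Kinsoku idea concrete, here is a minimal Python sketch of a line breaker that refuses to start a line with certain characters or end a line with others. It is not how any particular OpenXML consumer lays out text. The character sets, the function name break_lines, and the "move the break earlier" strategy are all illustrative assumptions; the actual sets are governed by the settings in the cited sections.

```python
# Illustrative character sets only; real Kinsoku sets come from the document
# settings and the application's defaults.
NO_LINE_START = set("\u3001\u3002\uff0c\uff0e\uff09\u300d\u300f\uff01\uff1f")
NO_LINE_END = set("\uff08\u300c\u300e")


def break_lines(text, width):
    """Split text into lines of at most `width` characters, honoring the sets above."""
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while start < len(text):
        end = min(start + width, len(text))
        # Move the break earlier while the next line would start with a
        # forbidden character or this line would end with one.
        while end < len(text) and end - start > 1 and (
            text[end] in NO_LINE_START or text[end - 1] in NO_LINE_END
        ):
            end -= 1
        lines.append(text[start:end])
        start = end
    return lines


sample = "\u4eca\u65e5\u306f\u6674\u308c\u3002\u660e\u65e5\u306f\u96e8\u3002"
for line in break_lines(sample, 5):
    print(line)
```

With a width of 5, a naive breaker would begin the second line with an ideographic full stop; the sketch shortens the first line by one character to avoid that.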

Number representation: For field formatting in WordprocessingML (§4:2.16.4.3), paragraph/list numbering in WordprocessingML (§4:2.9), and numbering in DrawingML (§4:5.1.5.4, §4:5.1.12.61), numbers can be formatted using any of several dozen number formats, including Hiragana, Arabic, Abjad, Thai, cardinal text (e.g., “one hundred twenty-three”), Chinese, Korean (Chosung or Ganada), Hebrew, Hindi, Japanese, Roman, or Vietnamese. These facilities also support arbitrary radix-point values (e.g., “1.00” vs. “1,00”) and list separators. Internationalized number formatting is particularly robust in SpreadsheetML, which supports all of those features in the cell formats (§4:3.8.30) and in references to external data (§4:3.13.12).
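
The following Python sketch illustrates two of the ideas above from a consuming application's point of view: rendering a list number in a Roman format, and rendering a decimal value with a configurable radix point. Both function names and their parameters are hypothetical; the specification defines the formats themselves, not this code.

```python
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Render an integer from 1 to 3999 as an upper-case Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("value out of range for this sketch")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def format_decimal(value, places=2, radix_point=".", group_sep=""):
    """Format a number with a caller-chosen radix point and group separator."""
    text = f"{value:,.{places}f}"            # e.g. "1,234.50"
    int_part, _, frac_part = text.partition(".")
    int_part = int_part.replace(",", group_sep)
    return int_part + (radix_point + frac_part if frac_part else "")


print(to_roman(123))                           # CXXIII
print(format_decimal(1))                       # 1.00
print(format_decimal(1, radix_point=","))      # 1,00
```

Keeping the radix point and group separator as parameters, rather than hard-coding them, mirrors the idea that the same stored value can be presented according to different regional conventions.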

Date representation: In WordprocessingML (§4:2.18.7) and SpreadsheetML (§4:3.18.5), calendar dates can be written using Gregorian (three variants), Hebrew, Hijri, Japanese (Emperor Era), Korean (Tangun Era), Saka, Taiwanese, and Thai formats.
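
For a few of the calendars listed, the displayed year differs from the Gregorian year by a fixed offset, which the Python sketch below illustrates. The offsets are general knowledge about these calendars for modern dates, not values taken from the specification, and the names ERA_YEAR_OFFSETS and era_year are hypothetical. Calendars such as Hebrew, Hijri, Saka, and Japanese (Emperor Era) need real calendar arithmetic and are deliberately left out.

```python
# Year offsets relative to the Gregorian year (assumed, modern dates only).
ERA_YEAR_OFFSETS = {
    "thai_buddhist": 543,    # Buddhist Era year = Gregorian year + 543
    "korean_tangun": 2333,   # Tangun Era year = Gregorian year + 2333
    "taiwanese": -1911,      # Republic Era year = Gregorian year - 1911
}


def era_year(gregorian_year, calendar):
    """Return the year number to display for the given calendar."""
    return gregorian_year + ERA_YEAR_OFFSETS[calendar]


for name in ERA_YEAR_OFFSETS:
    print(name, era_year(2006, name))
```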

Formulas: The formula specification in SpreadsheetML provides several internationalization-related conversion functions, such as BAHTTEXT (§4:3.17.7.22), JIS (§4:3.17.7.185), and ASC (§4:3.17.7.11).
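
As a hedged illustration of the kind of conversion that ASC and JIS perform, the Python sketch below maps between full-width and half-width forms for the ASCII range only. It is an approximation written for this overview, not the functions' definitions: the cited sections are authoritative, and the real functions may also cover other characters such as katakana. The names to_halfwidth and to_fullwidth are hypothetical.

```python
# Full-width forms U+FF01..U+FF5E sit at a fixed offset from ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0
IDEOGRAPHIC_SPACE = "\u3000"


def to_halfwidth(text):
    """Approximate ASC: full-width ASCII-range characters become half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - FULLWIDTH_OFFSET))
        elif ch == IDEOGRAPHIC_SPACE:
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


def to_fullwidth(text):
    """Approximate JIS: half-width ASCII-range characters become full-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            out.append(chr(code + FULLWIDTH_OFFSET))
        elif ch == " ":
            out.append(IDEOGRAPHIC_SPACE)
        else:
            out.append(ch)
    return "".join(out)


print(to_halfwidth("\uff21\uff22\uff23\uff11\uff12\uff13"))  # ABC123
```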

Language identifiers: In WordprocessingML (§4:2.3.2.18) and DrawingML (§4:5.1.5.3), every paragraph and run can be tagged with a language identifier, allowing an application to select appropriate proofing tools and other language-specific functionality. In addition to an identifier for each language, OpenXML supports the naming of a character set, a font family, and a PANOSE value to aid the application in choosing an appropriate substitute set of characters when local support is not present.
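
A run-level language tag in WordprocessingML might be produced as in the Python sketch below. The element name w:lang, its attributes (w:val, w:eastAsia, w:bidi), the namespace URI, and the sample language values are assumptions not stated in this overview and should be verified against the cited section. The helper make_tagged_run is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed WordprocessingML main namespace (not quoted in the text above).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(name):
    """Return the namespace-qualified form of a tag or attribute name."""
    return f"{{{W_NS}}}{name}"


def make_tagged_run(text, latin, east_asian=None, bidi=None):
    """Build a run whose properties carry language identifiers."""
    r = ET.Element(w("r"))
    r_pr = ET.SubElement(r, w("rPr"))
    lang = ET.SubElement(r_pr, w("lang"))
    lang.set(w("val"), latin)                 # language for Latin-script text
    if east_asian is not None:
        lang.set(w("eastAsia"), east_asian)   # language for East Asian text
    if bidi is not None:
        lang.set(w("bidi"), bidi)             # language for complex-script text
    t = ET.SubElement(r, w("t"))
    t.text = text
    return r


run = make_tagged_run("Hello", "en-US", east_asian="ja-JP", bidi="he-IL")
run_xml = ET.tostring(run, encoding="unicode")
print(run_xml)
```

An application reading such a run could use the identifiers to pick a spelling dictionary or hyphenation rules for each script in the text.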