Backwards Compatibility in Office Open XML
As a member of my country's national standards body committee on electronic data processing, I have lately spent considerable time deliberating what our position should be at the upcoming Office Open XML ISO Ballot Resolution Meeting in Geneva. My biggest objection concerns large parts of the standard that are proposed to live in an Annex containing normative descriptions of deprecated features that will only be used by existing binary documents. The rationale behind this decision is backwards compatibility. In my opinion this solution is counterproductive for a number of reasons.
Burden on Third-Party Applications
The current disposition of comments for Greece proposes moving various parts to a normative Annex. Here is a partial list of the corresponding Part 4 paragraphs.
- 2.15.1.28, pages 1,158–1,172 (Hash algorithm)
- 6, pages 4,343–4,960 (VML)
- 2.15.3.26, pages 1,416–1,417 (Word 95 footnotes)
- 2.15.3.31, pages 1,426–1,427 (line wrap like Word 6)
- 2.15.5.32, pages 1,427–1,428 (small caps Word for Mac)
- 2.15.3.41, pages 1,422–1,423 (shapeLayoutLikeWW8)
- 2.15.3.51, pages 1,462–1,463 (suppressTopSpacingWP)
- 2.15.3.53, pages 1,467–1,468 (truncateFontHeightsLikeWP6)
- 2.15.3.6, pages 1,378–1,379 (autoSpaceLikeWord95)
- 2.15.3.63, page 1,481 (useWord2002TableStyleRules)
- 2.15.3.64, pages 1,482–1,483 (useWord97LineBreakRules)
- 2.15.3.65, pages 1,483–1,484 (wpJustification)
- 2.15.3.66, page 1,485 (wpSpaceWidth)
- 2.16.5.5, page 1,512 (AUTONUM)
- 2.15.1.28, pages 1,158–1,172 (document protection)
Backwards Compatibility is Not Preserved
Microsoft claims that a new standard and its huge normative Annex are required for backwards compatibility with legacy formats. Let's see how well this backwards compatibility works. I opened a new Word 2000 (SP3) document and wrote in it the words "hello, world". I then tried to save it using Microsoft's own Word 2007 document conversion support.
This is the message I got
Now, if Microsoft's software can't faithfully convert a simple two-word document into XML, what are its chances of handling more complicated material? Therefore, let's drop the backwards compatibility excuse.
The Proposed Solution is a Sham
Let us be honest about it. Backwards compatibility with legacy documents can be preserved without burdening the standard with hundreds of pages of descriptions of non-standard formats. A current .docx document is a zip bundle containing various XML files, like the following.
  Length     Size  Ratio    Date     Time   Name
--------  -------  -----    ----     ----   ----
    1312      358    73%  01-01-80  00:00  [Content_Types].xml
     590      243    59%  01-01-80  00:00  _rels/.rels
     817      250    69%  01-01-80  00:00  word/_rels/document.xml.rels
    1035      463    55%  01-01-80  00:00  word/document.xml
    6992     1686    76%  01-01-80  00:00  word/theme/theme1.xml
    2172     1015    53%  01-01-80  00:00  word/settings.xml
    1031      382    63%  01-01-80  00:00  word/fontTable.xml
     260      187    28%  01-01-80  00:00  word/webSettings.xml
     725      386    47%  01-01-80  00:00  docProps/app.xml
     775      385    50%  01-01-80  00:00  docProps/core.xml
   14818     1788    88%  01-01-80  00:00  word/styles.xml
--------  -------    ---                   -------
   30527     7143    77%                   11 files

The only thing that is needed in order to faithfully preserve a legacy document is to include in the bundle the document in its binary format. An application can then choose to open the document in its legacy form, or in its current form. This can even be done in XML. The following XML document contains a gzipped-base64-encoded version of a Microsoft Word 2000 document.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:legacy_document xmlns:w="http://schemas.openxmlformats.org/legacy">
H4sICAojvUcAA2hlbGxvLmRvYwDtXF1sVEUU/ububttdit1WLBW0LD+iic2GhmiAGNPaglUDNLZI
Y0i0pYtd3O2t2yUNSEjxlxgTa3jQGBPhwQQ00arESGIi+sSLkReMxBc0mvhgTEEfhATWc2bmtre3
f7tEIMD5krlnfs+Zv3v3fnNn9tQP1WcPf7boFwTwMEK4XIiizBenyC31AnFguY27XCgUOCpBriC4
[...]
i95+P4DNNAt2FlN8EmrIPj/D+JlVSv97lozVFPVAHu36XsiUZL+WWjBX+71578mSDBSBUvvfD+//
ewS3JhSNfihm5lDw2c33aWDvWqu7fVc21Z/X7wQbOziOovTNzP6kl55cg3/Wfv7C/zXDBVcL/wEX
ulKCAEwAAA==
</w:legacy_document>
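To make the two options above concrete, here is a minimal Python sketch of both: copying a zip bundle while adding the binary document as an extra member, and wrapping the binary as gzipped, base64-encoded text inside an XML element. The member path legacy/document.doc is a hypothetical name of my own, and the element name and namespace simply mirror the illustrative example above (with the w prefix declared so that the result parses). None of these names are defined by any standard, and a real package would probably also have to declare the extra part in [Content_Types].xml, which this sketch skips.

```python
import base64
import gzip
import io
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical names: neither the member path nor the element/namespace is
# defined by any standard; they only mirror the example in the text.
LEGACY_PART = "legacy/document.doc"
LEGACY_NS = "http://schemas.openxmlformats.org/legacy"


def add_legacy_part(docx_bytes: bytes, legacy_bytes: bytes) -> bytes:
    """Return a copy of the zip bundle with the binary document added."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every existing member unchanged.
        for info in src.infolist():
            dst.writestr(info, src.read(info.filename))
        # Store the legacy document next to the XML parts.
        dst.writestr(LEGACY_PART, legacy_bytes)
    return out.getvalue()


def wrap_legacy_xml(legacy_bytes: bytes) -> str:
    """Gzip the binary, base64-encode it, and place it in an XML element."""
    payload = base64.encodebytes(gzip.compress(legacy_bytes)).decode("ascii")
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
        f'<w:legacy_document xmlns:w="{LEGACY_NS}">\n'
        f"{payload}"
        "</w:legacy_document>\n"
    )


def unwrap_legacy_xml(xml_text: str) -> bytes:
    """Recover the original binary document from the XML wrapper."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    # b64decode ignores the line breaks inside the element text by default.
    return gzip.decompress(base64.b64decode(root.text))
```

An application that understands the legacy format would call unwrap_legacy_xml (or read the extra zip member) and hand the bytes to its existing binary reader; any other application can ignore the extra content and use the regular XML parts.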
Proper Solution
So what is the proper solution to this problem? I am not convinced that Microsoft couldn't work within the existing ISO/IEC 26300:2006 (ODF) standard to cover the needs of its applications. Nevertheless, assuming that there is indeed a need for a second standard for office applications, the solution for backwards compatibility would be to translate legacy formats so that they conform to the main part of the proposed standard. If there are formatting styles that cannot be accommodated, then the standard should be amended to support those styles. If Microsoft can't write reliable code to transform legacy formats to the new format, then this is Microsoft's problem, and not one that should be passed on to all implementers and users by including support for legacy formats in a new standard. VML and Open XML Math should go.