David A. Wheeler's Blog

Tue, 05 Jun 2007

Comparing OpenDocument (ODF) with MS-XML (OOXML/EOOXML) - and why Multiple Competing Standards are a Bad Idea

Microsoft continues to give its bizarre argument that multiple competing (conflicting) standards for the same purpose are a good thing. The company is dreadfully confused. Having multiple competing products is a good thing, but having multiple competing standards is terrible.

History shows that multiple competing standards can cost massive amounts of property, many lives, and even your country. My presentation Open Standards and Security notes two of the many historical examples, the 1904 Baltimore fire and the U.S. Civil War:

  1. In 1904, a huge (80-block) area of Baltimore burned to the ground. Fire fighters from other cities came but couldn’t effectively connect their fire hoses to the fire hydrants, because every city had its own incompatible standard. That resulted in 2,500 burned buildings over a 30-hour period.
  2. Perhaps even more strikingly, one of the important reasons the U.S. South (the Confederacy) lost the U.S. Civil War was that the southern states had incompatible rail gauges. The U.S. North had a single rail gauge standard (for the most part), and could move troops and materiel to battles far more quickly than the U.S. South, even though the U.S. North had far greater distances to travel. Ned Harrison’s paper “’States Rights’ doomed Confederate nation” (Nov. 12, 2005) cites the problems with railroads, including the rail gauge incompatibility, as an important factor in the U.S. South’s loss (it certainly was not the only reason, but it was an important factor). Indeed, when the North conquered an area, one of the first things it did was move the rails to the Northern standard… so it could continue to exploit that advantage as it advanced.

But what about document converters? Don’t they make it easy to have multiple competing standards? Well, it’s true that there are converters like ODF Converter. But while converters are very useful for one-time transitions, they are lousy long-term solutions… they make it clear that there has been a failure, not a success, in standards-making. Look at the problems multiple standards cause in other areas. We can easily convert between metric and English units, but NASA lost a $125 million Mars orbiter because a Lockheed Martin engineering team used English units of measurement while the agency’s team used the more conventional metric system. Another example is the Gimli Glider, a Boeing 767 that ran out of fuel in Canada in 1983 due, in large part, to confusion about English/metric conversion. Why do we want this problem in office documents too?

Also, as of June 5, 2007, there are 6 pages of problems converting MS-XML to ODF, but only 2 pages of problems converting ODF to MS-XML. In general, ODF appears to be a much more capable format; just from that list it appears that it’ll be easier to improve ODF than to try to use MS-XML for all documents. That is unsurprising: the ODF work included members from a variety of office suite implementors, while MS-XML is the work of just one. That’s even more true when you consider that OpenDocument uses existing standards, instead of creating a trade barrier through nonsense one-off specifications.

Microsoft claims that OOXML is necessary to “fully” capture old binary documents, but I have not seen much evidence that this is actually true. A quick look at the CONVERT function in OOXML revealed many absolutely incorrect unit measures, for example. More generally, I have no reason to believe that the OOXML spec actually includes what is needed to capture the older binary formats like .doc - no mapping has been presented to prove this claim, and it’d be easy for the OOXML spec to omit lots of important information.
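To make the metric/English mix-up mentioned above concrete, here is a minimal sketch (not anyone's actual flight software) of how such an error slips through. The `Impulse` type and the two `total_*` helpers are hypothetical names invented for this illustration. The point is only that a bare number carries no unit, so having a conversion factor available does nothing if nobody knows it needs to be applied.

```python
from dataclasses import dataclass

# Exact by definition: 1 pound-force = 4.4482216152605 newtons.
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical type: an impulse always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary, so everything downstream shares one unit.
        return cls(value * NEWTONS_PER_POUND_FORCE)


def total_untagged(values):
    """Adds bare numbers; the unit is only an unwritten convention."""
    return sum(values)


def total_tagged(impulses):
    """Adds impulses that were normalized to newton-seconds on creation."""
    return sum(i.newton_seconds for i in impulses)


# One team reports 10 pound-force seconds, the other 10 newton-seconds.
wrong = total_untagged([10.0, 10.0])
right = total_tagged([
    Impulse.from_pound_force_seconds(10.0),
    Impulse.from_newton_seconds(10.0),
])
print("untagged total:", wrong)
print("tagged total (N*s):", right)
```

The untagged sum is off by a large factor and nothing flags it. The tagged version is safe only because every producer agrees on one representation, which is just another way of saying that a single shared standard beats two standards plus a converter.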

Regardless of the facts, Microsoft continues to press for ISO standardization of its format, aka “Microsoft XML” (MS-XML), OOXML, or EOOXML. There are now several papers about problems with OOXML. Groklaw’s EOOXML objections has lots of good information. Edward Macnaghten’s Technical Distinctions of ODF and OOXML: A Consultation Document (ODFA UKAG) is interesting because it shows what actual documents look like using the different specs - and it exposes a lot of problems in MS-XML that have not been widely discussed before. Sam Hiser has an interesting ODF / OOXML comparison, and Rob Weir has interesting comments too.

Perhaps a more elegant demonstration that OOXML is absurd is this picture of their 6000-page spec, printed. This is a single spec; no wonder there’s been so little review compared to OpenDocument. Yet even at this hideously large size, MS-XML (OOXML) still fails to give the important details that a spec really needs. The reason it’s so long is simply that its authors ignored a vast number of existing standards and ended up re-inventing them. As a result, the spec conflicts with a large number of different standards. They even ignored MathML, a widely-used standard that they themselves supported, and redid things from scratch.

Even on the ground, this pressure for OOXML makes little sense. Magazines like Science and Nature reject Office 2007 documents. Macintoshes still can’t read the .docx (etc.) formats, nor can Pocket PCs.

I guess one good result is that Microsoft has encouraged voting for OpenDocument, because that’s the only logical thing it can do if it really believes that “many conflicting formats are a good thing”. In contrast, there’s no reason that someone who wants a truly open single format needs to vote for OOXML. It’s perfectly reasonable to reject OOXML on the grounds that it conflicts with an already-existing ISO standard (OpenDocument). If there’s something that OOXML does that OpenDocument doesn’t, it would be much easier to add that tweak to OpenDocument, because OpenDocument builds on existing standards while OOXML fails to do so.

Microsoft is not a “universal evil”, and I praise them when they do good things. But encouraging multiple conflicting standards for the same area is not a good thing. In some sense, I don’t care whether MS-XML or ODF becomes “the” format for office documents, as long as the final specification is truly open. But the materials noted above lead me to believe that MS-XML is not really open; as one obvious example, it appears to be effectively controlled by one vendor, in both its current and future forms. So MS-XML isn’t really an option, and we already have a nice working solution.

What I want is a single document format that is fully open. What’s that mean? See Is OpenDocument an Open Standard? Yes! to see what the phrase “open standard” really means. And let’s look at it in practice. Currently I can edit text documents using the program “vim”, and I don’t even bother to ask if the other person uses emacs, or Notepad… just by saying “simple text format” we can exchange our files. Similarly, I can edit a GIF or PNG file without wondering what originally created the file - or who will edit it next. That’s generally true with other standards like HTML, HTTP, and TCP/IP. That’s the beauty of open standards - real open standards enable a thriving industry of competing products, allowing users to choose and re-choose between them. I want to see that beautiful sunlight in office suites as well.
