David A. Wheeler's Blog

Fri, 21 Mar 2008

Microsoft Office XML (OOXML) massively defective

Robert Weir has been analyzing Microsoft’s Office XML spec (aka OOXML) to determine how defective it is, with disturbing results.

Most standards today are relatively small, build on other standards, and are developed publicly over time with lots of opportunity for correction. Not OOXML; Ecma submitted Office Open XML for “Fast Track” as a massive 6,045-page specification, developed in an absurdly rushed way, behind closed doors, using a process controlled by a single vendor. It’s huge primarily because it does everything in a non-standard way, instead of referring to other standards where practical as standards are supposed to do (e.g., for mathematical equations they created their own incompatible format instead of using the MathML standard). All by itself, its failure to build on other standards should have disqualified OOXML, but it was accepted for review anyway, and what happened next was predictable.

No one can seriously review such a massive document in a short time, though ISO tried; ISO’s process did find 3,522 defects. It’s not at all clear that the defects were fixed - there’s been no time to really check, because the process for reviewing the standard simply wasn’t designed to handle that many defects. But even if they were fixed - a doubtful claim - Robert Weir has asked another question: “did they find nearly all of the defects?” The answer is no: almost all of the original defects remain. By sampling pages, he’s found error after error, none of which were found by the ISO process. The statistics from the sample are very clear: practically all serious errors have not been found. It’s true that good standards sometimes have a few errors left in them, after review, but this isn’t “just a few errors”; these samples clearly show that the specification is intensely defect-ridden. Less than 2% of the defects have been found, by the data we have so far; that means at least 49 defects remain unfound for every one found, which suggests that there are over 172,000 important defects (49x3522) left to find. That’s ridiculous.
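To make that arithmetic concrete, here is a minimal sketch in Python. It assumes the review found exactly 2% of the defects (the sample suggests less than 2%, so the result is a floor rather than a prediction); the function name `estimate_remaining` is just an illustrative choice, not anything from Weir’s analysis.

```python
def estimate_remaining(found: int, percent_found: int) -> int:
    """Estimate defects still unfound, assuming `found` defects are
    `percent_found` percent of all the defects in the specification."""
    # Integer arithmetic avoids floating-point rounding surprises.
    total = found * 100 // percent_found
    return total - found


if __name__ == "__main__":
    # 3,522 defects found by the ISO process; 2% is the assumed detection rate.
    print(estimate_remaining(3522, 2))
```

With a 2% detection rate the ratio of unfound to found defects is 98:2, or 49:1, which is where the 49x3522 figure comes from. A lower detection rate would push the estimate higher.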

Want more evidence that it’s defect-ridden? Look at Inigo Surguy’s “Technical review of OOXML”, where he examines just the WordProcessingML section’s 2,300 XML examples. He wrote code to check for well-formedness and validation errors, and found that more than 10% (about 300) were in error even given this trivial test. Conclusion? “While a certain number of errors is understandable in any large specification, the sheer volume of errors indicates that the specification has not been through a rigorous technical review before becoming an Ecma standard, and therefore may not be suitable for the fast-track process to becoming an ISO standard.” This did not include the other document sections, and this is a lower bound on the error rate (XML could validate and still be in error). (He also confirmed that Word 2007 does not implement the extensibility requirements of the Ecma specification, so as a result it would be hard to “write an interoperable word processor with Word” using OOXML.)
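A check of this kind is not hard to sketch. The Python below is only an illustration of a well-formedness test, not Surguy’s actual code; the sample fragments and the function names are made up for this example, and it relies on the standard library’s `xml.etree.ElementTree` parser. It does not attempt schema validation, which was the other half of his test.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def count_malformed(examples) -> int:
    """Count the examples that fail the well-formedness check."""
    return sum(1 for example in examples if not is_well_formed(example))


# Hypothetical fragments, invented for illustration (not taken from the spec).
SAMPLES = [
    "<p><r><t>Hello</t></r></p>",  # well-formed
    "<p><r><t>Hello</r></p>",      # mismatched closing tag
    "<p align=left>text</p>",      # attribute value is not quoted
]

if __name__ == "__main__":
    bad = count_malformed(SAMPLES)
    print(f"{bad} of {len(SAMPLES)} examples are not well-formed")
```

A test like this only catches the crudest mistakes, which is the point: an example can pass it and still describe the format incorrectly.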

I think that all by itself, this vast number of errors in OOXML proves that the “Fast Track” process is completely inappropriate for OOXML. The “Fast Track” process was intended to be used when there was already a widely-implemented, industry-accepted standard that had already had its major problems addressed. That’s just not the case here.

These huge error rates were predictable, too. The committee for creating OOXML wasn’t even formed until OpenDocument was complete, so they had to do a massive rush job to produce anything. (Doug Mahugh admitted that “Microsoft… had to rush this standard through.”) They didn’t reuse existing mature standards, so they ended up creating much more work for themselves. Most developers (who could have helped find and fix the defects) stayed away from the Ecma process in the first place; its rules gave one vendor complete control over what was allowed, and there was already a vendor-independent standard in place, which gave most experts no reason to participate. The Ecma process was also almost entirely closed-door (OpenDocument’s mailing lists are public, in contrast), which predictably increased the error rate too.

The GNOME Foundation has been involved in OOXML’s development, and here’s what they say in the GNOME Foundation Annual Report 2007: “The GNOME Foundation’s involvement in ECMA TC45-M (OOXML) was the main discussion point during the last meeting…. [the] Foundation does not support this file format as the main format or as a standard…” I don’t think this is as widely touted as it should be. Here’s an organization directly involved in OOXML development, and it thinks OOXML should not be a standard at all.

India has already voted “no” to OOXML. I hope others do the same. Countries with the appropriate rights have until March 29 to decide. It’s quite plausible that the final vote will be “no”, and indeed, based on what’s published, it should be “no”. Open Malaysia reported on the March 2008 BRM meeting, for example. It reports that everybody “did their darnest to improve the spec… The final day was absolute mayhem. We had to submit decisions on over 500 items which we hadn’t [had] the time to review. All the important issues which have been worked on repeatedly happened to appear on this final day. So it was non-stop important matters… It was a failure of the Fast Track process, and Ecma for choosing it. It should have been obvious to the administrators that submitting a 6000+ page document which failed the contradiction period, the 5 month ballot vote and poor resolution dispositions, should be pulled from the process. It should have been blatantly obvious that if you force National Bodies to contribute in the BRM and end up not deliberating on over 80% of their concerns, you will make a lot of people very unhappy… judging from the reactions from the National Bodies who truly tried to contribute on a positive manner, without having their concerns heard let alone resolved, they leave the BRM with only one decision in their mind come March 29th. The Fast Tracking process is NOT suitable for ISO/IEC DIS 29500. It will fail yet again. And this time it will be final.”

In my opinion, the OOXML specification should not become an international standard, period. I think it clearly doesn’t meet the criteria for “fast track” - but more importantly, it doesn’t meet the needs for being a standard at all. It completely contradicts the goal of “One standard, one test - Accepted everywhere”, and it simply is not an open standard. I’ve blogged before that having multiple standards for office documents is a terrible idea. There’s nothing wrong with a vendor publishing their internal format; in fact, ISO’s “type 2 technical report” or “ISO agreement” are pre-existing mechanisms for documenting the format of a single vendor and product line. But when important data is going to be exchanged between parties, it should be exchanged using an open standard. We already have an open standard for office documents that was developed by consensus and implemented by multiple vendors: OpenDocument (ISO/IEC 26300). For more clarification about what an open standard is, or why OpenDocument is an open standard, see my essay “Is OpenDocument an Open Standard? Yes!” OpenDocument works very well; I use it often. In contrast, it seems clear that OOXML will never be a specification that everyone can fully implement. Its technical problems alone are serious, but even more importantly, the Software Freedom Law Center’s “Microsoft’s Open Specification Promise: No Assurance for GPL” makes it clear that OOXML cannot be legally implemented by everyone, under any license they choose. And this matters greatly.

Andy Updegrove calls for recognition of “Civil ICT Standards”, which I think helps put this technical stuff into a broader and more meaningful context. He notes that in our new “interconnected world, virtually every civic, commercial, and expressive human activity will be fully or partially exercisable only via the Internet, the Web and the applications that are resident on, or interface with, them. And in the third world, the ability to accelerate one’s progress to true equality of opportunity will be mightily dependent on whether one has the financial and other means to lay hold of this great equalizer… [and thus] public policy relating to information and communications technology (ICT) will become as important, if not more, than existing policies that relate to freedom of travel (often now being replaced by virtual experiences), freedom of speech (increasingly expressed on line), freedom of access (affordable broadband or otherwise), and freedom to create (open versus closed systems, the ability to create mashups under Creative Commons licenses, and so on)… This is where standards enter the picture, because standards are where policy and technology touch at the most intimate level. Much as a constitution establishes and balances the basic rights of an individual in civil society, standards codify the points where proprietary technologies touch each other, and where the passage of information is negotiated… what will life be like in the future if Civil ICT Rights are not recognized and protected, as paper and other fixed media disappear, as information becomes available exclusively on line, and as history itself becomes hostage to technology? I would submit that a vote to adopt OOXML would be a step away from, rather than a way to advance towards, a future in which Civil ICT Rights are guaranteed”.

Ms. Geraldine Fraser-Moleketi, Minister of Public Service and Administration, South Africa, gave an interesting presentation at the Idlelo African Conference on FOSS and the Digital Commons. She said, “The adoption of open standards by governments is a critical factor in building interoperable information systems which are open, accessible, fair and which reinforce democratic culture and good governance practices. In South Africa we have a guiding document produced by my department called the Minimum Interoperability Standards for Information Systems in Government (MIOS). The MIOS prescribes the use of open standards for all areas of information interoperability, including, notably, the use of the Open Document Format (ODF) for exchange of office documents… It is unfortunate that the leading vendor of office software, which enjoys considerable dominance in the market, chose not to participate and support ODF in its products, but rather to develop its own competing document standard which is now also awaiting judgement in the ISO process. If it is successful, it is difficult to see how consumers will benefit from these two overlapping ISO standards… The proliferation of multiple standards in this space is confusing and costly.” She also said, “One cannot be in Dakar without being painfully aware of the tragic history of the slave trade… As we find ourselves today in this new era of the globalised Knowledge Economy there are lessons we can and must draw from that earlier era. That a crime against humanity of such monstrous proportions was justified by the need to uphold the property rights of slave owners and traders should certainly make us more than a little cautious about what should and should not be considered suitable for protection as property.”

You can get more detail from the Groklaw ODF-MSOOXML main page, but I think the point is clear. The world doesn’t need the confusion of a specification controlled by a single vendor being labelled as an international standard. NoOOXML has a list of reasons to reject OOXML.


Twisted Mind of the Security Pro

Bruce Schneier’s “Inside the Twisted Mind of the Security Professional” is highly recommended reading - he explains the different kind of thinking required to be good at making things secure. Security pros are able to see the bigger picture, and in particular, they are able to see things from an attacker’s perspective.

For example, “SmartWater is a liquid with a unique identifier linked to a particular owner. ‘The idea is for me to paint this stuff on my valuables as proof of ownership,’ I wrote when I first learned about the idea. ‘I think a better idea would be for me to paint it on your valuables, and then call the police.’” Similarly, on opening up an ant farm, his friend was surprised that the manufacturer would send you ants by mail; Bruce thought it was interesting that “these people will send a tube of live ants to anyone you tell them to.”

Being able to think like an attacker is so important that in my book on writing secure programs, I gave it its own heading: paranoia is a virtue. It’s still true. My thanks to Bruce Schneier for expressing this need so eloquently.

We would live in a better world if all of us could see the world as attackers do - or at least make the effort to try. In particular, we’d stop doing many foolish things in the name of “security”, and instead do things that actually secured our world.
