David A. Wheeler's Blog

Tue, 09 Dec 2008

The Center for Strategic and International Studies (CSIS) has just released an interesting new report titled “Securing Cyberspace for the 44th Presidency: A Report of the CSIS Commission on Cybersecurity for the 44th Presidency”. The project was co-chaired by Representative James R. Langevin, Representative Michael T. McCaul, Scott Charney, and Lt. General Harry Raduege, USAF (Ret). If you’re interested in getting our computer infrastructure more secure, I think this is worth looking at.

The three major findings were: (1) cybersecurity is now a major national security problem for the United States, (2) decisions and actions must respect privacy and civil liberties, and (3) only a comprehensive national security strategy that embraces both the domestic and international aspects of cybersecurity will make us more secure.

Among their recommendations, they suggest “Regulate cyberspace. Voluntary action is not enough. The U.S. must … set minimum standards in order to ensure that the delivery of critical services in cyberspace continues if the U.S. is attacked… [avoid] prescriptive mandates [and] overreliance on market forces, which are ill-equipped to meet national security and public safety requirements”. I agree that market forces, without any help, aren’t well-equipped to deliver security, but the challenge is in the details… it’s difficult to strike that balance well.

They recommend conducting research and development for cybersecurity - I’m glad they do, that’s vitally important. (I just saw the video The Science of Victory, which briefly discusses the importance of research to U.S. national defense.) CSIS also recommends not starting over - instead, they recommend building on and refining the existing “Comprehensive National Cybersecurity Initiative”.

In any case, computers and computer networks are no longer just interesting toys; they are vital services. We need to improve how we protect them.

path: /security | Current Weblog | permanent link to this entry

Mon, 08 Dec 2008

Use “FLOSS” instead of “FOSS” or “OSS/FS” as Universal term for Open Source Software / Free Software

Below is my brief attempt to untangle some terminology. In short, I suggest using “FLOSS” instead of “FOSS” or “OSS/FS” for software which meets the Free Software Definition and Open Source Definition.

There are many alternative terms for “Free software” in the sense of the Free Software Definition. Examples of such alternatives are “open source software”, “libre software”, “FOSS” or “F/OSS” (free/open-source software), “OSS/FS” (open source software / free software), “freed software”, and “unfettered software”. Wikipedia’s article on alternative terms for free software discusses alternative terminology further.

For someone (like me) who tries to write about software under these kinds of licenses, having multiple different names is annoying. What’s worse, the term “Free software” (the original term) is really misleading; people who hear that term presume that you mean “no cost”, which is not related to the intended meaning of “freedom”. Yes, I know that the “Free” means “freedom” (aka “Free as in speech” or “Free market”), and I’m well aware that you can charge and pay for Free software. But you have to re-teach everyone who knows English, and you’re fighting a losing battle against search engines (which will mix results together when you search for a phrase with two common meanings). Even the FSF admits that the term “Free software” is widely misunderstood. Years ago I suggested to Richard Stallman that he use the term “freed software” instead of “free software”, so that the term would be different but the acronyms could stay the same; obviously he didn’t accept that suggestion.

The term “open source software” is the most widely used term in English, and the term’s creators intentionally tried to include everyone regardless of their motivations. I’m happy to use the term “open source software”; I think it’s a reasonable term and it’s widely accepted. So, in groups which already use that term, I’ll gladly use “open source software” as the “universal” term that covers all such software, regardless of the motivations of the developers.

Unfortunately, many of the developers of such software strongly object to the term “open source software” as the universal term. Their objection is that many people who use the term “open source software” only emphasize engineering or business advantages, while the FSF emphasizes freedom for end-users and objects to a term that doesn’t note that possibility somehow.

This objection causes problems for people like me. I’m usually not trying to exclude those who object to the term “open source software”. Instead, I often want an inclusive term to describe such software, regardless of the motivations of its developers. Different developers often have different motivations - and even the same developer may have different motivations over time or on different projects. So, is there some term that most can accept?

One inclusive term is “OSS/FS”, which you can blame me for. I started writing about open source software / free software (OSS/FS) many years ago, when such writings were much less common. So for example, look at the title of my massive paper “Why open source software / free software (OSS/FS)? Look at the Numbers!” At the time, there wasn’t an obvious “universal” term, so I chose to use “OSS/FS”, which was an obvious way to combine the two most common terms. “OSS/FS” takes too long to pronounce, though, so it hasn’t really caught on.

Among the other terms, “FLOSS” (Free/Libre/Open-Source Software) seems to have won the popularity contest as a “universal” English term that nearly all can accept. Google reports the approximate number of pages a phrase will return, so it’s a reasonable way to determine how popular a phrase is. A quick search on Google (using English on 2008-12-08) shows these popularity figures: “FLOSS software” gets 1,570,000 hits, “Libre software” gets 596,000 hits, “FOSS software” gets 595,000 hits, “OSS/FS” gets 193,000 hits, and “F/OSS software” gets 66,200 hits. Note that FLOSS adds the term “libre”; “libre software” or “livre software” is widely used as the universal term in Romance languages, so adding it helps clarify which sense of “free” is meant. The term “FLOSS” also hides the fact that some people prefer the original spelling “open source software”, while others prefer to hyphenate it as “open-source software”. In context, “FLOSS” is unlikely to be confused with dental floss. If you’re an advocate who objects to that similarity, just imagine that “FLOSS cleans the gunk out” :-).

So, when speaking to a group that already uses “open source software” as the universal term for such software, I’ll use “open source software”. I have no objection to “open source software” as the universal term; it’s a reasonable term, and my primary goal is understanding by my hearers. I don’t see a problem with using separate terms to describe people’s motivations, as opposed to terms for the software that they (co-)develop. Thus, I can glibly say “open source software is co-developed by people and organizations who often have different motivations to do so; those who develop it to promote freedom for end-users typically term their members the ‘Free Software Movement’, and those who develop it for engineering and business (cost-saving) reasons may be referred to as the ‘Open Source Movement’”. I think that’s perfectly acceptable as terminology; it’s certainly clear that different people and organizations can have different motivations and yet can work together. Many, many people use “open source software” as the universal term for such software.

But not all groups accept the term “open source software” as the universal term for such software, and my writings on my website cater to a variety of people. You can’t please everyone, but I’d like to avoid unnecessarily alienating people. At the least, I’d rather people object to the substance of my writing instead of my word choice :-).

So: I suggest using “FLOSS” (Free/Libre/Open-source software) instead of “FOSS” or “F/OSS” or “OSS/FS” as the universal term for such software (in English). It’s easy to say, inclusive, and it seems to be the most popular of those alternatives. Many people use “open source software” or “Free software” (with the funny capitalization) as the universal term for such software, and that’s fine by me. But for material on my website, I intend to slowly migrate towards FLOSS instead of my older term OSS/FS. I’ll leave OSS/FS in a few places (including titles) so that people searching for it will still find it, but at this point, I think the OSS/FS acronym is a fossil. I’ll tend to leave “open source software” where I use that term; typically those are written for audiences where it’s established practice to use that term.

Strictly speaking, “Free software” in the FSF sense is defined by the Free Software Definition (FSD), and “open source software” is defined by the Open Source Definition (OSD). In practice, a software license will typically either meet, or fail to meet, both definitions. This is hardly surprising; different dictionaries have different descriptions of much simpler words. The Free Software Definition is much simpler and easier to understand, and it gives a better understanding of why an end-user might want such a license. So when describing what FLOSS is, I generally prefer to use the simpler and clearer Free Software Definition. If I’m doing a technical analysis of a software license, I require that both definitions be met. The FSD is better at explaining the overall concept, but the OSD has additional information that helps clarify it. In other words, a software license must meet both definitions, or it’s not a FLOSS license.

Mon, 01 Dec 2008

Tell the FTC: Eliminate software patents

The U.S. Federal Trade Commission (FTC) has announced the first of a possible series of public hearings to “explore the evolving market for intellectual property (IP)”. They’ll begin December 5, 2008, in Washington, DC.

Please let the FTC know that the U.S. should eliminate software patents! The U.S. courts made the mistake of rewriting the laws to permit software patents and business method patents, resulting in a stifling of both competition and innovation. Certainly, software patents have been harming free/libre/open-source software (FLOSS), but they’ve been harmful to proprietary software development too. The U.S. government is asking for comments - so please let them know about the problems software patents cause, so that we can get rid of them!

To help, I’ve posted a web page titled Eliminate Software Patents!. This page points to many existing papers and organizations that explain why software patents should be eliminated. It also lists some ways to at least reduce the damage if the mistake can’t be eliminated outright.

I think such comments would be very much in line with their series. They state that “The patent system has experienced significant change since the FTC released its first IP Report in October 2003, and more changes are under consideration… [changes include] decisions on injunctive relief, patentability, and licensing issues… [and] there is new learning regarding the operation of the patent system and its contribution to innovation and competition.” Even the FTC’s original 2003 report noted how many people were opposed to software patents.

Please contact the FTC, and let them know that software patents should be eliminated… and feel free to use any of these resources if you do contact them.

Tue, 04 Nov 2008

Kudos and Kritiques: ‘Automated Code Review Tools for Security’ - and OSS

The article “Automated Code Review Tools for Security” (by Gary McGraw) has just been released to the web. Officially, it will be published in IEEE Computer’s December 2008 edition (though increasingly this kind of reference feels like an anachronism!). The article is basically a brief introduction to automated code review. Here are a few kudos and kritiques (sic) of it, including a long discussion about the meaning of “open source software” (OSS) that I think is important to add.

First, a few kudos. The most important thing about this article is that it exists at all. I believe that software developers need to increasingly use static analysis tools to find security vulnerabilities, and articles like this are needed to get the word out. Yes, the current automated review tools for security have all sorts of problems. These tools typically have many false positives (reports that aren’t really vulnerabilities), they have many false negatives (they fail to report vulnerabilities they should), they are sometimes difficult to use, and their results are sometimes difficult to understand. But static analysis tools are still necessary. Software is getting larger, not smaller, because people keep increasing their expectations for software functionality. Security is becoming more important to software, not less, as more critical functions depend on the software and most attackers focus on them. Manual review is often too costly to apply (especially on pre-existing software and in proprietary development), and even when done it can miss ‘obvious’ problems. So in spite of current static analysis tools’ problems, the rising size and development speed of software will force many developers to use static analysis tools. There isn’t much of a practical alternative - so let’s face that! Some organizations (such as Safecode.org) don’t adequately emphasize this need for static analysis tools, so I’m glad to see the need for static analysis tools is being emphasized here and elsewhere.

I’m especially glad for some of the points he makes. He notes that “security is not yet a standard part of the security curriculum”, and that “most programming languages were not designed with security in mind… [leading to] common and often exploited vulnerabilities.” Absolutely true, and I’m glad he notes it.

There are a number of issues and details the article doesn’t cover. For example, he mentions that “major vendors in this space include Coverity, Fortify, and Ounce Labs”; I would have also included Klocwork (for Insight), and I would have also noted at least the open source software program splint. (Proprietary tools can analyze and improve OSS programs, by the way; Coverity’s ‘open source quality’, and Fortify’s ‘Java open review’ projects specifically work to review many OSS programs.) But since this is a simple introductory article, he has to omit much, so such omissions are understandable. I believe the article’s main point was to explain briefly what static analysis tools were, to encourage people to look into them; from that vantage point, it does the job. If you already understand these tools, you already know what’s in the article; this is an article for people who are not familiar with them.

Now, a few kritiques. (The standard spelling is “critiques”, but I can’t resist using a funny spelling to match “kudos”.) First, the article conflates “static analysis” with “source code analysis”, a problem also noted on Lambda the Ultimate. The text says “static analysis tools - also called source code analyzers” - but this is not true. There are two kinds of static analysis tools: (1) source analysis tools, and (2) binary/bytecode analysis tools. There are already several static analysis tools that work on binary or bytecode, and I expect to see more in the future. Later in the text he notes that binary analysis is possible “theoretically”, but it’s not theoretical - people do that, right now. Yes, it’s true that source code analysis is more mature/common, but don’t ignore binary analysis. Binary analysis can be useful; sometimes you don’t have the source code, and even when you do, it can be very useful to directly analyze the binary (because it’s the binary, not the source, that is actually run). It’s unfortunate that this key distinction was muddied. So, be aware that “static analysis tools” covers the analysis of both source and binary/bytecode - and there are advantages to analyzing each.
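To make the source-versus-bytecode distinction concrete, here is a minimal sketch in Python (chosen only because its standard library exposes both a source parser and a bytecode disassembler). The rule set RISKY_NAMES and both function names are hypothetical, invented for this illustration; this is not how any of the tools named above work. The first function needs the source text; the second works on compiled code alone.

```python
import ast
import dis
from types import CodeType

# Hypothetical rule set, for illustration only.
RISKY_NAMES = {"eval", "exec"}

def risky_calls_in_source(source: str) -> set[str]:
    """Source analysis: walk the syntax tree looking for calls to risky names."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in RISKY_NAMES:
                found.add(node.func.id)
    return found

def risky_names_in_bytecode(code: CodeType) -> set[str]:
    """Bytecode analysis: scan compiled instructions; no source text is needed."""
    found = set()
    for instr in dis.get_instructions(code):
        # This flags any load of a risky name, which is cruder than "is called".
        if instr.opname in ("LOAD_NAME", "LOAD_GLOBAL") and instr.argval in RISKY_NAMES:
            found.add(instr.argval)
    # Nested function bodies are stored as constants, so recurse into them.
    for const in code.co_consts:
        if isinstance(const, CodeType):
            found |= risky_names_in_bytecode(const)
    return found

sample = "def run(user_input):\n    return eval(user_input)\n"
compiled = compile(sample, "<sample>", "exec")
print(risky_calls_in_source(sample))
print(risky_names_in_bytecode(compiled))
```

Both approaches report the same name here, but only the second would still work if all you had was the compiled code object.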

Second, McGraw says that this is a “relatively young” discipline. That’s sort-of correct in a broad sense, but it’s more complicated than that, and it’s too bad that was glossed over. The basic principles of secure software development are actually quite old; one key paper was Saltzer and Schroeder’s 1975 paper which identified key design principles for security. Unfortunately, while security experts often knew how to develop secure software, they tended to not write that information in a form that ordinary application software developers could use. To the best of my knowledge, my book Secure Programming for Linux and Unix HOWTO (1999-2003) was the first book for software developers (not attackers or security specialists) on how to develop application software that can resist attack. And unfortunately, this information is still not taught in high school (when many software developers learn how to write software), nor is it taught in most undergraduate schools.

Third, I would like to correct an error in the article: ITS4 has never been released as open source, even though the article claims it was. The article claims that “When we released ITS4 as an open source tool, our hope was that the world would participate in helping to gather and improve the rule set. Although more than 15,000 people downloaded ITS4 in its first year, we never received even one rule to add to its knowledge base”. Is it really true, that open source software doesn’t get significant help? Well, we need to know what “open source” means.

What is “open source software”? It’s easy to show that the Open Source Initiative (OSI)’s Open Source Definition, or the Free Software Foundation’s simpler Free Software Definition, are the usual meanings for the term “open source software”. Just search for “open source software” on Google - after all, a Google search will rank how many people point to the site with that term, and how trusted they are. The top ten sites for this term include SourceForge, the Open Source Initiative (including their definition), the Open Source Software Institute, and (surprise!) me (due to my paper Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!). The top sites tend to agree on the definition of “open source software”, e.g., SourceForge specifically requires that licenses meet the OSI’s open source definition. I tend to use the Free Software Definition, because it’s simpler, but I would also argue that any open source software license would need to meet both definitions to be generally acceptable. Other major sites that are universally acknowledged as supporting open source software all agree on this basic definition. For example, the Debian Free Software Guidelines were the basis of the Open Source Definition, and Fedora’s licensing and licensing guidelines reference both OSI and FSF (and make clear that open source software must be licensed in a way that lets anyone use it). Google Code accepts only a limited number of licenses, all of which meet these criteria. The U.S. Department of Defense’s 2003 policy letter, “Open Source Software in the DoD”, essentially uses the Free Software Definition to define the term; it says open source software “provides everyone the rights to use, modify, and redistribute the source code of the software”. In short, the phrase “open source software” has a widely-accepted and specific meaning.

So let’s look at ITS4’s license. As of November 3, 2008, the ITS4 download site’s license hasn’t changed from its original Feb 17, 2000 license. It turns out that the ITS4 license clearly forbids many commercial uses. In fact, their license is clearly labelled a “NON-COMMERCIAL LICENSE”, and it states that Cigital has “exclusive licensing rights for the technology for commercial purposes.” That’s a tip-off that there is a problem; as I’ve explained elsewhere, open source software is commercial software. License section 1(b) says “You may not use the program for commercial purposes under some circumstances. Primarily, the program must not be sold commercially as a separate product, as part of a bigger product or project, or used in third party work-for-hire situations… Companies are permitted to use this program as long as it is not used for revenue-generating purposes….”. In section 2, “(a) Distribution of the Program or any work based on the Program by a commercial organization to any third party is prohibited if any payment is made in connection with such distribution, whether directly… or indirectly…”. Cigital has a legal right to release software in just about any way it wants to; that’s not what I’m trying to point out. What I’m trying to make clear is that there are significant limitations on use and distribution of ITS4, due to its license.

This means that ITS4 is clearly not open source software. The Open Source Definition requires that an open source software license have (#5) No Discrimination Against Persons or Groups, and (#6) No Discrimination Against Fields of Endeavor. The detailed text for #6 even says: “The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.” Similarly, the Free Software Definition requires the “freedom to run the program, for any purpose” (freedom 0). ITS4 might be described as a “source available”, “shared source”, or “open box” program - but it’s not open source. Real open source software licenses permit arbitrary commercial use; they are designed to include commercial users, not exclude them.

It’s absurd to complain that “no one helped this open source project” when the project was never an open source project. Indeed, even if you release software with an OSS license, there is no guarantee that you’ll get support (through collaboration). No one is obligated to collaborate with you if you release OSS - you have to convince others that they should collaborate. And conversely, you have every legal right to not release under an OSS license. But OSS licenses are typically designed to encourage collaboration; having a license that is not an OSS license unsurprisingly discourages such collaboration. As I point out in my essay “Make Your Open Source Software GPL-Compatible. Or Else”, if you want OSS collaboration you really need to pick from one of the few standard GPL-compatible licenses, for reasons I explain further in that essay.

This matters. My tool flawfinder does a similar task to ITS4, but it is open source software (released under the world’s most popular OSS license, the GPL). And I did get a lot of collaborative help in developing flawfinder, as you can tell from the Flawfinder ChangeLog. Thus, it is possible to have OSS projects that analyze programs and receive many contributions. Here’s a partial list of flawfinder contributors: Jon Nelson, Marius Tomaschewski, Dave Aitel, Adam Lazur, Agustin Lopez, Andrew Dalgleish, Joerg Beyer, Jose Pedro Oliveira, Jukka A. Ukkonen, Scott Renfro, Sascha Nitsch, Sebastien Tandel, Steve Kemp (lead of the Debian Security Auditing Project), Jared Robinson, Stefan Kost, and Mike Ruscher. That’s at least 17 people co-developing the software! Not all of these contributors added to the vulnerability database, but I know that at least Dave Aitel and Jared Robinson added new rules, Stefan Kost suggested specific new rules (though I don’t think he wrote the code for them), and that Agustin Lopez and Christian Biere caused changes in the vulnerability database’s reporting information. There may have been more; in a collaborative process, it’s sometimes difficult to fully give credit to everyone who deserves it, and I don’t have time to go through all of the records to determine the minutiae. It’d be a mistake to think that only database improvements matter, anyway; other user-contributed improvements were useful! These include changes that enabled analysis of patch files (so you can limit reporting to the lines that have changed), made the reports clearer, packaged the software (for easy installation), and fixed various bugs.

Obviously, releasing under a true OSS license helped immensely in getting contributions - especially if ITS4 is our point of comparison. For example, since flawfinder is an OSS program, it was easily incorporated into a variety of OSS Linux distributions, making it easier to use. I also suspect that some of the people most interested in using this kind of program were people paid to evaluate programs - but many of these uses were forbidden by the default ITS4 license.

In short, a non-OSS program didn’t have much collaborative help, while a similar OSS program got lots of collaborative help… and that is an important lesson. There is a difference between “really being open source software” and “sort of openish but not really”. If you look at ITS4 version 1.1.1, its CHANGES file does list a few external contributions, but they are primarily trivial tweaks or portability changes (to get it to work at all). McGraw himself admits that ITS4 didn’t get any new rules in its first year. In contrast, OSS flawfinder added user-created rules within 6 months of its initial release, and over time the OSS program had lots of functionality provided by other co-developers. I don’t want McGraw’s incorrect comment about “open source” to go unchallenged, when there’s an important lesson to be learned instead.

That said, my thanks to McGraw for noting flawfinder (and RATS), as well as ITS4. Indeed, my thanks to John Viega, McGraw, and others for developing ITS4 and the corresponding ACSAC paper - I think ITS4 was an important step in getting people to use and develop static analysis tools for security. I also agree with McGraw that deeper analysis of programs is the way of the future. Tools that focus on simple lexical analysis (like ITS4, flawfinder, and RATS) have their uses, but there is much that they cannot do, which is why tool developers are now focusing on deeper analysis approaches. Static analysis tools that examine a program in more detail (like splint) can do much that simple tools (based on lexical analysis) cannot.
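For readers who haven’t seen what “simple lexical analysis” amounts to, here is a deliberately tiny sketch in Python. It is not the implementation of ITS4, flawfinder, or RATS; the three rules, their risk numbers, and the names RULES and scan are all made up for illustration. It just matches risky C function names textually, line by line.

```python
import re

# Hypothetical, tiny rule set: C function name -> (risk level, advice).
RULES = {
    "gets": (5, "No bounds check; use fgets instead."),
    "strcpy": (4, "May overflow the destination; check lengths first."),
    "sprintf": (4, "May overflow the buffer; consider snprintf."),
}

# Match a rule name as a whole word, followed by an opening parenthesis.
CALL_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RULES)) + r")\s*\(")

def scan(c_source: str) -> list[tuple[int, str, int]]:
    """Return (line number, function name, risk level) for each textual match."""
    hits = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            name = match.group(1)
            hits.append((lineno, name, RULES[name][0]))
    return hits

sample = '''char buf[16];
gets(buf);
/* strcpy(buf, input); is only mentioned in this comment */
snprintf(buf, sizeof buf, "%s", input);
'''
for hit in scan(sample):
    print(hit)
```

The sketch flags the real gets call, but it also flags the strcpy that appears only inside a comment, and it has no idea whether any buffer is actually large enough. That is exactly the kind of limit that pushes tool developers toward deeper analysis.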

Most importantly, McGraw and I agree on the conclusion: “Static analysis for security should be applied regularly as part of any modern software development process.” We need to get that word out to where software is really developed. Static analysis is not the only thing that people need to do, but it’s an important part. Yes, static analysis tools aren’t perfect. But static analysis tools can significantly help develop software that has real security.

Wed, 29 Oct 2008

Internet Wishlist

It’s election season in the United States, a fact that’s rather hard to miss in Northern Virginia (where I live). Popular Science is running a letter by Daniel Engber (of Slate Magazine) in which he offers the US Presidential nominees advice on using the full potential of the Internet upon their election into office. This letter is being discussed in Slashdot. Terry Sweeney believes that issues related to the Internet Won’t Matter in this election, and unfortunately, I think he’s right. But still, we can hope, can’t we?

In any case, election season is a good excuse to think of helpful things that the U.S. government could do relating to the Internet and information technology. Engber’s letter certainly got me thinking in that direction. I think it’s useful to try to think of such things, because by examining and discussing them, some of them might come to pass. So in that spirit, here’s my candidate list:

Make spam illegal. Make sending unsolicited bulk email (spam) illegal, and in particular, require that people OPT-IN to receive messages sent in bulk. The current ‘opt-out’ system in the U.S. is silly, and always was. As essentially all information about spam notes, “Never Reply To Spam”. “Don’t [reply] to the spam message or [try] to send email to an email address given in the body of the spam and asking to be removed from the mailing list… spammers are much too sophisticated now for replies to affect them at all. And the From: addresses in spam messages are usually faked anyway.” Responding “just identifies you as a real person who read their message”. Europeans have the more sensible opt-in system. Laws do make a difference; far more spam is U.S. than European in origin, due to the U.S.’s lax laws. Spam is not that hard to define; if more than 1000 people (say) receive it, and they didn’t sign up for it (e.g., by signing up for a mailing list), it’s spam. A law will not solve everything, but it would help; technical measures can only go so far, and need laws to help make them work. The U.S. currently protects fax machines from spam, and that has worked! The current CAN-SPAM law legalizes spam - and thus is a sick joke. It’s time to make it illegal, to protect all of our inboxes.
Require public access (free via web) to federally-funded research. Put all federally-funded unclassified research papers on the web, with no fees or sign-ins, so that a Google search can find it. NIH is already doing this; see the NIH public access policy. NIH isn’t perfect; their “12 month” period is silly (the web publication should occur immediately). Still, it’s an improvement, and it’s absurd that this is limited to NIH; federally-funded research should be published government-wide, no matter what arm it came from. Why should the public pay for research, then pay again to read it? Just imagine how much faster research could go if anyone could quickly click and review the latest research. Just imagine how much better the public could be informed if they could easily read U.S. research on a topic… instead of only having the flim-flam artists. I think I could make a good case that in academic research, the word “published” is increasingly meaning “accessible via Google”; anything Google can’t find doesn’t exist to many people. It’s shameful how certain publishers effectively steal U.S. research for private gain through monopolistic publishing contracts - they do not pay for the research, and typically they don’t even pay the researchers or reviewers! If you want exclusive rights to publish research, then you should pay all the costs of performing the research. I can see a case where the publisher footed 50% of the research bill (not just the paper-writing costs) and got a one-year publication delay, but the “owning” of research papers is indefensible. If you accept government money - and the government is of the people, by the people, and for the people - then the people should be receiving the research results. Let’s get rid of the unnecessary intermediaries and “poll taxes” on U.S. funded research.
Federally-developed unclassified software: Open source software by default. By default, if the government funds unclassified software development (e.g., via research), that software should be released as open source software (under some common license). That way, anyone can use, modify, and redistribute it (in modified or unmodified form). Again, why should the public pay for software, then pay again to use it? Currently, if researcher B wants to continue the work of researcher A, both of whom were paid via government funds, researcher B typically has to re-implement what researcher A did - and that can stop the research before it begins. This even applies to the government itself; often the government pays for re-development of the same software, because there’s no public information on software the government has already paid to develop. If the funds are mixed, try to break it down into pieces; if that won’t work, release the mixed-funding software after some fixed time (the U.S. DoD has a 5-year clock, starting at contract signing, for when the DoD could release some mixed-funding software as open source). If you are starting a proprietary software company, and want exclusive rights to developed software, then go to the bank or a venture capitalist (VC). The government is not a VC, so don’t expect it to be one. Exceptions will be needed… but they should be exceptions, not the rule.
Increase funding on computer security. Some is done now, of course, but it pales compared to the problem. I guess this could be construed as being self-serving; after all, I try to improve computer security for a living. But the reason I do it is that I believe in it. There are many tools that enhance our muscles (cars, jackhammers, etc.), but essentially only one tool that enhances our mind: Computers. Which is one reason why computers are everywhere. Yet their very ubiquity is a problem, because they were generally not designed to be secure against determined attackers. I believe governments should not try to do all things; there are a lot of things government just isn’t good at. But defense is an area that is hard to do on an individual or business-by-business basis, yet we need it collectively - and it’s those kinds of problems that governments can help with.
Increase formal methods research. The world is globalizing, and we increasingly depend on software. Testing is not a good way to make (or verify) high quality software; you can’t even fully test the trivial program “add 3 64-bit numbers” in less time than the age of the universe. In the long run, if we want really high levels of quality for software, we need better approaches, and there’s one obvious one: Formal methods. Formal methods apply mathematical approaches to software development. There are a lot of reasons people don’t use them today in typical software development projects, though. We need research to help turn those reasons into the past tense for most projects.
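To make the “age of the universe” claim concrete, here is a rough back-of-the-envelope sketch. The test rate (a trillion tests per second) and the approximate age of the universe are my assumptions for illustration; the point only needs the order of magnitude.

```python
# Rough estimate: how long would exhaustive testing of
# "add three 64-bit numbers" take?

# Three independent 64-bit inputs give 2^(3*64) distinct test cases.
TOTAL_INPUTS = 2 ** (3 * 64)

# Assumption: a very generous one trillion tests per second.
TESTS_PER_SECOND = 10 ** 12

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumption: approximate age of the universe, in years.
AGE_OF_UNIVERSE_YEARS = 13.8e9

years_needed = TOTAL_INPUTS / TESTS_PER_SECOND / SECONDS_PER_YEAR
ratio = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"test cases:            {TOTAL_INPUTS:.3e}")
print(f"years of testing:      {years_needed:.3e}")
print(f"multiples of universe: {ratio:.3e}")
```

Even if the assumed test rate were off by many orders of magnitude, exhaustive testing would still be hopeless, which is why approaches that reason about all inputs at once are attractive.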
Drop the DMCA’s anti-circumvention measures. The anti-circumvention stuff is just nonsense; they don’t fight piracy, but they do try to inhibit legal activities - and thus encourage lawlessness. XKCD’s “Steal this comic” shows the nonsense that Digital Restrictions Management (DRM) schemes bring, ones that the DMCA is absurdly trying to prop up. As far as I can tell, people are still making music and movies, even though the DRM schemes (and the anti-circumvention measures that prop them up) are a failure. Anti-circumvention measures make obviously lawful uses illegal (e.g., viewing DVDs on a Linux machine or putting your DVDs on your hard drive) - encouraging everyone to break the law.
Drop software patents. Software patents have been a massive unjustified government intervention in the market. There is still no evidence that they are an improvement, and a lot of evidence that they are causing serious market failures. Save massive amounts of government money by getting rid of the whole useless bureaucracy.
Fix copyright laws so that they make sense to normal people. I believe that the current copyright laws were written under the assumption that only large publishers, with reams of lawyers, needed to understand them. Now 9-year-olds need to understand them… except that they’re completely nonsensical. “Normal” people expect that short extractions aren’t copyright infringements, yet current U.S. law and court cases endorse nonsensical interpretations to the contrary (e.g., Bridgeport Music Inc. v. Dimension Films, 410 F.3d 792 (6th Cir. 2005) seems to say that even 3 notes can be an infringement). Strictly speaking, many YouTube videos break the law, even when a normal person would expect that the use would be okay. The term lengths of copyright far exceed the minimum necessary to obtain such works (which should be the criterion), and “fair use” needs to be clearer and more expansive. The penalties are also absurd; I disapprove of illegal copying, but the current penalties ($750 for a $1 song??) are so disproportionate that they probably violate the U.S. Constitution’s 8th amendment (“Excessive bail shall not be required, nor excessive fines imposed, nor cruel and unusual punishments inflicted.”). I believe that copyright law is in principle a good idea, but it sure isn’t working in practice like it’s supposed to. See Tales from the Public Domain: Bound by Law for an interesting perspective on this. For a specific example, I think that anything not marked by its author as copyrighted should be in the public domain; currently every jot and tittle on the Internet is “copyrighted” by someone, making it nigh-impossible to keep track of all the claims over rights. It used to be that way - there’s no reason it couldn’t be again. A much shorter copyright term would be helpful, too - something within people’s lifetimes. In the past, publishers got disproportionate control over the process of modifying the copyright laws. We need to fix these laws so that they balance the needs of creators, publishers/distributors, and recipients. They need to be very simple, clear, and fair, because with the Internet, 9-year-olds can and do become publishers.

So, there’s my Christmas list. Some of them don’t even cost money; they simply remove bad laws, and actually save money. This is my personal list, not influenced by my employer, my pets, and so on. Perhaps this list (and others like it) will start the ball rolling.


Thu, 23 Oct 2008

Solved: Why is ESC so big?

In my post Estimating the Total Development Cost of a Linux Distribution, I noted that one of Fedora 9’s largest components was Enterprise Security Client (ESC), and wondered why ESC would be so big. After all, a security client should be small - not large.

I just got the answer from Rahul Sundaram of the Fedora project, who asked internally. It turns out that ESC currently includes its own copy of XULRunner. XULRunner essentially provides a library and infrastructure for running “XUL+XPCOM” applications such as Firefox, Thunderbird, and ESC. You can confirm this using the on-line ESC documentation. This is clearly not optimal; as I noted in a previous blog entry, developers should use system libraries, and not create their own local copies. Rahul says that “the developers are currently working on making it use the system copy[,] which should drop down the size considerably”.

So ESC isn’t really that big - it’s just that ESC creates its own local copy of a massive infrastructure. This is obviously not great for security, since there’s a higher risk that bugs fixed in the real XULRunner would not be fixed in ESC’s local copy. But this appears to be a temporary issue; once Fedora’s version of ESC switches to the system XULRunner, the problem will disappear.

By the way, if you’re interested in the whole “measuring Linux’s size” thing, you should definitely take a look at the past measurements of Debian. My page on counting Source Lines of Code (SLOC) includes links and summaries of that work. It’s neat stuff! My thanks to Jesús M. González-Barahona, Miguel A. Ortuño Pérez, Pedro de las Heras Quirós, José Centeno González, Vicente Matellán Olivera, Juan-José Amor-Iglesias, Gregorio Robles-Martínez, and Israel Herráiz-Tabernero for doing that.


Wed, 22 Oct 2008

Estimating the Total Development Cost of a Linux Distribution

There’s a new and interesting paper from the Linux Foundation that estimates the total development cost of a Linux distro. Before looking at it, some background would help…

In 2000 and 2001 I published the first estimates of a GNU/Linux distribution’s development costs. The second study (released in 2001, lightly revised in 2002) was titled More than a Gigabuck. That study analyzed Red Hat Linux 7.1 as a representative GNU/Linux distribution, and found that it would cost over $1 billion (over a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). It included over 30 million physical source lines of code (SLOC), and had it been developed using conventional proprietary means, it would have taken 8,000 person-years of development time to create. My later paper Linux Kernel 2.6: It’s Worth More! focused on how to estimate the development costs for just the Linux kernel (this was picked up by Groklaw).

The Linux Foundation has just re-performed this analysis with Fedora 9, and released it as “Estimating the Total Development Cost of a Linux Distribution”. Here’s their press release. I’d like to thank the authors (Amanda McPherson, Brian Proffitt, and Ron Hale-Evans), because they’ve reported a lot of interesting information.

For example, they found that it would take approximately $10.8 billion to rebuild the Fedora 9 distribution in today’s dollars; it would take $1.4 billion to develop just the Linux kernel alone. This isn’t the value of the distribution; typically people won’t write software unless the software had more value to them than what it cost them (in time and effort) to write it. They state that quite clearly in the paper; they note that these numbers estimate “how much it would cost to develop the software in a Linux distribution today, from scratch. It’s important to note that this estimates the cost but not the value to the greater ecosystem…”. To emphasize that point, the authors reference a 2008 IDC study (“The Role of Linux Commercial Servers and Workloads”) which claims that Linux represents a $25 billion ecosystem. I think IDC’s figure is (in fact) a gross underestimation of the ecosystem value, understandably so (ecosystem value is very hard to measure). Still, the cost to redevelop a system is a plausible lower bound for the value of something (as long as people keep using it). More importantly, it clearly proves that very large and sophisticated systems can be developed as free-libre / open source software (FLOSS).

They make a statement about me that I’d like to expand on: “[Wheeler] concluded—as we did—that Software Lines of Code is the most practical method to determine open source software value since it focuses on the end result and not on per-company or per-developer estimates.” That statement is quite true, but please let me explain why. Directly measuring the amount of time and money spent in development would be, by far, the best way of finding those numbers. But few developers would respond to a survey requesting that information, so direct measurement is completely impractical. Thus, using well-known industry models is the best practical approach to doing so, in spite of their limitations.
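As a minimal sketch of what such an industry model looks like, here is Basic COCOMO in its “organic” mode, a classic model that turns a SLOC count into an effort and schedule estimate. I’m assuming that model here purely for illustration; the salary and overhead values are placeholder assumptions, not figures from either study, and the function name is hypothetical.

```python
def basic_cocomo_organic(sloc, annual_salary=75000.0, overhead=2.4):
    """Estimate effort, schedule, and cost from physical SLOC.

    Uses the Basic COCOMO organic-mode equations:
        effort (person-months) = 2.4 * KSLOC ** 1.05
        schedule (months)      = 2.5 * effort ** 0.38
    annual_salary and overhead are illustrative assumptions.
    """
    ksloc = sloc / 1000.0
    effort_person_months = 2.4 * ksloc ** 1.05
    schedule_months = 2.5 * effort_person_months ** 0.38
    # Cost = person-years * salary * overhead multiplier.
    cost = (effort_person_months / 12.0) * annual_salary * overhead
    return effort_person_months, schedule_months, cost


if __name__ == "__main__":
    effort, months, cost = basic_cocomo_organic(100_000)
    print(f"effort:   {effort:.1f} person-months")
    print(f"schedule: {months:.1f} months")
    print(f"cost:     ${cost:,.0f}")
```

Note that the exponent is greater than 1, so the estimate for one large program is higher than the sum of estimates for the same code split into several small programs; how the code is grouped before applying the model matters.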

I was delighted with their section on the “Limitations and Advantages to this Study’s Approach”. All studies have limitations, and I think it’s much better to acknowledge them than hide them. They note several reasons why this approach grossly underestimates the real effort in developing a distribution, and I quite agree with them. In particular: (1) collaboration often takes additional time (though it often produces better results because you see all sides); (2) deletions are work yet they are not counted; (3) “bake-offs” to determine the best approach (where only the winner is included) produce great results but the additional efforts for the alternatives aren’t included in the estimates. (I noted the bake-off problem in my paper on the Linux kernel.) They note that some drivers aren’t often used, but I don’t see that as a problem; after all, it still took effort to develop them, so it’s valid to include them in an effort estimate. Besides, one challenge to creating an operating system is this very issue - to become useful to many, you must develop a large number of drivers - even though many of the drivers have a relatively small set of users.

This is not a study of “all FLOSS”; many FLOSS programs are not included in Fedora (as they note in their limitations). Others have examined Debian and the Perl CPAN library using my approach (see my page on SLOC), and hopefully someday someone will actually try to measure “all FLOSS” (good luck!!). However, since the Linux Foundation measured a descendant of what I used for my original analysis, it’s valid to examine what’s happened to the size of this single distribution over time. That’s really interesting, because that lets us examine overall trends. So let’s take advantage of that! In terms of physical source lines of code (SLOC) we have:

Distribution         Year   SLOC(million)
Red Hat Linux 6.2    2001    17
Red Hat Linux 7.1    2002    30
Fedora 9             2008   204

If Fedora were growing linearly, the first two points would give a rate of 13 MSLOC/year, and Fedora 9 would have 108 MSLOC (30+6*13). Fedora 9 is almost twice that size, which shows clearly that there’s exponential growth. Even if you factored in the month of release (which I haven’t done), I believe you’d still have clear evidence of exponential growth. This observation is consistent with “The Total Growth of Open Source” by Amit Deshpande and Dirk Riehle (2008), which found that “both the growth rate as well as the absolute amount of source code is best explained using an exponential model”.
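The arithmetic above can be sketched in a few lines. This uses only the three data points from the table; the “compound annual growth factor” is my own illustrative addition, computed from the last two points, and is not a figure from either study.

```python
# (year, physical SLOC in millions) from the table above.
rhl62 = (2001, 17)
rhl71 = (2002, 30)
fedora9 = (2008, 204)

# Linear model: slope from the first two points, extrapolated to 2008.
linear_rate = (rhl71[1] - rhl62[1]) / (rhl71[0] - rhl62[0])
linear_2008 = rhl71[1] + linear_rate * (fedora9[0] - rhl71[0])

# How far the actual size is above the linear prediction.
overshoot = fedora9[1] / linear_2008

# Illustrative: the constant yearly growth factor that would take
# 30 MSLOC in 2002 to 204 MSLOC in 2008.
annual_factor = (fedora9[1] / rhl71[1]) ** (1 / (fedora9[0] - rhl71[0]))

print(f"linear rate:        {linear_rate:.0f} MSLOC/year")
print(f"linear prediction:  {linear_2008:.0f} MSLOC")
print(f"actual / predicted: {overshoot:.2f}")
print(f"annual factor:      {annual_factor:.3f}")
```

With only three points this is a sanity check rather than a fit, but it shows how far the actual size sits above the straight-line prediction.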

Another interesting point: Charles Babcock predicted, on Oct. 19, 2007, that the Linux kernel would be worth $1 billion in the first 100 days of 2009. He correctly predicted that it would pass $1 billion, but it happened somewhat earlier than he thought: by Oct. 2008 it had already happened, instead of waiting for 2009. I think the reason it happened slightly earlier is that Charles Babcock’s rough estimate was based on a linear approximation (“adding 2,000 lines of code a day”). But these studies all seem to indicate that mature FLOSS programs - including the Linux kernel - are currently growing exponentially, not linearly. Since the rate is also increasing, the date of arrival at $1 billion was sooner than Babcock’s rough estimate. Babcock’s fundamental point - that the Linux kernel keeps adding value at a tremendous pace - is still absolutely correct.

I took a look at some of the detailed data, and some very interesting factors were revealed. By lines of code, here were the largest programs in Fedora 9 (biggest first):

  kernel-2.6.25i686
  OpenOffice.org
  gcc-4.3.0-20080428
  Enterprise Security Client 1.0.1
  eclipse-3.3.2
  Mono-1.9.1
  firefox-3.0
  bigloo3.0b
  gcc-3.4.6-20060404
  ParaView3.2.1

The Linux kernel is no surprise; as I noted in the past, it’s full of drivers, and there’s a continuous stream of new hardware that needs drivers. The Linux Foundation decided to count both gcc3 and gcc4; since there was a radical change in approach between gcc3 and gcc4, I think that’s fair in terms of effort estimation. (My tool ignores duplicate files, which helps counter double-counting of effort.) Firefox wasn’t included by name in the Gigabuck study, but Mozilla was, and Firefox is essentially its descendant. It’s unsurprising that Firefox is big; it does a lot of things, and trying to make things “look” simple often takes more code (and effort).

What’s remarkable is that many of the largest programs in Fedora 9 were not even included in the “Gigabuck” study - these are whole new applications that were added to Fedora since that time. These largest programs not in the Gigabuck study are: OpenOffice.org (an office suite, aka word processor, spreadsheet, presentation, and so on), Enterprise Security Client, eclipse (a development environment), Mono (an implementation of the C# programming language and its underlying “.NET” environment), bigloo (an implementation of the Scheme programming language), and paraview (a data analysis and visualization application for large datasets). OpenOffice.org’s size is no surprise; it does a lot. I’m a little concerned that “Enterprise Security Client” is so huge - a security client should be small, not big, so that you can analyze it thoroughly for trustworthiness. Perhaps someone will analyze that program further to see why this is so, and if that’s a reason to be concerned.

Anyway, take a look at “Estimating the Total Development Cost of a Linux Distribution”. It conclusively shows that large and useful systems can be developed as FLOSS.

An interesting coincidence: Someone else (Heise) almost simultaneously released a study of just the Linux kernel, again using SLOCCount. Kernel Log: More than 10 million lines of Linux source files notes that the Linux kernel version 2.6.27 has 6,399,191 SLOC. “More than half of the lines are part of hardware drivers; the second largest chunk is the arch/ directory which contains the source code of the various architectures supported by Linux.” In that code, “96.4 per cent of the code is written in C and 3.3 percent in Assembler”. They didn’t apply the corrective factors specific to Linux kernels that I discussed in Linux Kernel 2.6: It’s Worth More!, but it’s still interesting to see. And their conclusion is inarguable: “There is no end in sight for kernel growth which has been ongoing in the Linux 2.6 series for several years - with every new version, the kernel hackers extend the Linux kernel further to include new functions and drivers, improving the hardware support or making it more flexible, better or faster.”


Thu, 02 Oct 2008

Play Ogg (Vorbis and Theora)!

The good news: There’s lots of digital audio and video available through the Internet (some free, some pay-for). The bad news: Lots of audio and video is locked up in formats that aren’t open standards. This makes it impractical for people to use them on arbitrary devices, shift the media between devices, and so on. This hurts product developers too; they’ve become vulnerable to massive lawsuits. Even though the MPEG standards are ratified by ISO and are often used - MP3 is particularly common for audio - they are not open standards. In particular, they are subject to a raft of patents, which prevent arbitrary use (e.g., by free-libre / open source software). Things are even worse if you use a format with DRM (aka “Digital Restrictions Management”). DRM tries to arbitrarily restrict how you can use the media you’ve paid for; when the company decides to abandon support for that DRM format, you’ve effectively lost all the money you spent on the audio and video media (examples of DRM abandonment include Microsoft’s MSN Music, Microsoft’s PlaysForSure which is not supported by Microsoft Zune, Yahoo! Music Store, and Walmart’s DRM-encumbered music).

Thankfully, there’s a solution, and that’s Ogg (as maintained by the Xiph.org foundation). Ogg is a “container format” that can contain audio, video, and related material. Audio and video can be encoded inside Ogg using one of several encodings, but usually audio is encoded with “Vorbis” and video is encoded with “Theora”. For perfect sound reproduction, you can use “FLAC” instead of Vorbis (but for most circumstances, Vorbis is the better choice).

I encourage you to use Ogg, and I’m not the only one. Wikipedia requires that audio and video be in Ogg Vorbis and Ogg Theora format (respectively); according to Alexa, Wikipedia is the 8th most popular website in the U.S. (as of Oct 2, 2008). The Free Software Foundation (FSF)’s “Play Ogg” campaign is encouraging the use of Ogg, too. Xiph.org’s 2007 press release and about Xiph explain some of the reasons for preferring Ogg.

So, please seek out and create Ogg files! Their file extensions are easily recognized: “.ogg” (Ogg Vorbis sound), “.oga” (Ogg audio using other codecs like FLAC), and “.ogv” (Ogg video, typically Theora plus Vorbis). If you need to download software to play Ogg files, FSF Ogg’s “how” page or Xiph.org’s home page will explain how to download and install software to play Ogg files (they’re free, in all senses!). Many video players can play Ogg already; among them, VLC (from VideoLAN) is often recommended as a player.

Probably the big news is that the next version of Mozilla’s Firefox will include Ogg - built in! So soon, you can just install Firefox, and you’ll have Ogg support. That should encourage even more use of Ogg, because there will be so many more people who have Ogg (or can install it easily), as well as lots of reasons to install such software.

If you want more technical details, you can see the Wikipedia article on Ogg. You can also see Internet standard RFC 5334, which discusses the basic file extensions and MIME types, as well as pointing to other technical documents.
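If you serve or sort Ogg files yourself, the extension-to-MIME-type mapping from RFC 5334 is small enough to sketch directly. This is only a minimal illustration: the function names are mine (hypothetical), and the “OggS” check only looks at the first four bytes that begin an Ogg page, so it is a quick sanity check, not a validator.

```python
import os

# File extensions and media types as registered in RFC 5334.
OGG_MEDIA_TYPES = {
    ".ogg": "audio/ogg",        # Ogg Vorbis audio
    ".oga": "audio/ogg",        # Ogg audio with other codecs (e.g., FLAC)
    ".ogv": "video/ogg",        # Ogg video (typically Theora plus Vorbis)
    ".ogx": "application/ogg",  # multiplexed / generic Ogg
}


def ogg_media_type(filename):
    """Return the media type for an Ogg file name, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return OGG_MEDIA_TYPES.get(extension)


def looks_like_ogg(data):
    """Return True if the bytes start with the Ogg capture pattern."""
    return data[:4] == b"OggS"


if __name__ == "__main__":
    for name in ("song.ogg", "concert.OGV", "notes.txt"):
        print(name, "->", ogg_media_type(name))
```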

Currently there is a babel of formats out there, and most of the more common ones are not open standards. I have no illusions that this babel will instantly disappear, with everyone using Ogg by tomorrow. Getting a new audio or video format used is a difficult chicken-and-egg problem: People don’t want to release audio or video until everyone can play them, and people don’t want to install format players until there’s something to play.

But with Wikipedia, Firefox, and many others all working to encourage the Ogg format, I think the chicken-and-egg problem has been overcome. I’m now discovering that all sorts of organizations support Ogg, such as Metavid (which provides video footage from the U.S. Congress in Ogg Theora format). Groklaw interviewed Richard Hulse of Radio New Zealand, who explained why they recently added support for Ogg Vorbis. Many other radio stations support Ogg; I’ve confirmed support by the Canadian Broadcasting Corporation (CBC) (Radio feeds 1 and 2), WPCE, and WBUR (Xiph.org has a much longer list of stations supporting Ogg). Ogg is widely used in games; there’s Ogg support in the engines for Doom 3, Unreal Tournament 2004, Halo: Combat Evolved, Myst IV: Revelation, Serious Sam: The Second Encounter, Lineage 2, Vendetta Online, and the Grand Theft Auto engines (Xiph.org has a longer list of games). In short, there are now enough Ogg players, and Ogg media, to get the ball rolling.

In particular: Don’t buy a portable audio (music) player unless it can play Ogg Vorbis. Xiph has a list of audio players that support Ogg Vorbis (read the details for the player you’re considering!). If a manufacturer doesn’t support Ogg, complain to them until they fix the problem.


Fri, 19 Sep 2008

Developers: Use System Libraries!

The packagers from a variety of GNU/Linux distributions are informally uniting to tell software developers a simple story: “Use system libraries - don’t create local copies of libraries!”

The latest push came from Toshio Kuratomi’s email “Uniting to get upstreams to use system libraries”. Fedora, like most distributions, has a guideline that “a package should not link against a local copy of a library… libraries should be included in the system and applications should link against that [instead]”. Toshio lists two reasons why this guideline exists (I know there are other reasons too):

1. Doing otherwise is a “losing proposition” when trying to fix security issues in a library.
2. “applications that include their own copies of libraries are often tempted to apply their own bugfixes and feature enhancements to the library. That makes it harder to port the application to new versions of the library and runs counter to the open source philosophy of helping to improve the library for everyone.”

I’m big on security, so reason #1 is a good-enough reason to me. The Fedora packaging rules note that the fixes aren’t actually limited to security issues; not duplicating system libraries “prevents old bugs and security holes from living on after the core system libraries have been fixed.” But I think the more important reason is hinted at in the last part of reason #2. No one - not even a big FLOSS project - has infinite resources. Different people will find different problems when they use a library. If the many different applications that use a library report problems back to the library maintainers, the library maintainers can fix the problem. Then, the fix will benefit everyone who depends on the library. If every application has its own local variant of a library, then each one will have defects that were fixed in other variants.

Toshio then notes: “In the world of C applications and libraries, we don’t often run into this problem anymore. Most C application developers have learned the same lessons we have. However, in the java, mono/.net, and web application worlds, this [duplication of libraries is still] a common practice. Sometimes our packagers find themselves trying to convince upstream to change what they do without success — upstream is convinced that they need to include these local copies.” In some cases (particularly for Java), there were historical reasons that they had to do this due to licensing. But as those reasons have diminished, the practices haven’t gone away.

Fedora, Debian, openSUSE, Gentoo, and Mandriva all have policies/guidelines specifically recommending or requiring that packages not have their own special copies of libraries. All of these distributions clearly explain that applications should use normal libraries instead. Unfortunately, software developers for non-C programs don’t seem to be hearing the message. That makes it really hard to package those programs for use by end-users. As a result, applications are often harder to install, or the easily-installed versions are much delayed, because of unnecessary difficulties in packaging the program for end-users.

Yes, in a few cases a special copy of a library may be necessary. Granted. But it’s often unnecessary, and it should be the exception, not the rule. At the very least, it should be trivial to build a FLOSS application from source code so that it uses the system’s libraries instead of some local copy of the libraries.
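If you’re not sure whether your own source tree carries local copies of libraries, a quick scan can at least flag the obvious candidates. Here’s a minimal sketch; the function name and the list of file suffixes are my own assumptions, and a real packaging review would also look for bundled library source directories, which this does not detect.

```python
import os

# Assumption: file suffixes that commonly indicate a prebuilt library
# shipped inside an application's source tree.
BUNDLED_SUFFIXES = (".jar", ".dll", ".so", ".a", ".dylib")


def find_bundled_libraries(source_root):
    """Return sorted relative paths of files that look like bundled libraries."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            # Also catch versioned shared objects such as libfoo.so.1.2
            if name.endswith(BUNDLED_SUFFIXES) or ".so." in name:
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, source_root))
    return sorted(hits)


if __name__ == "__main__":
    for path in find_bundled_libraries("."):
        print("possible bundled library:", path)
```

Anything it reports is a candidate for replacement with the distribution’s packaged version, or at least for a build option that selects the system copy.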

So developers, please, try to work with the standard libraries instead of creating your own modified copy. Packagers - and users - around the world will thank you.


Thu, 21 Aug 2008

Challenges for securing closed source software

I’ve just learned of a really interesting article by Chad Perrin, “10 security challenges facing closed source software”. He starts with my Secure Programming for Linux and Unix HOWTO book’s list of “core requirements for developing secure software”, which was part of the section on developing secure open source software. My list was really simple:

First, people have to actually review the code.
Second, at least some of the people developing and reviewing the code must know how to write secure programs.
Third, once found, problems need to be fixed quickly and their fixes distributed.

Lots of people have cited that list (and the book!), including Google’s “Contributing To Open Source Software Security”.

At the time I made that list, I was primarily thinking about that list as requirements for open source software. Chad Perrin had the interesting insight that the list applies to closed source software too… and then examined what the challenges are. It’s a really interesting list, I suggest taking a look at it! He closes with a very interesting claim: “None of these disadvantages for closed source software are inflexible or absolute. There’s no reason closed source software developed by a corporate vendor can’t be as secure as an open source equivalent. It should be pretty obvious that, all else being equal, the trend is for circumstances to favor the security of open source software — at least as far as these principles of software security are concerned.”


Wed, 20 Aug 2008

FLOSS License Proliferation: Still a problem

License proliferation in free-libre / open source software (FLOSS) licenses is less than it used to be, but it’s still a serious problem. There are, thankfully, some interesting rumblings to try to make things better.

Russ Nelson at the Open Source Initiative (OSI) wants to restart a FLOSS license anti-proliferation committee to address the problem that there are too many FLOSS licenses. He wants to set up a process to establish two tiers, “recommended” and “compliant”. There’s no telling if the work will be successful, but the basic concept sounds very reasonable to me.

Matt Asay counters that “Someone needs to tell the Open Source Initiative, Google, and others who fret about license proliferation that the market has already cut down the number of actively used licenses to just a small handful: L/GPL, BSD/Apache, MPL, and a few others (EPL, CPL)… It’s a worthy cause, but one that has already been effectively fought and settled by the free market. I would hazard a guess that upwards of 95 percent of all open-source projects are licensed under less than 5 percent of open-source licenses. (The last time I checked, 88 percent of Sourceforge projects were L/GPL or BSD. It’s been a non-issue for many years.) There is no open-source proliferation problem. Do we have a lot of open-source licenses? Yes, just as we have a lot of proprietary licenses (in fact, we have many more of those). But we don’t have a license proliferation problem, because very few open-source licenses actually get used on a regular basis. This is a phantom. It seems scary, but it’s not real.”

Asay is right that “the market” has mostly settled the issue, but I think Asay is quite wrong that there is no problem. I quite agree with Asay that there is a very short list of standard FLOSS licenses… but there are still a lot of people who, even in 2008, keep creating new one-off FLOSS (or intended to be FLOSS) licenses for their newly-released programs. And although it’s true that “very few actually get used on a regular basis”, each of these one-off licenses is typically incompatible with many widely-used licenses. Why does this keep happening? I think the problem is that there are still a lot of lawyers and developers who “didn’t get the memo” from users and potential co-developers that new FLOSS licenses are decidedly unwelcome. As a result, new programs are still being released under new non-standard licenses.

I can even speculate why there are so many people still creating incompatible licenses, even though users and distributors don’t want them. A lot of new programs are developed by people who know a lot about their technical specialty, but very little about copyright law, and also very little about FLOSS norms (both in licensing and community development processes). So they go to lawyers of their organizations. As far as I can tell, many lawyers think it’s fun to create new licenses and have absolutely no clue that using a nonstandard FLOSS-like license will relegate the program to oblivion. (The primary thing that matters to a lawyer is if they or their organization can be sued; if the license causes the program to be useless, well, too bad, the lawyer still gets paid.) Indeed, many lawyers still don’t even know what the requirements for FLOSS licenses are - never mind that there are license vetting procedures, or that using non-standard FLOSS licenses is widely considered harmful. So we have developers, who know they want to collaborate but don’t realize that they need to follow community standards to make that work, and we have lawyers, who often don’t realize that there are community standards for the licenses (and their non-selection will affect their clients).

Let me give some specific examples from recent work I’m doing, to show that this is still a problem. Right now I’m trying to get some software packaged to more rigorously prove that software does (or doesn’t) do something important. I tried to get CVC3 packaged; it has “almost a BSD license”, and I believe the developer intended for it to be FLOSS. Problem is, somebody thought it’d be fun to add some new nonstandard clauses. The worst clause - and I’m highly paraphrasing here - could be interpreted as, “If we developers did lots of illegal activities in creating the software, you’re required to pay for our legal expenses to defend our illegal activities, even if the only thing that you did is provide copies of this software to other people, or used it incidentally.” Certainly that’s how I interpret it, though I’m no lawyer. When I brought this license text to Fedora legal, let’s just say that they were less than enthused about endorsing this license or including the program in the distribution. Indeed, CVC3’s license may make it too dangerous for anyone to use. After all, how could I possibly determine the risk that you (the developer) did something illegal? CVC3 also has another annoying incompatible license addition (compared to the BSD-new license), a “must change name if you change the code” type clause. Of course, it won’t compile as-is; the only way to compile it is to change the code :-). Here’s hoping that they fix this by switching to a standard license. CVC3 is not the only offender, either, there are legions of them. I examined Alt-Ergo, a somewhat similar program. It uses a FLOSS license, but it uses the remarkably weird and non-standard CeCILL-C license (this is even less well known than its cousin the CeCILL; according to Fedora it’s FLOSS but GPL-incompatible, and a GPL-incompatible FLOSS license is a remarkably bad choice). 
Third example - over this weekend I had a private email conversation with a developer who’s about to release their software with a license; the developer intended to create (as a non-lawyer!) yet another license with incompatible non-FLOSS terms. Which would have been a big mistake.

Frankly, I think Asay is being excessively generous in his list of acceptable licenses. The standard FLOSS licenses are, I believe, simply MIT, revised BSD (BSD-new), LGPL (versions 2.1 and 3), and GPL (versions 2 and 3), and possibly the Apache 2.0 license. All of these licenses have a very large set of projects that use them, are widely understood, have been deeply analyzed by legal experts, and yet are comprehensible to both developers and users. An especially important property of this set, as you can see from my FLOSS license slide, is that they are generally compatible (with the problem that Apache 2.0 and GPLv2 aren’t compatible). Compatibility is critical; if you want to use FLOSS to build serious applications, you often need to combine them in novel ways, and license incompatibilities often prevent that. As I note in Make Your Open Source Software GPL-Compatible. Or Else, the GPL is by far the most popular FLOSS license; most FLOSS software is under the GPL. So choosing a GPL-incompatible license is, in most cases, foolish. Which is a key reason I don’t include the MPL in that set; not only do these licenses have vanishingly small market share compared to the set above, but their incompatibilities make their use foolish. Even Mozilla, the original creator of the MPL, essentially no longer uses the MPL (they tri-license with the GPL/LGPL/MPL, because GPL-incompatibility was a bad idea).

Having a short “OSI recommended” or “FSF recommended” list of licenses is unlikely to completely solve the problem of license proliferation. But having a semi-formal, more obviously endorsed, and easy-to-reference site that identified the short list of recommended licenses, and explained why license proliferation is bad, would help. While those well-versed in FLOSS understand things, the problem is those others who are just starting out to develop a FLOSS project. After all, the license is chosen at the very beginning of a project, when the lead developer may have the least experience with FLOSS. Anyone beginning a new project is likely to make mistakes, but there’s a difference; just about any other mistake in starting a FLOSS project can be fixed fairly easily. Don’t like the CM system? Switch! Don’t like your hosting environment? Move! But a bad license is often extremely difficult to change; it may require agreement by a vast army of people, or those (e.g., organizational lawyers) who have no incentive to cooperate later. Yes, projects like vim and Python have done it, but only with tremendous effort.

The license mistakes of one project can even hurt other projects. Squeak is still trying to transition from early licensing mistakes, and it’s still not done even though it’s been working on it for years. This has impeded the packaging and wider use of nice programs like Scratch, which depend on Squeak. The Java Trap discusses some of the challenges when FLOSS requires proprietary software to run; when the FLOSS licenses are incompatible, many of the same problems apply. In short, when FLOSS licenses are incompatible, they cause problems for everyone. And when there are more than a few FLOSS licenses, it also becomes very hard to understand, keep track of, and comply with them.

Asay and Nelson have no trouble understanding the license proliferation issues; they’ve been analyzing FLOSS for years. But they are not the ones who need this information, anyway. It’s the newcomers - the innovators coming up with the new software ideas, but who don’t fully understand collaborative development and how FLOSS licensing enables it - who need this information. I don’t really mean to pick on Asay in this article; it’s just in this case, I think Asay knows too much, and has forgotten how many people don’t yet understand FLOSS.

Documenting a short list of the “recommended licenses” would be a great boon, because it would help those innovative newcomers to FLOSS approaches avoid one of the costliest mistakes of all: Using a nonstandard license.

path: /oss | Current Weblog | permanent link to this entry

Thu, 14 Aug 2008

Free-Libre/Open Source Software (FLOSS) licenses legally enforceable - and more

The U.S. Court of Appeals for the Federal Circuit has ruled in Jacobsen v. Katzer (August 13, 2008) that Free-Libre/Open Source Software (FLOSS) licenses are legally enforceable. Specifically, it determined that in the U.S. disobeying a FLOSS license is copyright infringement (unless there are other arrangements), and not just a contract violation. This makes it much easier to enforce FLOSS licenses in the United States. It has some other very interesting things to say, too, as I show below.

Frankly, I thought this was a very obvious ruling; I find it bizarre that some people thought there was another possibility (and that this had to be appealed). After all, U.S. copyright law clearly says that the copyright holder can determine the conditions for (most) copying, and doing anything else (unless specially permitted by law) is copyright infringement. This ruling simply states that the law is what it says it is, and that FLOSS licenses are a perfectly valid set of conditions. This eliminates, in one stroke, the “is a license a contract or a license?” silliness. A license is, well, a license! I’ve thought it was quite obvious that a license is not a contract; Eben Moglen and Groklaw have both written articles on this that I find extremely persuasive. In some countries, this distinction may make no difference, but in the U.S. there is a big difference. As Andy Updegrove noted, “Under contract law, the remedy is monetary damages, which aren’t likely to amount to anything involving open-source software that is given away…”, but statutory damages (money awarded for a violation of law) “can be awarded for copyright infringement without requiring proof of monetary damages… people can recover attorney fees for copyright infringement cases… [and] most importantly for licenses such as the [GNU General Public License], it means that your rights to use the copyrighted work at all disappear”.

You can find more about the legal implications in Groklaw’s article on Jacobsen v. Katzer, the announcement on Jmri-legal-announce, and LinuxInsider. JMRI has a set of links to related articles.

The court also had many very interesting things to say about FLOSS. I suspect many will quote it because it’s an official U.S. court ruling that cuts to the essence of FLOSS licensing and why it is the way it is. Let me pull out a few interesting quotes; I have bolded some particularly interesting points:

“We consider here the ability of a copyright holder to dedicate certain work to free public use and yet enforce an ‘open source’ copyright license to control the future distribution and modification of that work… Public licenses, often referred to as ‘open source’ licenses, are used by artists, authors, educators, software developers, and scientists who wish to create collaborative projects and to dedicate certain works to the public. Several types of public licenses have been designed to provide creators of copyrighted materials a means to protect and control their copyrights. Creative Commons, one of the amici curiae, provides free copyright licenses to allow parties to dedicate their works to the public or to license certain uses of their works while keeping some rights reserved.”

“Open source licensing has become a widely used method of creative collaboration that serves to advance the arts and sciences in a manner and at a pace that few could have imagined just a few decades ago. For example, the Massachusetts Institute of Technology (‘MIT’) uses a Creative Commons public license for an OpenCourseWare project that licenses all 1800 MIT courses. Other public licenses support the GNU/Linux operating system, the Perl programming language, the Apache web server programs, the Firefox web browser, and a collaborative web-based encyclopedia called Wikipedia. Creative Commons notes that, by some estimates, there are close to 100,000,000 works licensed under various Creative Commons licenses. The Wikimedia Foundation, another of the amici curiae, estimates that the Wikipedia website has more than 75,000 active contributors working on some 9,000,000 articles in more than 250 languages.”

“Open Source software projects invite computer programmers from around the world to view software code and make changes and improvements to it. Through such collaboration, software programs can often be written and debugged faster and at lower cost than if the copyright holder were required to do all of the work independently. In exchange and in consideration for this collaborative work, the copyright holder permits users to copy, modify and distribute the software code subject to conditions that serve to protect downstream users and to keep the code accessible. By requiring that users copy and restate the license and attribution information, a copyright holder can ensure that recipients of the redistributed computer code know the identity of the owner as well as the scope of the license granted by the original owner. The Artistic License in this case also requires that changes to the computer code be tracked so that downstream users know what part of the computer code is the original code created by the copyright holder and what part has been newly added or altered by another collaborator.

“Traditionally, copyright owners sold their copyrighted material in exchange for money. The lack of money changing hands in open source licensing should not be presumed to mean that there is no economic consideration, however. There are substantial benefits, including economic benefits, to the creation and distribution of copyrighted works under public licenses that range far beyond traditional license royalties. For example, program creators may generate market share for their programs by providing certain components free of charge. Similarly, a programmer or company may increase its national or international reputation by incubating open source projects. Improvement to a product can come rapidly and free of charge from an expert not even known to the copyright holder. The Eleventh Circuit has recognized the economic motives inherent in public licenses, even where profit is not immediate…. (Program creator ‘derived value from the distribution [under a public license] because he was able to improve his Software based on suggestions sent by end-users… . It is logical that as the Software improved, more end-users used his Software, thereby increasing [the programmer’s] recognition in his profession and the likelihood that the Software would be improved even further.’).”

“… The conditions set forth in the Artistic License are vital to enable the copyright holder to retain the ability to benefit from the work of downstream users. By requiring that users who modify or distribute the copyrighted material retain the reference to the original source files, downstream users are directed to Jacobsen’s website. Thus, downstream users know about the collaborative effort to improve and expand the SourceForge project once they learn of the ‘upstream’ project from a ‘downstream’ distribution, and they may join in that effort.”

“… Copyright holders who engage in open source licensing have the right to control the modification and distribution of copyrighted material. As the Second Circuit explained in Gilliam v. ABC, 538 F.2d 14, 21 (2d Cir. 1976), the ‘unauthorized editing of the underlying work, if proven, would constitute an infringement of the copyright in that work similar to any other use of a work that exceeded the license granted by the proprietor of the copyright.’ Copyright licenses are designed to support the right to exclude; money damages alone do not support or enforce that right. The choice to exact consideration in the form of compliance with the open source requirements of disclosure and explanation of changes, rather than as a dollar-denominated fee, is entitled to no less legal recognition. Indeed, because a calculation of damages is inherently speculative, these types of license restrictions might well be rendered meaningless absent the ability to enforce through injunctive relief.”

“… The clear language of the Artistic License creates conditions to protect the economic rights at issue in the granting of a public license. These conditions govern the rights to modify and distribute the computer programs and files included in the downloadable software package. The attribution and modification transparency requirements directly serve to drive traffic to the open source incubation page and to inform downstream users of the project, which is a significant economic goal of the copyright holder that the law will enforce. Through this controlled spread of information, the copyright holder gains creative collaborators to the open source project; by requiring that changes made by downstream users be visible to the copyright holder and others, the copyright holder learns about the uses for his software and gains others’ knowledge that can be used to advance future software releases.”

In short, this court ruling makes it clear that FLOSS licenses really are legally enforceable… so it’s safe for businesses to rely on them. It also makes a number of clear statements that FLOSS really does have economic value, even when money doesn’t change hands - a point I make in my article Free-Libre / Open Source Software (FLOSS) is Commercial Software.

Wed, 16 Jul 2008

Offset 2000 Version Numbers

Linus Torvalds is thinking about changing the Linux kernel version numbering scheme [Kernel Release Numbering Redux]. He said: “I _am_ considering changing just the [version] numbering… because a constantly increasing minor number leads to big numbers. I’m not all that thrilled with ‘26’ as a number: it’s hard to remember… If the version were to be date-based, instead of releasing 2.6.26, maybe we could have 2008.7 instead… I personally don’t have any hugely strong opinions on the numbering. I suspect others do, though, and I’m almost certain that this is an absolutely _perfect_ ‘bikeshed-painting’ subject… let the bike-shed-painting begin.”

Here’s my proposal: Offset 2000 version numbers, i.e., “(y-2000).mm[.dd]”. The first number is the year minus 2000, followed by “.” and a two-digit month, optionally followed by “.” and a two-digit day when there’s more than one release in a month. So version 8.07 would be the first release in July 2008. If you made a later release on July 17, that later release would be 8.07.17 (so if a project makes many releases in a month, you can again determine how old a particular copy is).

Date-based version numbers have a lot going for them, because at a glance you know when it was released (and thus you can determine how old something is). If you choose the ISO order YYYY.MM.DD, the numbers sort very nicely; Debian packages often use YYYYMMDD for versioning. But there’s a problem: full year numbers, or full dates in this format, are annoyingly large. For example, version numbers 2008.07.16 and 20080716 are painfully long version numbers to remember.

So, use dates, but shorten them. Since nothing today can be released before 2000, shorten it by subtracting 2000. Note that this is subtracting - there’s no Y2K-like rollover problem, because the year 2100 becomes 100 and the year 3000 becomes 1000. The second number is the month; using a two-digit month means you don’t have the ambiguity of determining if “2.2” is earlier or later than “2.10” (you would use “2.02” instead). If you need to disambiguate day releases (or you make additional releases in the same month), add “.” and a two-digit day.
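The rule above is simple enough to sketch in a few lines. This is only a minimal illustration, not part of any existing tool; the function name offset2000_version and its include_day parameter are made up for this example.

```python
from datetime import date


def offset2000_version(release: date, include_day: bool = False) -> str:
    """Format a release date as (y-2000).mm[.dd].

    Hypothetical helper for illustration only.
    """
    if release.year < 2000:
        raise ValueError("Offset 2000 versions assume a release in 2000 or later")
    # Subtract 2000 from the year; always use a two-digit month.
    version = f"{release.year - 2000}.{release.month:02d}"
    if include_day:
        # Optional two-digit day, for more than one release in a month.
        version += f".{release.day:02d}"
    return version


print(offset2000_version(date(2008, 7, 16)))                    # 8.07
print(offset2000_version(date(2008, 7, 17), include_day=True))  # 8.07.17
print(offset2000_version(date(2100, 1, 1)))                     # 100.01
```

Because the year is subtracted rather than truncated, 2100 simply becomes 100, as the last line shows.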

These version numbers are short, they’re easy to compare, and they give you a clue about when it was released. Ubuntu already uses this scheme for the first two parts, so this scheme is already in use and familiar to many. This works perfectly with “natural sort” (e.g., with GNOME’s Nautilus file manager or with GNU ls’s “-v” option).
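To show why these numbers compare easily, here is a small sketch that approximates a “natural sort” by comparing the dot-separated fields as integers. The name version_key is made up for this example, and real natural-sort implementations (such as the one in GNU ls) handle many more cases than this does.

```python
from typing import Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Split a version such as '8.07.17' into integer fields for comparison.

    Hypothetical helper; assumes every field is a plain decimal number.
    """
    return tuple(int(part) for part in version.split("."))


releases = ["8.10", "10.04", "8.04", "8.07.17", "8.07"]

# A plain string sort puts "10.04" first, because "1" sorts before "8".
print(sorted(releases))
# ['10.04', '8.04', '8.07', '8.07.17', '8.10']

# Comparing the fields as numbers gives chronological order.
print(sorted(releases, key=version_key))
# ['8.04', '8.07', '8.07.17', '8.10', '10.04']
```

A release with a day field sorts after the same month without one, so 8.07.17 lands after 8.07 and before 8.10.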

If you use a time-based release system (see this summary of Martin Michlmayr’s thesis for why you would), using this version numbering scheme is easy, and you can even talk about future releases the same way. But what if you release software based on when the features are ready - how, then, can you talk about the system under development? In that case, you can’t easily call it by the version number, since you don’t know it yet. But that’s not really a problem. In many cases, you can just talk about the “development” branch or give a special name to the development branch (e.g., “Rawhide” for Fedora). If you need to distinguish between multiple development branches, just give each of them a name (e.g., “Hardy Heron” for Ubuntu); on release you can announce the version number of a named branch (e.g., “Hardy Heron is 8.04”). This is more-or-less what many people do now, but if a lot of us used the same system, version numbers would have more meaning than they do now.

Mon, 19 May 2008

YEARFRAC Incompatibilities between Excel 2007 and OOXML (OXML)

In theory, the OOXML (OXML) specification is supposed to define what Excel 2007 reads and writes. In practice, it’s not true at all; the latest public drafts of OOXML are unable to represent many actual Excel 2007 files.

For example, at least 26 Excel financial functions depend on a parameter called “Basis”, which controls how the calendar is interpreted. The YEARFRAC function is a good example of this; it returns the fraction of years between two dates, given a “basis” for interpreting the calendar. Errors in these functions can have large financial stakes.
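To make concrete what a “basis” does, here is a minimal sketch of the two simplest day-count conventions: actual days divided by a 360-day year, and actual days divided by a 365-day year (which Excel’s documentation lists as basis 2 and basis 3). It deliberately does not attempt the 30/360 or actual/actual bases (omitted, 0, 1, or 4), which are the ones where the definitions are in dispute. The function name year_fraction_actual is made up for this illustration, and this is not a reimplementation of Excel’s YEARFRAC.

```python
from datetime import date


def year_fraction_actual(start: date, end: date, days_in_year: int) -> float:
    """Actual days between two dates divided by a fixed year length.

    Hypothetical helper; only the actual/360 and actual/365 conventions.
    """
    if days_in_year not in (360, 365):
        raise ValueError("this sketch only covers actual/360 and actual/365")
    # Use the absolute difference so the argument order does not matter.
    return abs((end - start).days) / days_in_year


start, end = date(2008, 1, 1), date(2008, 7, 1)
print((end - start).days)                     # 182 days (2008 is a leap year)
print(year_fraction_actual(start, end, 360))  # slightly more than half a year
print(year_fraction_actual(start, end, 365))  # slightly less than half a year
```

The same pair of dates yields a different fraction under each convention, which is why a financial function has to pin down exactly which calendar interpretation it uses.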

I’ve posted a new document, YEARFRAC Incompatibilities between Excel 2007 and OOXML (OXML), and the Definitions Actually Used by Excel 2007 ([OpenDocument version]), which shows that the definitions of OOXML and Excel 2007 aren’t the same at all. “This document identifies incompatibilities between the YEARFRAC function, as implemented by Microsoft Excel 2007, compared to how it is defined in the Office Open Extensible Mark-up Language (OOXML), final draft ISO/IEC 29500-1:2008(E) as of 2008-05-01 (aka OXML). It also identifies the apparent definitions used by Excel 2007 for YEARFRAC, which to the author’s knowledge have never been fully documented anywhere. They are not defined in the OOXML specification, because OOXML’s definitions are incompatible with the apparent definition used by Excel 2007.”

“This incompatibility means that, given OOXML’s current definition, OOXML cannot represent any Excel spreadsheet that uses financial functions using “basis” date calculations, such as YEARFRAC, if they use common “basis” values (omitted, 0, 1, or 4). Excel functions that depend upon “basis” date calculations include: ACCRINT, ACCRINTM, AMORDEGRC, AMORLINC, COUPDAYBS, COUPDAYS, COUPDAYSNC, COUPNCD, COUPNUM, COUPPCD, DISC, DURATION, INTRATE, MDURATION, ODDFPRICE, ODDFYIELD, ODDLPRICE, ODDLYIELD, PRICE, PRICEDISC, PRICEMAT, RECEIVED, YEARFRAC, YIELD, YIELDDISC, and YIELDMAT (26 functions).”

I have much more information about YEARFRAC if you want it.

Thu, 15 May 2008

Oracle letter to Universities: Educate software developers on security/assurance!

I am delighted to point out a really interesting letter to Universities by Mary Ann Davidson, the Chief Security Officer of Oracle Corporation. It basically tells colleges and universities to stop ignoring security, and to instead include software security principles in their computer science curricula. I’m so delighted to see this letter, which has just been released to the public (it had been privately sent to many colleges and universities). Let me point out and comment on some great points in this letter, because I think this letter is really important.

In this letter, she notes that “many security vulnerabilities can be traced to a relatively few types of common coding errors”. I’ve noted that myself, by the way; simply educating developers on what the common (past) mistakes are goes a long way towards eliminating vulnerabilities. She then notes, “most developers we hire have not been adequately trained in basic secure coding principles in their undergraduate or graduate computer science programs.” I agree and think it’s horrific; more on that in a moment. She clarifies that this is a really important problem: “Security flaws are widely recognized as a threat to national security and to the privacy and financial well being of individual citizens, in addition to the costs they impose on us and our customers.” They haven’t just let this be; as they note, “We have therefore had to develop and roll out our own in-house security training program at significant time and expense.” Kudos to Oracle for doing such training, by the way; far too many organizations don’t do that, which explains why software continues to have the same old vulnerabilities as it did 30 years ago. But clearly Oracle cannot train the world, nor is it reasonable to expect that they do so.

She also states that “We believe that the ability to recognize and avoid common errors that can result in catastrophic security failures should be a core part of computer science curricula and that the above measures will foster such change. We strongly recommend that universities adopt secure coding practices as part of their computer science curricula, to improve the security of all commercial software, and ensure that their graduates remain competitive in the job market.” To that I say, Amen.

By itself, that’s great, but here’s the kicker: “In the future, Oracle plans to give hiring preference to students who have received such training and can demonstrate competence in software security principles.” Do you see this? Students at colleges and universities that fail to properly prepare them will be at a competitive disadvantage!

Today, almost all computer science and software engineering graduates will develop software that connects to a network, or must take data from a network… yet almost all are absolutely clueless about how to do so. Not because they don’t know what a “socket” is, but because they don’t know how to counter attacks. And if you’re hooked to a network, or take data from one, you will get attacked.

Yet the education community (with a few wonderful exceptions) still completely ignores the need to educate software developers on how to develop secure software. “It’s not my job” is not just wrong; it’s almost criminal. Society is depending on the educational community to educate students in the fundamentals of what they need to know. Society depends on software, and essentially every student in a software-related field will, after they graduate, write software that will be attacked. Attacks are no longer a surprise - they are a guarantee. Yet the educational system that’s supposed to prepare our developers fundamentally fails to do so. Since attacks are guaranteed, and the students are guaranteed to not know how to counter them, what other results would you expect? The basics of developing secure software should be a mandatory part of computer science and software engineering undergraduate curricula. The vulnerabilities that the students will embed in software, if they do not get this education, will lead to great loss of life and the loss of billions of dollars. Sure, schools already have a lot of material to cover, but practically nothing in a computer science curriculum is as important as how to develop secure software; I can think of no other omissions in the CS curricula that cause so much damage. Don’t tell me that you only teach the “fundamentals”; programming languages change, but the need for security will never go away; it is fundamental. I think computer science and software engineering departments that do not explain the basics of developing secure software to all of their undergraduate and graduate students should be shut down, as a menace to society, until they change their ways.

Oh, if you want to see more about this letter, see Mary Ann Davidson’s blog article about it, “The Supply Chain Problem”, where she talks about what led up to the letter, and the follow-on from it: “Last year, I got fed up enough with Oracle having to train otherwise bright and capable CS grads in secure coding 101 that I sent letters to the top 10 or so universities we recruit from (my boss came up with the idea and someone on my team executed on it - teamwork is a wonderful thing)… I am sorry to state that only one of those universities we wrote to responded to my letter… We need a revolution - an upending of the way we think about security -and that means upsetting the supply chain of software developers… To universities, I cannot but contrast the education of engineers with that of computer science majors. Engineers know that their work product must above all be safe, secure and reliable. They are trained to think this way (not pawn off ‘safety’ on ‘testers’) and their curricula builds and reinforces the techniques and mindset of safe, secure and reliable product. (A civil engineer who ignores the principles of basic structures - a core course - in an upper level class is not going to graduate, and can’t dismiss structures as a ‘legacy problem.’)”

I would love to see many organizations banding together to sign a letter like this one. If enough organizations band together, I think many universities and colleges will finally get the message. I would expand it beyond computer science, to any curricula with a significant amount of software development (such as software engineering, MIS, and so on), but that’s a quibble. My goal is not to shut down any departments (I hope that’s clear); it’s to repair a serious omission in our educational system. Kudos to Mary Ann Davidson, for writing the letter and sending it to a number of Universities. When I learned of it, I begged her to please post it publicly. To her great credit, she’s now done so. Thanks, from the bottom of my heart! Now colleges and universities have even fewer reasons to claim the nonsense, “well, no one wants information on developing secure software.” The companies that will hire your students know otherwise.

Wed, 14 May 2008

Defining “open standards”: The Digital Standards Organization (digistan.org)

Lots of people agree that we need “open standards” in information technology. The problem is, there are a lot of snake-oil salesmen who are trying to (re)define that term to mean “whatever proprietary product I’m selling”.

Will we be able to choose what products we use? Will we even be able to exercise our rights (as citizens) at all? These are important questions about our future. The answers to those questions depend on whether or not we have real open standards in place for critical areas of our lives. A vendor who controls critical standards could easily decide that something that is manifestly not in our interest could be in theirs, and force us to submit to their malevolent actions. This is already a concern, and through globalization it will only get worse. We are dependent on information systems, and those who control their standards control those systems… and thus, us. It’s about power; should we have any? This means that understanding what real open standards are about is vital.

In my essay “Is OpenDocument an Open Standard? Yes!”, I addressed this problem of multiple different definitions by finding three widely-used definitions (Perens’, Krechmer’s, and the European Commission’s) and merging them. After all, if a specification meets all three definitions of “open standard”, then it’s far more likely to be a true open standard. Problem is, with all those trees, it’s hard to see the forest.

So I’m delighted to have discovered the Digital Standards Organization (digistan.org). They have a wonderfully brief definition of “open standard”: “a published specification that is immune to vendor capture at all stages in its life-cycle”. That can be a little mystifying, so they also provide a slightly longer definition of “open standard” that clarifies what that means:

“The standard is adopted and will be maintained by a not-for-profit organization, and its ongoing development occurs on the basis of an open decision-making procedure available to all interested parties.
The standard has been published and the standard specification document is available freely. It must be permissible to all to copy, distribute, and use it freely.
The patents possibly present on (parts of) the standard are made irrevocably available on a royalty-free basis.
There are no constraints on the re-use of the standard.

A key defining property is that an open standard is immune to vendor capture at all stages in its life-cycle. Immunity from vendor capture makes it possible to improve upon, trust, and extend an open standard over time.”

That’s a remarkably clear and simple definition, and good definitions are hard! Even better, they have posted a rationale for this definition that cuts through all the noise and nonsense, and instead gets to the heart of the matter. For example, it explains the real goals of open standards: “An open standard must be aimed at creating unrestricted competition between vendors and unrestricted choice for users. Any barrier - including RAND, FRAND, and variants - to vendor competition or user choice is incompatible with the needs of the market at large.” Here’s a quote from the rationale’s abstract, which I think makes a lot of sense:

“Many groups and individuals have provided definitions for ‘open standard’ that reflect their economic interests in the standards process. We see that the fundamental conflict is between vendors who seek to capture markets and raise costs, and the market at large, which seeks freedom and lower costs. There are thus only two types of standard: franchise standards, and open standards. Vendors work hard to turn open standards into franchise standards. They work to change the statutory language so they can cloak franchise standards in the sheep’s clothing of ‘open standard’. Our canonical definition of open standard derives from the conclusion that this conflict lies at the heart of the matter. We define an open standard as ‘a published specification that is immune to vendor capture at all stages in its life-cycle’. A full definition of ‘open standard’ must take into account the direct economic conflict between vendors and the market at large. Such conflicts do not end when a standard is published, so an open standard must also be immune from attack long after it has been widely implemented.”

Digistan is currently asking people to sign “The Hague Declaration” by 2008-05-21. This one states why open standards are important to human liberty, in ways that non-technical people can understand. As Pieter Hintjens argues in his “Open letter to Standards Professionals, Developers, and Activists”, “The Hague Declaration argues that international law and national constitutions of most democracies oblige governments to adopt open standards.” If the text of this letter looks a little like Andrew Updegrove’s A Proposal to Recognize the Special Status of “Civil ICT Standards” or his testimony in Texas, that’s no accident; Andrew Updegrove is one of Digistan’s founders.

Standards are vitally important. If we allow individual companies to control standards, then we have ensured that they will control us - and what we may do - through them. Being a non-profit helps, but even a non-profit’s no guarantee; is the organization interested in maximizing implementation and competition between potential suppliers, or does it have some other motivation (such as maximizing publication revenue)?

I think making standards available at no-charge is no longer a nicety; it is a necessity for a specification to be a truly open standard. When there were only a few standards, and all products were developed by large big-budget corporations, a $100 standard was not a big deal. But today there is a vast array of standards; simply buying “all relevant standards” is becoming prohibitive even for large companies with massive budgets. And those big budgets are increasingly rare; suppliers are often small organizations or individuals collaborating together, or are in countries where those kinds of funds are unavailable. Because the world now includes so many new suppliers, anything that prevents those suppliers from using standards is simply unacceptable. Don’t give me the nonsense that the money is needed to help develop standards; it’s not true. I’ve helped to develop many standards, and I never received a penny from the publication royalties. The IETF, W3C, OASIS, and many other organizations manage to publish their standards at no charge, and have for years. The world has changed. In today’s world, “publish” means “freely available over the Internet without having to register for it”; if you can’t Google it, it doesn’t exist. The cost of putting a specification on a public web server is essentially petty cash, and not doing so means that many (if not most) of the specification’s potential users cannot use it.

Open standards and free-libre / open source software (FLOSS) are not the same thing - not at all! There are some similarities, though. From a customer’s point of view, both open standards and FLOSS are strategies for enabling supplier switching (by preventing lock-in). In addition, customers often don’t switch to a FLOSS product, even if it’s technologically superior or has lower total costs, solely because the customer is locked into an existing product due to proprietary standards (in data formats, APIs, and so on). You can choose to use open standards and not use FLOSS products, but if you use an open standard, it enables you to select a FLOSS product (now or later).

I believe, very much, in the power of competition to produce lower-cost, higher-quality, and innovative components. But competition is easily stymied through lock-in via “franchise” standards. Open standards are necessary to eliminate lock-in and bring to everyone the advantages of competition: lower cost, higher quality, and greater innovation.

path: /oss | Current Weblog | permanent link to this entry

Fri, 09 May 2008

Bilski: Information is physical!?

The US Court of Appeals for the Federal Circuit in Washington, DC just heard arguments in the Bilski case, where the appellant (Bilski) is arguing that a completely mental process should get a patent. The fact that this was even entertained demonstrates why the patent system has truly descended into new levels of madness. At least the PTO rejected the application; the problem is that the PTO now allows business method patents and software patents. Once they allowed them, there’s no rational way to say “stop! That’s ridiculous!” without being arbitrary.

Mr. David Hanson (Webb Law Firm) argued for the appellant (Bilski), and got peppered with questions. “Is a curve ball patentable?”, for example. At the end, he finally asked the court to think of “information as physical”; it is therefore tangible and can be transformed.

That is complete lunacy, and it clearly demonstrates why the patent office is in real trouble.

Information is not physical, it is fundamentally different, and that difference has been understood for centuries. If I give you my car, I no longer have that car. If I give you some information, I still have the information. That is a fundamental difference in information, and always has been. The fact that Bilski’s lawyer can’t understand this difference shows why our patent office is so messed up.

This fundamental difference between information and physical objects was well-understood by the U.S. founding fathers. Here’s what Thomas Jefferson said: “That ideas should freely spread from one to another over the globe, for the moral and mutual instruction of man, and improvement of his condition, seems to have been peculiarly and benevolently designed by nature, when she made them, like fire, expansible over all space, without lessening their density at any point, and like the air in which we breathe, move, and have our physical being, incapable of confinement or exclusive appropriation. Inventions then cannot, in nature, be a subject of property.” Thomas Jefferson was a founder, and an inventor. No, they didn’t have computers then, but computers merely automate the processing of information; the essential difference between information and physical/tangible objects was quite clear then.

Our laws need to distinguish between information and physical objects, because they have fundamentally different characteristics.

Basically, by failing to understand the differences, the PTO let in software patents and business method patents, which have been grossly harmful to the United States.

Even if you thought they were merely “neutral”, that’s not enough. There’s a famous English speech about the trade-offs of copyright law, whose principles also apply here: “It is good that authors should be remunerated; and the least exceptionable way of remunerating them is by a monopoly. Yet monopoly is an evil. For the sake of the good we must submit to the evil; but the evil ought not to last a day longer than is necessary for the purpose of securing the good.” - Thomas Babington Macaulay, speech to the House of Commons, February 5, 1841.

I believe that software patents need to be abolished, pronto. As I’ve discussed elsewhere, software patents harm software innovation, not help it.

But here in the Bilski case we see why some people have managed to sneak software patents into the patent process. In short, too many people do not understand the fundamental differences between information and physical objects. People whose thinking is that fuzzy are easily duped. Though clearly many people aren’t as confused as Bilski’s lawyer, I think too many people in the patent process have become so confused about the difference between physical objects and information that they don’t understand why software patents are a serious problem. Patents should only apply to processes that directly change physical objects, and their scope should only cover the specifics of those changes. I add that latter part because yes, changing the number on a display does change something physical, but that is irrelevant. If you have a wholly new process for making displays (say, using a new chemical compound), that could be patentable, but changing a “5” to a “6” should not be patentable because “changing a 5 to a 6” is not fundamentally a change in nature. Taking something unpatentable and adding the phrase “doing it with a computer” should not change an unpatentable invention into a patentable one; the Supreme Court understood that, but the PTO still fails to understand that.

I think pharmaceutical companies are afraid of any patent reform laws, because they’re afraid that a change in the patent system might hurt them. But if the patent system isn’t fixed - by eliminating business method patents and software patents - the entire patent system might become too overwhelmed to function, and thus eventually be scrapped. I don’t know if pharma patents are more help than hindrance; I’m not an expert in that area. But I make my living with software, and it’s obvious to me (and most other software practitioners) that software patents and business patents are becoming a massive drag on innovation. If we can’t fix the patent system, we’ll have to abolish the patent system completely. A lot of lawyers will be unhappy if the patent system is eliminated, but there are more non-lawyers than lawyers. If the pharma companies want to have a working patent system, then they’ll need to help rein in patents in other areas, or the whole system may collapse.

path: /misc | Current Weblog | permanent link to this entry

Thu, 08 May 2008

Open Source Computer Emergency Response Team (oCERT)

Here’s something new and interesting: the Open Source Computer Emergency Response Team (oCERT). Here’s how they describe themselves: “The oCERT project is a public effort providing security handling support to Open Source projects affected by security incidents or vulnerabilities…”.

They promise to keep things moving. They do permit embargo periods (where vulnerabilities are not publicly disclosed, giving time for developers to fix the problem first). More importantly, though, they have a maximum embargo time of two months; I think that’s great, and important, because a lot of suppliers have abused embargo periods and failed to fix critical vulnerabilities as long as they’re embargoed. These abuses often resulted in customers being exploited through mechanisms that the supplier knew about, but refused to fix in a timely manner.

Google is backing oCERT, which is certainly encouraging. Google even mentions my “three conditions” for securing software (thanks!):

people need to actually review the code
developers/reviewers need to know how to write secure code
once found, security problems need to be fixed quickly, and their fixes distributed quickly

Clearly, something like oCERT could help with these.

This ComputerWorld article on oCERT makes some interesting points. One minor point: They worry that oCERT is using the term “CERT” without permission, but oCERT reports that they do indeed have that permission.

path: /oss | Current Weblog | permanent link to this entry

Tue, 06 May 2008

Securing Open Source Software (OSS)

I’ve just posted my presentation titled “Securing Open Source Software (OSS or FLOSS)”, which is to be presented at the 8th Semi-Annual Software Assurance Forum, May 6-8, 2008, Sheraton Premiere, Tyson’s Corner in Vienna, Virginia. In it, I discuss how to improve the security of an OSS component by modifying its environment, as well as securing the OSS component itself (by selecting a secure component, building a secure component from scratch, or modifying an existing component). I include a number of examples; they’re necessarily incomplete, but I hope they will help people who are developing or deploying systems. (Here is “Securing Open Source Software (OSS or FLOSS)” in OpenDocument format.) Enjoy!

path: /security | Current Weblog | permanent link to this entry

Fri, 21 Mar 2008

Microsoft Office XML (OOXML) massively defective

Robert Weir has been analyzing Microsoft’s Office XML spec (aka OOXML) to determine how defective it is, with disturbing results.

Most standards today are relatively small, build on other standards, and are developed publicly over time with lots of opportunity for correction. Not OOXML; Ecma submitted Office Open XML for “Fast Track” as a massive 6,045 page specification, developed in an absurdly rushed way, behind closed doors, using a process controlled by a single vendor. It’s huge primarily because it does everything in a non-standard way, instead of referring to other standards where practical as standards are supposed to do (e.g., for mathematical equations they created their own incompatible format instead of using the MathML standard). All by itself, its failure to build on other standards should have disqualified OOXML, but it was accepted for review anyway, and what happened next was predictable.

No one can seriously review such a massive document in a short time, though ISO tried; ISO’s process did find 3,522 defects. It’s not at all clear that the defects were fixed - there’s been no time to really check, because the process for reviewing the standard simply wasn’t designed to handle that many defects. But even if they were fixed - a doubtful claim - Robert Weir has asked another question, “did they find nearly all of the defects?”. The answer is no: almost all of the original defects remain. By sampling pages, he’s found error after error, none of which were found by the ISO process. The statistics from the sample are very clear: practically all serious errors have not been found. It’s true that good standards sometimes have a few errors left in them, after review, but this isn’t “just a few errors”; these clearly show that the specification is intensely defect-ridden. Less than 2% of the defects have been found, by the data we have so far, which suggests that there are over 172,000 important defects (49 x 3,522) left to find. That’s ridiculous.
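As a rough sketch, the arithmetic behind that estimate can be written out in a few lines of Python. The variable names are illustrative, and the 2% figure is treated as exact even though the sampling only says "less than 2%", so the result should be read as a floor rather than a precise count.

```python
# Defects the ISO review actually found (figure from the text above).
found_defects = 3522

# Assumption: the review caught exactly 2% of all defects. The sample
# suggests "less than 2%", so the true totals would be even higher.
found_percent = 2

# If 3,522 defects are 2% of the total, the total is 50 times as large.
estimated_total = found_defects * 100 // found_percent

# Everything not yet found: 49 undiscovered defects for every one found.
estimated_remaining = estimated_total - found_defects

print(f"estimated total defects:     {estimated_total:,}")
print(f"estimated remaining defects: {estimated_remaining:,}")
```

Integer arithmetic is used on purpose so the result matches the 49 x 3,522 figure exactly instead of picking up floating-point noise.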

Want more evidence that it’s defect-ridden? Look at Inigo Surguy’s “Technical review of OOXML”, where he examines just the WordProcessingML section’s 2300 XML examples. He wrote code to check for well-formedness and validation errors, and found that more than 10% (about 300) were in error even given this trivial test. Conclusion? “While a certain number of errors is understandable in any large specification, the sheer volume of errors indicates that the specification has not been through a rigorous technical review before becoming an Ecma standard, and therefore may not be suitable for the fast-track process to becoming an ISO standard.” This did not include the other document sections, and this is only a lower bound on the error rate (XML could validate and still be in error). (He also confirmed that Word 2007 does not implement the extensibility requirements of the Ecma specification, so as a result it would be hard to “write an interoperable word processor with Word” using OOXML.)
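To give a feel for how trivial such a test is, here is a minimal sketch of a well-formedness check using only Python's standard library. It is not Surguy's code: the function names and the three sample fragments are made up for illustration, and it covers only well-formedness, not schema validation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


def count_malformed(examples) -> int:
    """Count the examples that fail the well-formedness check."""
    return sum(1 for example in examples if not is_well_formed(example))


# Hypothetical fragments, not taken from the OOXML specification.
samples = [
    "<w:p xmlns:w='urn:example'><w:r><w:t>ok</w:t></w:r></w:p>",  # fine
    "<p><r>unclosed run</p>",                                     # mismatched tag
    "<p attr=unquoted/>",                                         # unquoted attribute
]

malformed = count_malformed(samples)
print(f"{malformed} of {len(samples)} examples are not well-formed")
```

One caveat: a fragment that uses a namespace prefix without declaring it is also rejected by this parser, so examples lifted out of a larger document would probably need to be wrapped in an element that declares the prefixes before a check like this is fair to them.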

I think that all by itself, the vast number of errors in OOXML proves that the “Fast Track” process is completely inappropriate for OOXML. The “Fast Track” process was intended to be used when there was already a widely-implemented, industry-accepted standard that had already had its major problems addressed. That’s just not the case here.

These huge error rates were predictable, too. The committee for creating OOXML wasn’t even created until OpenDocument was complete, so they had to do a massive rush job to produce anything. (Doug Mahugh admitted that “Microsoft… had to rush this standard through.”) They didn’t reuse existing mature standards, so they ended up creating much more work for themselves. Most developers (who could have helped find and fix the defects) stayed away from the Ecma process in the first place; its rules gave one vendor complete control over what was allowed, and there was already a vendor-independent standard in place, which gave most experts no reason to participate. The Ecma process was also almost entirely closed-door (OpenDocument’s mailing lists are public, in contrast), which predictably increased the error rate too.

The GNOME Foundation has been involved in OOXML’s development, and here’s what they say in the GNOME Foundation Annual Report 2007: “The GNOME Foundation’s involvement in ECMA TC45-M (OOXML) was the main discussion point during the last meeting…. [the] Foundation does not support this file format as the main format or as a standard…” I don’t think this is as widely touted as it should be. Here’s an organization directly involved in OOXML development, and it thinks OOXML should not be a standard at all.

India has already voted “no” to OOXML. I hope others do the same. Countries with the appropriate rights have until March 29 to decide. It’s quite plausible that the final vote will be “no”, and indeed, based on what’s published, it should be “no”. Open Malaysia reported on the March 2008 BRM meeting, for example. It reports that everybody “did their darnest to improve the spec… The final day was absolute mayhem. We had to submit decisions on over 500 items which we hadn’t [had] the time to review. All the important issues which have been worked on repeatedly happened to appear on this final day. So it was non-stop important matters… It was a failure of the Fast Track process, and Ecma for choosing it. It should have been obvious to the administrators that submitting a 6000+ page document which failed the contradiction period, the 5 month ballot vote and poor resolution dispositions, should be pulled from the process. It should have been blatantly obvious that if you force National Bodies to contribute in the BRM and end up not deliberating on over 80% of their concerns, you will make a lot of people very unhappy… judging from the reactions from the National Bodies who truly tried to contribute on a positive manner, without having their concerns heard let alone resolved, they leave the BRM with only one decision in their mind come March 29th. The Fast Tracking process is NOT suitable for ISO/IEC DIS 29500. It will fail yet again. And this time it will be final.”

In my opinion, the OOXML specification should not become an international standard, period. I think it clearly doesn’t meet the criteria for “fast track” - but more importantly, it doesn’t meet the needs for being a standard at all. It completely contradicts the goal of “One standard, one test - Accepted everywhere”, and it simply is not an open standard. I’ve blogged before that having multiple standards for office documents is a terrible idea. There’s nothing wrong with a vendor publishing their internal format; in fact, ISO’s “type 2 technical report” or “ISO agreement” are pre-existing mechanisms for documenting the format of a single vendor and product line specification. But when important data is going to be exchanged between parties, it should be exchanged using an open standard. We already have an open standard for office documents that was developed by consensus and implemented by multiple vendors: OpenDocument (ISO/IEC 26300). For more clarification about what an open standard is, or why OpenDocument is an open standard, see my essay “Is OpenDocument an Open Standard? Yes!” OpenDocument works very well; I use it often. In contrast, it seems clear that OOXML will never be a specification that everyone can fully implement. Its technical problems alone are serious, but even more importantly, the Software Freedom Law Center’s “Microsoft’s Open Specification Promise: No Assurance for GPL” makes it clear that OOXML cannot legally be implemented by just anyone under just any license; in particular, it gives no assurance to GPL-licensed implementations. And this matters greatly.

Andy Updegrove calls for recognition of “Civil ICT Standards”, which I think helps put this technical stuff into a broader and more meaningful context. He notes that in our new “interconnected world, virtually every civic, commercial, and expressive human activity will be fully or partially exercisable only via the Internet, the Web and the applications that are resident on, or interface with, them. And in the third world, the ability to accelerate one’s progress to true equality of opportunity will be mightily dependent on whether one has the financial and other means to lay hold of this great equalizer… [and thus] public policy relating to information and communications technology (ICT) will become as important, if not more, than existing policies that relate to freedom of travel (often now being replaced by virtual experiences), freedom of speech (increasingly expressed on line), freedom of access (affordable broadband or otherwise), and freedom to create (open versus closed systems, the ability to create mashups under Creative Commons licenses, and so on)… This is where standards enter the picture, because standards are where policy and technology touch at the most intimate level. Much as a constitution establishes and balances the basic rights of an individual in civil society, standards codify the points where proprietary technologies touch each other, and where the passage of information is negotiated… what will life be like in the future if Civil ICT Rights are not recognized and protected, as paper and other fixed media disappear, as information becomes available exclusively on line, and as history itself becomes hostage to technology? I would submit that a vote to adopt OOXML would be a step away from, rather than a way to advance towards, a future in which Civil ICT Rights are guaranteed”.

Ms. Geraldine Fraser-Moleketi, Minister of Public Service and Administration, South Africa, gave an interesting presentation at the Idlelo African Conference on FOSS and the Digital Commons. She said, “The adoption of open standards by governments is a critical factor in building interoperable information systems which are open, accessible, fair and which reinforce democratic culture and good governance practices. In South Africa we have a guiding document produced by my department called the Minimum Interoperability Standards for Information Systems in Government (MIOS). The MIOS prescribes the use of open standards for all areas of information interoperability, including, notably, the use of the Open Document Format (ODF) for exchange of office documents… It is unfortunate that the leading vendor of office software, which enjoys considerable dominance in the market, chose not to participate and support ODF in its products, but rather to develop its own competing document standard which is now also awaiting judgement in the ISO process. If it is successful, it is difficult to see how consumers will benefit from these two overlapping ISO standards… The proliferation of multiple standards in this space is confusing and costly.” She also said, “One cannot be in Dakar without being painfully aware of the tragic history of the slave trade… As we find ourselves today in this new era of the globalised Knowledge Economy there are lessons we can and must draw from that earlier era. That a crime against humanity of such monstrous proportions was justified by the need to uphold the property rights of slave owners and traders should certainly make us more than a little cautious about what should and should not be considered suitable for protection as property.”

You can get more detail from the Groklaw ODF-MSOOXML main page, but I think the point is clear. The world doesn’t need the confusion of a specification controlled by a single vendor being labelled as an international standard. NoOOXML has a list of reasons to reject OOXML.

path: /misc | Current Weblog | permanent link to this entry

Twisted Mind of the Security Pro

Bruce Schneier’s “Inside the Twisted Mind of the Security Professional” is highly-recommended reading - he explains the different kind of thinking required to be good at making things secure. Security pros are able to see the bigger picture, and in particular, they are able to see things from an attacker’s perspective.

For example, “SmartWater is a liquid with a unique identifier linked to a particular owner. ‘The idea is for me to paint this stuff on my valuables as proof of ownership,’ I wrote when I first learned about the idea. ‘I think a better idea would be for me to paint it on your valuables, and then call the police.’” Similarly, on opening up an ant farm, his friend was surprised that the manufacturer would send you ants by mail; Bruce thought it was interesting that “these people will send a tube of live ants to anyone you tell them to.”

Being able to think like an attacker is so important that in my book on writing secure programs, I gave it its own heading: paranoia is a virtue. It’s still true. My thanks to Bruce Schneier for expressing this need so eloquently.

We would live in a better world if all of us could see the world as attackers do - or at least make the effort to try. In particular, we’d stop doing many foolish things in the name of “security”, and instead do things that actually secured our world.

path: /security | Current Weblog | permanent link to this entry

Tue, 11 Mar 2008

OSS and the U.S. DoD - Questions and Answers

I’ve just posted Questions and Answers for 2008 “Open Source Software and DoD” Webinar. These are my attempts to answer the questions people sent me at my February “Open Source Software (OSS) and the U.S. Department of Defense (DoD)” webinar. Some of the questions were easy to answer, but some were surprisingly difficult. In some cases, I asked lawyers and got conflicting answers. But this is the best information that I could find on the topic.

For example, I explain in detail why it appears fairly clear that both the government and government contractors can release their results as open source software under the default DoD contract terms for software development (DFARS contracting clause 252.227-7014):

The government can release software as OSS once it receives “unlimited rights” to it. Unless other arrangements are made, the government has unlimited rights to software components when (1) it pays entirely for their development, or (2) five years after contract signature if it partly paid for their development. Before award, a contractor may identify the components that will have more restrictive rights (e.g., so the government can prefer proposals that give the government more rights). Where possible, software developed partly by government funds should be broken into a set of smaller components at the “lowest practicable level” so the rules can be applied separately to each one. Of course, the software can only be released to the public as OSS if other laws are also met (such as classification, export control, patent law, and trademark law).
Normally a DoD contractor can release the software as OSS at any time, since it holds the copyright. This default can be overridden by the contract, e.g., DFARS 252.227-7020 assigns copyright to the government, and is included in some contracts. Again, this release can only occur if other laws are also met (such as classification, export control, patent law, and trademark law).

These are the usual defaults; negotiations can change things, so read the contract to see if the contract changes these defaults. For example, sometimes the government has copyright assigned to it, in which case it can release the software simply because it has the copyright.
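The default "unlimited rights" rule for the government can be sketched as a small decision function. This is only an illustration of the two conditions summarized above; the class, the funding labels, and the function names are hypothetical, and it deliberately ignores negotiated terms and the other laws (classification, export control, and so on) that also have to be satisfied.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Component:
    name: str
    funding: str           # hypothetical labels: "government", "mixed", "private"
    contract_signed: date  # date the development contract was signed


def years_later(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def government_has_unlimited_rights(component: Component, today: date) -> bool:
    """Apply the default rule only; a real contract may override it."""
    if component.funding == "government":
        # (1) The government paid entirely for development.
        return True
    if component.funding == "mixed":
        # (2) Partly government-funded: five years after contract signature.
        return today >= years_later(component.contract_signed, 5)
    # Developed entirely at private expense: no unlimited rights by default.
    return False


parser = Component("parser", "mixed", date(2003, 3, 1))
print(government_has_unlimited_rights(parser, date(2008, 3, 11)))
```

Modelling each component separately mirrors the "lowest practicable level" guidance: the rule is applied per component, not per delivered system.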

I also point out that even when the government isn’t the copyright holder, if it releases software under an OSS license it can still enforce that license, and the doctrine of unclean hands will impact those who refuse to obey the license.

Several people had questions about software developed by a government employee (which can’t be copyrighted in the U.S.) and how that impacts OSS. The short answer is that there’s no problem; government employees can still contribute to OSS projects, for example. I also discuss some of the export control issues (especially ITAR), and how to address them.

If there are mistakes, please let me know. Thanks!

path: /oss | Current Weblog | permanent link to this entry

Mon, 28 Jan 2008

OSS and the U.S. DoD - Webinar

I’m going to present a webinar on “Open Source Software (OSS) and the U.S. Department of Defense (DoD)” on Feb 11, 2008, 3:00-4:30pm EST. It is open to the public, at no charge. To find out how to sign up, see http://www.dwheeler.com/oss-dod-webinar2008.html.

Here’s the summary: “Open source software (OSS) has become widespread, but there are many misconceptions about it - resulting in numerous missed opportunities. This presentation will clarify what OSS is (and isn’t), rebut common misunderstandings about OSS, discuss the relationship of OSS and security, discuss how to find and evaluate OSS, and explain OSS licensing (including how to combine products and select a license). It will show why nearly all extant OSS is COTS software, and thus why it’s illegal (as well as foolish) to ignore OSS options.”

This presentation is hosted by the Data & Analysis Center for Software (DACS), which is technically managed by the Air Force Research Laboratory - Information Directorate (AFRL/IF).

Please sign up quickly, if you’re interested. There were 45 registrants in the first half hour of its announcement.

path: /oss | Current Weblog | permanent link to this entry