David A. Wheeler's Blog



Fri, 05 Jun 2009

SPARK released as FLOSS (Free/ Libre / Open Source Software)!

The SPARK toolsuite has just been released as FLOSS (Free/ Libre / Open Source Software) by Praxis (its creator). This is great news for those who want to make software safer, more reliable, and more secure. In particular, this means that Tokeneer is now an open proof. If you haven’t been following this, here’s some background.

Software is now a part of really critical systems (ones that need “high assurance”), yet often that software is not as safe, reliable, or secure as it needs to be. I believe that in the long term, we will need to start proving that our very important programs are correct. Testing by itself isn’t enough; completely testing the trivial “add three 64-bit integers” program would take far longer than the age of the universe (it would take about 2x10^39 years). The basic idea of using mathematics to prove that programs are correct — aka “formal methods” — has been around for decades. There are a number of cases where formal methods have been applied successfully, and I’m glad about that. And yet, applying formal methods is still relatively rare. There are many reasons for this, such as inadequate maturation and capabilities of many formal methods tools, and the fact that relatively few people know how to apply formal methods when developing real programs. But what, in turn, is causing those problems? It’s true that applying formal methods is a hard problem that hasn’t received the level of funding it needs, but still, it’s been decades!

I believe one problem hindering the maturation and spread of formal methods is a “culture of secrecy”. Details of formal method use are often unpublished (e.g., because the implementations are proprietary or classified). Similarly, details about formal methods tools are often unshared and lost (or have to be constantly re-invented). Biere’s “The Evolution from LIMMAT to NANOSAT” (Apr 2004) gives an example: “From the publications alone, without access to the source code, various details were still unclear... Only [when CHAFF’s source code became available did] our unfortunate design decision became clear... The lesson learned is, that important details are often omitted in publications and can only be extracted from source code. It can be argued, that making source code of SAT solvers available is as important to the advancement of the field as publications”.

This “culture of secrecy” means that researchers/toolmakers often don’t receive adequate feedback, researchers/toolmakers waste time and money rebuilding tools, educators have difficulty explaining formal methods (they have no examples to show!), developers don’t understand how to apply formal methods (whose value to them is uncertain), and evaluators/end-users don’t know what to look for.

I believe that a way to break through this “culture of secrecy” is to develop “open proofs”. But what are they? An “open proof” is software or a system where all of the following are free-libre / open source software (FLOSS): the implementation, the proofs, and the required tools.

Something is FLOSS if it gives anyone the freedom to use, study, modify, and redistribute modified and unmodified versions of it, meeting the Free software definition and the open source definition.

Imagine if we had a number of open proofs available. There could be small open proofs that could be used for learning (e.g., as examples and use in class exercises). There could be proofs of various useful functions and small applications, so developers could see how to scale up these techniques, directly reuse them as components, or use them as starting points but add additional (proven) capabilities to them. When problems come up (and they will!), toolmakers and developers could work together to find ways to mature the tools and technology so that they’d be easier to use (e.g., so more could be automated). In short, imagine there was a working ecosystem where researchers/toolmakers/educators, developers of implementations to be proved, and evaluators/end-users could work together by sharing information. I believe that would greatly speed up the maturing of formal methods, resulting in more reliable and secure software.

In this context, Praxis has just released the SPARK GPL Edition. This is their SPARK toolsuite (a formal methods tool) released under the GNU General Public License aka GPL (the most common FLOSS license). So, what’s that?

SPARK is a variant of the Ada programming language, designed to enable proofs about programs (by adding and removing some features of Ada). The additions are in special comments, so SPARK programs can be compiled by a normal Ada compiler like GNAT (which is part of gcc). The Open Proofs page on SPARK has some information on SPARK. The page What is Special About SPARK Contracts? gives a nice quick introduction to SPARK, which I will quote here. It points out that the Ada line:

        procedure Inc (X : in out Integer);
just says there is some procedure “Inc” that may read a value X, and may write it out, but that’s it. In SPARK, you can add much more precise information, and the SPARK tools can then check to see if they are true. For example, if you say this using SPARK:
        procedure Inc (X : in out Integer);
        --# global in out CallCount;
        --# pre  X < Integer'Last and
        --#      CallCount < Integer'Last;
        --# post X = X~ + 1 and 
        --#      CallCount = CallCount~ + 1;
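then the SPARK tools will ensure at compile-time (not run-time) that the annotations hold: Inc uses no global other than CallCount, every call to Inc satisfies the precondition (X and CallCount are both less than Integer'Last), and on return X and CallCount are each one greater than they were on entry (X~ and CallCount~ denote the values on entry).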
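For readers who don't have the SPARK tools at hand, here is a rough run-time analogue of the Inc contract in Python. It is only a sketch of what the annotations say, not of how SPARK works: SPARK proves these conditions statically, while the assertions below are checked only when the code runs. The names `inc`, `call_count`, and `INTEGER_LAST` are made up for this illustration, and the 32-bit value for `INTEGER_LAST` is an assumption.

```python
INTEGER_LAST = 2**31 - 1  # assumption: a 32-bit Ada Integer

call_count = 0  # stands in for the SPARK global CallCount


def inc(x):
    """Return x + 1, checking the pre- and postcondition at run time."""
    global call_count
    # pre: X < Integer'Last and CallCount < Integer'Last
    assert x < INTEGER_LAST and call_count < INTEGER_LAST, "precondition violated"
    old_x, old_count = x, call_count  # play the role of X~ and CallCount~
    x += 1
    call_count += 1
    # post: X = X~ + 1 and CallCount = CallCount~ + 1
    assert x == old_x + 1 and call_count == old_count + 1, "postcondition violated"
    return x
```

A call such as `inc(INTEGER_LAST)` fails the precondition check when it is executed; the point of SPARK is that the equivalent mistake is reported before the program ever runs.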

You can learn more about SPARK from the book “High Integrity Software: The SPARK Approach to Safety and Security” by John Barnes. Sample text of Barnes’ book is available online. The open proofs page on SPARK has more information.

This means that the “Tokeneer” program is now an open proof. Remember, to be an open proof, a program’s implementation, proofs, and required tools have to be open source software. Tokeneer was a sample program written to show how to apply these kinds of techniques to actual systems (instead of trivial 5-line programs). The Tokeneer program itself, and its proofs, have already been released as open source software. Many of the tools it required are already FLOSS (e.g., fuzz and LaTeX for its formal specifications, and an Ada compiler to compile it). Now that SPARK has been released as FLOSS, people can examine this entire stack of software to make improvements in all the technologies, as well as learn from them and create improved implementations. No, this doesn’t suddenly make it trivial to make proofs about complex programs, but it’s a step forward.

If you are interested in making future software better, please help the open proofs project. You don’t need to be a math whiz. For example, if you know how to do shell scripting, please help us package some promising formal methods tools (like SPARK) so they are easy to install. It’s hard to get people to try out these tools (and give feedback) if they’re too hard to install. If you know of formal methods software that is rotting in some warehouse, try to get it released as FLOSS. I think all government-funded unclassified research software should be released as FLOSS by default, since “we the people” paid for it! If you’re interested in the latest software technology, try out a few of these formal methods tools, and release as FLOSS any small programs and proofs you develop with them. Send the toolmakers feedback, or write down the tools’ strengths and weaknesses to help others understand them. SPARK is a tool that can be used, right now, in certain circumstances. I have no illusions that today’s formal methods tools are ready for arbitrary 20 million line programs. But if we want future software to be better than today’s, we need to figure out how to mature formal methods technology and make it better understood so that it can scale. I think making top-to-bottom worked examples and starting points can help us get there.

path: /oss | Current Weblog | permanent link to this entry

Thu, 28 May 2009

Parchment: Running the Z-machine

I just learned of a fun web application called Parchment. Parchment lets you play interactive fiction (I.F., aka "text adventure games") using just your web browser. It only works with I.F. in "Z-machine" format, but that's a very common format.

So go to the parchment site and try out something from their long list of interactive fiction... now you don't need to install anything! That includes my small replayable puzzle "Accuse" (my Accuse source code is already available).

If you want more information about it, here's a brief post about Parchment by its author, Atul Varma. Atul built this based on an existing program, Thomas Thurman's Gnusto. Both are open source software (using the GPLv2 license). Once again, this demonstrates the neat thing about community-developed software; one person developed a program for one circumstance, and another extended it for a different circumstance.

There are several tools available for creating interactive fiction. I've been watching Inform 7 for a while, with interest, because it takes a radically different approach to writing code. Inform 7 is a natural-language programming language that tries to actively exploit features of natural language to make developing these kinds of things easier. You can see a brief Inform 7 tutorial if you're curious, as well as the full Writing with Inform documentation. Inform 7 isn't itself OSS, though significant portions are; Inform 6 (a key substrate) and many other portions including the Inform 7 standard rules are released under the Artistic License 2.0. The extensions are released under the "Attribution Creative Commons licence"; that's not normally a license used for software, but I think it'd meet the criteria for OSS, and Fedora approves of this license for content. I hope that someday the rest will be released as OSS as well. The logic behind Inform 7 is described in "Natural Language, Semantic Analysis and Interactive Fiction" by Graham Nelson. If you're interested in some of the technical stuff behind it, the text of the Standard Rules, the text of the extensions, Inform 7 for programmers, and the Chart of Rules can tell you more.


Sat, 23 May 2009

Wikipedia changes its license

The Wikimedia Foundation (WMF) will change the licensing terms on all its materials — including Wikipedia. Now, all of its existing material will be released under the Creative Commons Attribution-ShareAlike (CC-BY-SA) license in addition to the current GNU Free Documentation License (GFDL). The WMF says “This change is meant to advance the WMF’s mission by increasing the compatibility and availability of free content.” This means that Wikipedia material can now be combined with the vast amount of CC-BY-SA licensed material, and Wikipedia can now include the volumes of CC-BY-SA material (that material will just be CC-BY-SA). It also makes it easier to use Wikipedia material (and other material from the Wikimedia Foundation).

I think this is a good thing overall. Incompatible licenses are a real scourge on community-developed works. Past experience shows that license incompatibility can be a real problem for free-libre/ open source software (FLOSS or OSS), in particular. Bruce Perens warned about FLOSS license incompatibility back in 1999! As I argue in Make Your Open Source Software GPL-Compatible. Or Else, you should release free-libre/ open source software (FLOSS) using a GPL-compatible license. You don’t need to use the GPL, but using a GPL-compatible license (like the MIT, BSD-new, LGPL, or GPL) means that people can combine your software with other software to create larger works. I show how this works in The Free-Libre / Open Source Software (FLOSS) License Slide, which has a simple graph showing how common FLOSS licenses can work together. Wikipedia articles aren’t software, but the principles still apply - licenses need to enable community-developed works, not disable them.

Now, nothing is perfect. One nice benefit of the GNU Free Documentation License (GFDL) is that it requires that readers be able to get editable versions whose format specification is available to the public (for details, see its text on “transparent” copies). This is a really nice feature of the GFDL; it counters some of the problems of proprietary formats.

The GFDL has many problems, though, when used for short works like Wikipedia articles or images. Most obviously, it requires that you include the entire text of the license with each work (see GFDL 1.3 section 2). That’s no problem for large manuals, which is what the GFDL was designed for, but it’s a big problem for short works. Nobody likes having a license longer than the article it’s attached to! This is one reason why CC-BY-SA is so widely used for short works - and since Wikipedia is primarily a large set of short works, it makes sense. Which is why I (and many others) voted to approve this change.

Now it’s certainly true that people also complain that the GFDL allows the addition of unmodifiable sections. But many GFDL items don’t have them, and Debian determined through a formal vote that “GFDL-licensed works without unmodifiable sections are free [as in freedom]”.

I should also give credit to the Wikimedia Foundation (WMF), Richard Stallman of the FSF, and Lawrence Lessig, who worked together to make this possible.

For more on the Wikimedia license modification, you can see Wikimedia license FAQ, Lawrence Lessig’s post on GFDL 1.3, GFDL 1.3: Wikipedia's exit permit, FDL 1.3 FAQ, and An open response to Chris Frey regarding GFDL 1.3.


Fri, 22 May 2009

Government-developed Unclassified Software: Default release as Open Source Software

I’d like to see this idea seriously considered and discussed: By default, unclassified software which the government paid to develop should be released to the public as open source software (unless there’s a good reason not to).

Why? Well, if “we the people” paid to develop it, then “we the people” should get it! I think this idea fits into the good government ideal of data transparency; after all, software is data. Currently, we have a lot of waste and unnecessary costs due to loss, re-development, and/or government-created monopolies. The government is not a venture capitalist (VC); people who need a VC should go to a VC.

Let me focus specifically on the United States. I think this idea easily fits into the broader ideas of transparency and open government, including the Memorandum on Transparency and Open Government. Look at all the excitement over data.gov; indeed, Apps for America is holding a contest to develop software that uses data from data.gov.

Indeed, there’s a long history of U.S. laws specifically set up to make data available. Most obviously, Freedom of Information Act (FOIA) requests make it possible to extract information from the U.S. government. 17 USC 105 and 17 USC 101 prevent the U.S. government from claiming U.S. copyright on a work “prepared by an officer or employee of the United States Government as part of that person’s official duties”. So this idea would be an extension of what’s already gone on.

Let me focus on research, and how this idea could help advance technology. Think of all the advantages if software developed by U.S.-funded research could be reused by other research projects and commercial firms. For example, imagine if other researchers could simply extend previous work by modifying previously-developed software, instead of re-building yet another version from scratch. Anyone could commercialize the research, making it more likely that it would actually be put to use instead of being lost in the archives shown at the end of Raiders of the Lost Ark. Some argue that giving sole rights is the only way to commercialization, but that’s just not true; open source software is commercial software, so this is simply a different and fairer path to commercialization. In contrast, the current system inhibits all kinds of technical progress; Biere’s “The Evolution from LIMMAT to NANOSAT” (Apr 2004) found that “important details are often omitted in [research] publications and can only be extracted from source code... [Making source code available] is as important to the advancement of the field as publications”. Originally I thought of this idea for research software, and it’s not hard to see why. But when I started thinking about the reasons for doing this (especially “if ‘we the people’ paid to develop it, then ‘we the people’ should get it”), I realized that this principle applies much more broadly.

An open government directive isn’t out yet, but they’re clearly working on it. Please submit this - and other ideas like it - to them. I think there’s a lot of promise, but they can only enact and refine ideas that they’ve heard of. If you like this idea, please vote for it.

If this happened, I envision a two-stage process: (1) release of the software as an archive (so it can be downloaded), and (2) some of it will get picked up and used to start an active OSS project. The second stage might not happen for many years after the first, and that’s okay. Some will ask “how will people find it”, but I think that’s the wrong question. There are many commercial search engines that can find code, but they can only find stuff that’s web-accessible; let’s give them something to find.

Perhaps this should be done in stages. For example, perhaps it'd be best to start with software developed by research. Researchers are supposed to share their results anyway (under most cases), and the lack of software release often inhibits research (e.g., it's harder to check or repeat results). You could then broaden this to other types of software.

I’m sure there will need to be exceptions. There would need to be some sort of guidelines to figure out when to grant those exceptions, and those guidelines should be developed through lively discussion. Most obviously, if it’s a special ingredient necessary for national security, then it should be classified and not revealed in any form. I would not expect weapon systems or intelligence software to be released (though sometimes generic functions developed in them could be released). Export controls would still apply. But the exceptions should be that: Exceptions.


Mon, 11 May 2009

Wikipedia for children's schools

Wikipedia is a cool project. But if you want to hand an encyclopedia to younger children or to schools, Wikipedia is not a great choice. Wikipedia is not “child-safe”, nor is it intended to be; it includes a lot of “adult” content. Also, Wikipedia constantly suffers vandalism; the vandalism is often repaired quickly, but that’s little comfort to parents and teachers. There’s also the problem of Internet access; schools typically employ blocking software, and blocking software is fundamentally not smart. Since Wikipedia mixes material that’s okay for children with stuff that is not, Wikipedia often gets blocked by schools for children. Some schools for children just don’t have Internet access at all, for a variety of reasons. All of this makes it hard for such schools to directly use Wikipedia.

Wikipedia for schools is a cool project that compensates for this. It’s a free, hand-checked, non-commercial selection from Wikipedia, targeted around the UK National Curriculum and useful for much of the English-speaking world. The current version has about 5500 articles (as much as can be fit on a DVD with good size images) and is “about the size of a twenty volume encyclopaedia (34,000 images and 20 million words)”. It was developed by carefully selecting content, which was then checked for vandalism and suitability by “SOS Children volunteers”. You can download it for free from the website, or get it as a free 3.5GB DVD.

I also see this as a future model for Wikipedia — allow people to edit, but have a separate vetting process that identifies particular versions of an article as vetted. Then, people can choose if they want to see the latest version or the most recent vetted version. To some, this is very controversial, but I don’t see it that way. A vetting process doesn’t prevent future edits, and it creates a way for people to get what they want... material that they can have increased confidence in. The trick is to develop a good-enough vetting process (or perhaps multiple vetting/rating processes for different purposes). This didn’t make sense back when Wikipedia was first starting (the problem was to get articles written at all!), but now that Wikipedia is more mature, it shouldn’t be surprising that there’s a new need to identify vetted articles. Yes, you have to worry about countries to whom “democracy” is a dirty word, but I think such problems can be resolved. This is hardly a new idea; see Wikimedia’s article on article validation, Wikipedia’s pushing to 1.0, WikiQA by Eloquence, and FlaggedRevs. I am sure that a vetting/validation process will take time to develop, and it will be imperfect... but that doesn’t make it a bad idea.
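To make the "latest version vs. most recent vetted version" idea concrete, here's a minimal sketch of the data model it implies. This is purely illustrative: the `Revision` and `Article` classes are hypothetical and are not how Wikipedia, FlaggedRevs, or any other real system is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    text: str
    vetted: bool = False  # set by a separate vetting process


@dataclass
class Article:
    revisions: List[Revision] = field(default_factory=list)

    def edit(self, text: str) -> None:
        """Anyone can still add a new revision; vetting never blocks edits."""
        self.revisions.append(Revision(text))

    def vet_latest(self) -> None:
        """Mark the newest revision as vetted (assumes one exists)."""
        self.revisions[-1].vetted = True

    def latest(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def latest_vetted(self) -> Optional[Revision]:
        """Newest revision that passed vetting, or None if there is none."""
        return next((r for r in reversed(self.revisions) if r.vetted), None)
```

A reader (or a school) would then choose between `latest()` and `latest_vetted()`; an unvetted edit made after a vetted one changes the former but not the latter.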

So anyway, if you know or have younger kids, check out Wikipedia for schools. This is a project that more people should know about.


Thu, 07 May 2009

FLOSS doubles every 14 months!

I just took a look at Red Hat's 2009 brief to the European Patent Office on why software patents should not be allowed. It's a nice brief, noting that software patents hinder software innovation, and that there is a sound legal basis not to expand availability of such patents in Europe. (Here's Red Hat's press release, and Glyn Moody's comments (ComputerWorld UK) on it).

Their brief points to another paper with very interesting results: "The Total Growth of Open Source" by Amit Deshpande and Dirk Riehle (Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Pages 197-209). In this paper, they analyze the growth of more than 5000 open source software projects, and show that "the total amount of source code as well as the total number of open source projects is growing at an exponential rate." In their conclusion they state that the "total amount of source code and the total number of projects double about every 14 months."

That is an extraordinary rate of growth. Exponential growth can start small, but when it continues it will completely flatten anything not growing exponentially (or not growing as fast). This result is consistent with my earlier work, More than a Gigabuck: Estimating GNU/Linux's Size, which also found very rapid growth in free/libre/open source software (FLOSS).
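To get a feel for what "doubling every 14 months" means, here's a small sketch that turns the doubling time into a growth factor over any period, using size(t) = size(0) * 2^(t/14) with t in months. The function name is made up for this example, and it assumes the doubling time stays constant, which the paper's data can't guarantee for the future.

```python
DOUBLING_MONTHS = 14  # doubling time reported by Deshpande and Riehle


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Multiplier on the starting size after the given number of months."""
    return 2 ** (months / doubling_months)


print(f"after 1 year:    x{growth_factor(12):.2f}")
print(f"after 140 months: x{growth_factor(140):.0f}")
```

Under that assumption, one year gives a factor of about 1.81 (roughly 81% growth per year), and ten doublings (140 months, a bit under 12 years) give a factor of 1024.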

So if you're interested in software trends, take a look at "The Total Growth of Open Source" and Red Hat's brief to the EPO on software patents. I think they're both worth reading.


Wed, 22 Apr 2009

Why copyright damage limits don't hurt FLOSS

There's a move afoot to argue that copyright infringement penalties should bear a rational relationship to the value of what was infringed. You might think that this could harm Free/Libre/Open Source Software (FLOSS), but I don't think so. Here's why.

First: This is all being brought to a head by the current file-sharing lawsuit against Boston University graduate student Joel Tenenbaum, which raises a number of interesting questions. One issue that I find particularly interesting is the issue of statutory damages: Are fines from $750 to $150,000 per song (worth at most $1), non-commercially shared without permission, even legal under the US Constitution? Or, are these fines so excessive that they are unconstitutional? Ars Technica gives a brief summary of the case, if you haven't been following it. The Free Software Foundation (FSF)'s Amicus Brief in Connection with defendant's motion to dismiss on grounds of unconstitutionality of copyright act statutory damages as applied to infringement of single MP3 files argues that these penalties grossly exceed the crime; the FSF argues that the "State Farm/Gore due process test applicable to punitive damage awards is likewise applicable to statutory damages, and in particular bars the suggestion that each infringement of an MP3 file having a retail value of 99 cents or less may be punishable by statutory damages of from $750 to $150,000 -- or from 2,100 to 425,000 times the actual damages".

Frankly, I think the FSF and Tenenbaum have a reasonable argument on this point. People who shoplift a CD from a store would definitely pay penalties when caught, but those penalties would bear some relationship to the value of the property stolen, and would be far smaller than those a file-sharer faces. This notion that the "punishment should fit the crime" is certainly not new; Proverbs 6:30-31 talks about thieves paying sevenfold if they are caught. That doesn't make such actions right - but unjust penalties aren't right either. I think a lot of the problem is that copyright laws were originally written when only rich people with printing presses could really make and distribute many copies of material. Today, 8-year-olds can distribute as much information as the New York Times, and the law hasn't caught up.

But does the FSF risk subverting Free/Libre/Open Source Software (FLOSS) by making this argument? After all, FLOSS developers also depend on copyright law to enforce certain conditions, and often charge $0 for copies of their software. If the penalties would be limited to "7 times the original cost", would that make FLOSS development impossible?

I don't think there's any problem, but for some people that may not be obvious. The difference is that in a typical music copyright infringement case, the filesharer could purchase the right to do what they're doing for a relatively low price, something typically not true for FLOSS. For example, under normal circumstances it's perfectly legal to buy a song for $1, and then transfer that song to someone else (as long as you destroy your own copies), so sharing that song with 10 people is legal after paying $10.

In contrast, violations of FLOSS licenses often can't be made legal by simply buying the rights. If you violate the revised BSD license by removing all credits to the original author, there's typically no "alternative" legal version available for sale without the author credits. (Indeed, under legal systems with strict "moral rights" it may not even be possible.) Similarly, if you violate the GPL by releasing binary software yet refusing to release its source code, there's often no way to pay additional money to the original authors for that privilege. In some cases, GPL'ed software is released via a dual-use license (e.g., "GPL or proprietary"), with the proprietary version costing additional money; in those cases you do have a value that you can compare against. In cases where there is a value you can compare against, then you should use that value to help determine the penalty. Otherwise, a much stiffer penalty is justified, because there is no method for the infringer to "buy" his or her way out, and their actions risk making functional products (not just entertainment) unsupportable. As noted in the United States Court of Appeals for the Federal Circuit case 2008-1001, JACOBSEN v. KATZER, the court essentially found that failing to obey the conditions of an open source software license led to copyright infringement. (For more on this particular case, see New Open Source Legal Decision: Jacobsen & Katzer and How Model Train Software Will Have an Important Effect on Open Source Licensing.)

So I think that it does make sense to limit copyright penalties based on the value of the original infringed item... but that doing so does not (necessarily) put FLOSS development processes at risk.


Mon, 20 Apr 2009

Microsoft loses to Open Source Approaches (Encarta vs. Wikipedia)

The competition is over. On one side, we have Microsoft, a company with a market value of about $166 billion (according to a 2009-04-20 NASDAQ quote). On the other side, we have some volunteers who work together and share their results on the web using open source approaches.

And Microsoft lost.

As pointed out by Chris Dawson (ZDNet), Mike Jennings (PC Pro), Naomi Alderman (the Guardian), Noam Cohen (NY Times), Adam Ostrow (Mashable), and many others, Microsoft Encarta (Microsoft’s encyclopedia project) has folded, having failed to compete with Wikipedia. It’s not even hard to see why:

  1. Wikipedia is cheaper than Encarta (no-cost vs. cost)
  2. Wikipedia is easier to start using. If you have a web browser, you have Wikipedia. In contrast, you have to specially install Encarta, and it does not work on all platforms.
  3. Wikipedia is more up-to-date than Encarta. It often took years before Encarta entries got updated, even on trivially obvious issues such as death dates.
  4. Wikipedia has far more material. Wikipedia has far more articles, and generally it has far more material in each article.
  5. Wikipedia’s material has fewer legal restrictions, so users are allowed to do more with Wikipedia results. Creating mash-ups and reposting portions is part of today’s world.

One lesson to be learned here is that it sometimes doesn’t matter how large a company is; changes in technology may mean that they may abandon something in the future. Plan that the future will change, even if a company seems invincible. It’s easy to pick on Microsoft here, but the same can be said of IBM, or Oracle, or anyone else. Tying yourself completely to any one company is, in the long term, a mistake. Thus, you need to have a reasonable escape plan if a company folds or stops supporting a product that you depend on.

Another lesson to be learned here is that proprietary approaches can be beaten by open source approaches. That doesn’t mean it must happen every time, of course. But clearly open source approaches can, at least sometimes, dominate their proprietary competition.

In the long term, it simply doesn’t matter if a company has more money if an open-sourced competitor can produce a better product, make it available at a lower cost, and can sustain that process indefinitely. Given those three factors, proprietary vendors will lose to an open-sourced competitor unless there’s a key differentiator that is sufficiently valuable to users. In such cases, having more money is just an opportunity to lose more money; it gives no benefit of scale. Microsoft’s Encarta team tried to compete by adding special materials (like fancy graphics and sound). I’m sure that Encarta managers convinced themselves that because they were spending money to develop these materials, users would pay for Encarta instead. They were wrong. In the end, users were more interested in good, timely information than in fancy graphics, and Encarta simply didn’t have a chance. Open source approaches were simply better at providing the encyclopedia people wanted than proprietary approaches were.

The obvious question to me is, are there any lessons that apply to software too? Wikipedia uses free / libre / open source software (FLOSS) principles, but Wikipedia is an encyclopedia, not a FLOSS program. Indeed, software is different from encyclopedias in many ways; for example, people can easily switch encyclopedias (while the lock-in and network effects of software are well-known), and far more people can participate in encyclopedia development than in software development. But I still think there are lessons to be learned here. This Encarta vs. Wikipedia battle should make it clear that no proprietary company, no matter how well-resourced it is, is invulnerable to open source competition. Developers of products with FLOSS-like licenses give up some privileges that the law permits them to have, and in return, they can often drastically reduce their development costs and increase the breadth of the result (because the development efforts can be shared among many developers). At a certain point, FLOSS-like projects can end up like a snowball rolling down the hill; they gain so much momentum that even large sums of money (or being the first) aren’t enough to counter them. As a result, even proprietary companies with massive cash resources do not always win. In summary, it doesn’t matter if you have lots of money; if your product costs more and does less (from the user’s point of view), you must change that circumstance, eliminate all competition, or suffer failure of the product.

path: /oss | Current Weblog | permanent link to this entry

Mon, 13 Apr 2009

Releasing FLOSS Software

If you’ve written (or started to write) some Free/Libre/Open Source Software (FLOSS), please follow the time-tested community standards for releasing FLOSS software when you want people to be able to install it from source code. Unfortunately, a lot of people don't seem to be aware of what these conventions are. This really hit me in my recent OpenProofs work; we're trying to make it easy to install programs by pre-packaging them, and we've found that some programs are a nightmare to package or install because their developers did not follow the standard conventions.

So I've released a brief article: Releasing Free/Libre/Open Source Software (FLOSS) for Source Installation, to help people learn about them. For the details, I point to the GNU Coding Standards (especially the release process chapter) and the Software Release Practice HOWTO. I also point out some of the most important conventions that will make building and installing your software much easier for your users:

  1. Pick a good, simple, Google-able name.
  2. Identify the version (using simple version numbers or ISO dates), and include that in the release filename as NAME-VERSION.FORMAT.
  3. Use a standard, widely-used, GPL-compatible FLOSS license — and say so.
  4. Follow good distribution-making practice, in particular, make sure tarballs always unpack into a single new directory named NAME-VERSION.
  5. Use the standard invocation to configure, build, and install it: ./configure; make; make install.
  6. Support the standard ./configure options like --prefix, --exec-prefix, --bindir, --libdir, and so on.
  7. Create a makefile that can rebuild everything and uses makefile variables (including applicable standard makefile variable names and targets).
  8. Have “make install” support DESTDIR.
  9. Document the external tools/libraries needed for building and running, and make it easy to separate/reuse them.
  10. If you patch an external library/tool, get the patch upstream.
  11. Use standard user interfaces. For command line tools, use “-” single-letter options, “--” long-name options, and “--” by itself to signal “no more options”. For GUI tools, provide a .desktop file.
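As a rough illustration of conventions 2 and 4 above, here is a minimal Python sketch that builds a release archive named NAME-VERSION.FORMAT and checks that it unpacks into a single NAME-VERSION directory. The project name "frobnicate", the version, the file contents, and both helper functions are hypothetical and exist only for this example; real projects normally get this behavior from their build system (for example a "make dist" target) rather than from a script like this.

```python
import io
import tarfile


def make_release_tarball(name, version, files):
    """Build an in-memory .tar.gz whose members all live under NAME-VERSION/."""
    top = f"{name}-{version}"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for relpath, data in files.items():
            info = tarfile.TarInfo(name=f"{top}/{relpath}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    # Convention 2: the release filename is NAME-VERSION.FORMAT.
    return f"{top}.tar.gz", buf.getvalue()


def top_level_dirs(tar_bytes):
    """Return the set of top-level path components found in the archive."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:gz") as tar:
        return {member.name.split("/", 1)[0] for member in tar.getmembers()}


# Hypothetical project used only for illustration.
filename, payload = make_release_tarball(
    "frobnicate",
    "1.2.3",
    {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"},
)

# Convention 4: everything unpacks into exactly one new directory.
print(filename, top_level_dirs(payload))
```

A packager could run a check like top_level_dirs() on a downloaded tarball before unpacking it, to avoid archives that spill files into the current directory.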

To learn more, see the whole article: Releasing Free/Libre/Open Source Software (FLOSS) for Source Installation.

Tue, 24 Mar 2009

Fixing Unix/Linux/POSIX Filenames

Traditionally, Unix/Linux/POSIX filenames can be almost any sequence of bytes, and their meaning is unassigned. The only real rules are that "/" is always the directory separator, and that filenames can't contain byte 0 (because this is the terminator). Although this is flexible, this creates many unnecessary problems. In particular, this lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws), makes it impossible to consistently and accurately display filenames, and it confuses users.

So for those of you who understand Unix/Linux/POSIX, I've just released a new technical article, Fixing Unix/Linux/POSIX Filenames.

This article will try to convince you that adding some limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations - so it'd be better to make these (reasonable) assumptions true in the first place. The article discusses, in particular, the problems of control characters in filenames, leading dashes in filenames, the lack of a standard encoding scheme (vs. UTF-8), and special metacharacters in filenames. Spaces in filenames are probably hopeless in general, but resolving some of the other issues will simplify their handling too. This article will then briefly discuss some methods for solving this long-term, though that's not easy - if I've convinced you that this needs improving, I'd like your help figuring out how to do it!
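To make the four problem classes above concrete, here is a small Python sketch that flags them in a single filename given as raw bytes. This is only an illustration under my own assumptions: the function name is hypothetical, and the set of "metacharacters" in particular is an arbitrary choice for this example, not a list taken from the article or from POSIX.

```python
# Assumed, illustrative set of shell metacharacters; the real list is debatable.
SHELL_METACHARACTERS = set(b"*?[]$`\"'\\<>|;&(){}!")


def filename_problems(name):
    """Return a list of problem labels for one filename given as bytes."""
    problems = []
    # Control characters: bytes 1-31 and 127 (byte 0 cannot occur in a filename).
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("control character")
    # A leading dash can be mistaken for a command-line option.
    if name.startswith(b"-"):
        problems.append("leading dash")
    # Without an agreed encoding, the bytes may not be displayable at all.
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    if SHELL_METACHARACTERS & set(name):
        problems.append("shell metacharacter")
    return problems


for candidate in (b"notes.txt", b"-rf", b"a\nb", b"caf\xe9", b"a*b"):
    print(candidate, filename_problems(candidate))
```

Working on bytes rather than str is deliberate here: since a filename can be almost any byte sequence, a check that starts by decoding the name would already be assuming one of the limitations under discussion.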

So - take a peek at Fixing Unix/Linux/POSIX Filenames. If you have ideas on how to help, I'd love to know.

Thu, 26 Feb 2009

2009 UK Action Plan for Open Source Software

A new report from the UK titled Open Source, Open Standards and Re–Use: Government Action Plan is in the news; it's been reported by the BBC, Times Online, and Ars Technica (among many others).

Here's the first paragraph of its foreword: "Open Source has been one of the most significant cultural developments in IT and beyond over the last two decades: it has shown that individuals, working together over the Internet, can create products that rival and sometimes beat those of giant corporations; it has shown how giant corporations themselves, and Governments, can become more innovative, more agile and more cost-effective by building on the fruits of community work; and from its IT base the Open Source movement has given leadership to new thinking about intellectual property rights and the availability of information for re–use by others."

In the policy section, it says that (note the last point):

Remarkable stuff.

Wed, 11 Feb 2009

Open Proofs: New site and why we need them

There's a new website in town: http://www.openproofs.org. This site exists to define the term "open proofs" and encourage their development. What are open proofs, you ask? Well, let's back up a little...

The world needs secure, accurate, and reliable software - but most software isn't. Testing can find some problems, but testing by itself is inadequate. In fact, it's completely impractical to fully test real programs. For example, completely testing a trivial program that only adds three 64-bit numbers, using a trillion superfast computers, would take about 49,700,000,000,000,000,000,000,000,000 years! Real programs, of course, are far more complex.
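The arithmetic behind that kind of estimate can be sketched in a few lines of Python. The input size (three 64-bit numbers) and the trillion computers come from the text above; the per-computer test rate of 4 billion tests per second is my assumption for this sketch, chosen because it gives a result of the same magnitude as the quoted figure, and the 365.25-day year is likewise an assumption.

```python
INPUT_BITS = 3 * 64                 # three 64-bit inputs
TOTAL_CASES = 2 ** INPUT_BITS       # every possible input combination

COMPUTERS = 10 ** 12                # "a trillion" computers (from the text)
TESTS_PER_SECOND = 4 * 10 ** 9      # ASSUMPTION: one test per cycle at 4 GHz
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # ASSUMPTION: 365.25-day year

years = TOTAL_CASES / (COMPUTERS * TESTS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{TOTAL_CASES:.3e} test cases, about {years:.2e} years")
```

Changing the assumed test rate by a factor of a thousand in either direction still leaves a time span vastly longer than the age of the universe, which is the point of the example.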

There is actually an old, well-known set of approaches that can give much more confidence that some software will do what it's supposed to do. These are often called "formal methods"; they apply mathematical proof techniques to software. These approaches can produce verified software, where you can prove (given certain assumptions) that the software will (or won't) do something. There's been progress made over the last several decades, but formal methods are not widely used, even where it might make sense to use them. If there's a need, and a technology, why hasn't it matured faster and become more common?

There are many reasons, but I believe that one key problem is that there are relatively few fully-public examples of verified software. Instead, verified software is often highly classified, sensitive, and/or proprietary. Many of the other reasons are actually perpetuated by this. Existing formal methods tools need more maturing, true, but it's ridiculously hard for tool developers to mature the tools when few people can show or share meaningful examples. Similarly, software developers who have never used them do not believe such approaches can be used in "real software development" (since there are few examples) and/or can't figure out how to apply them. In addition, they don't have existing verified software that they can build on or modify to fit their needs. Teachers have difficulty explaining them, and students have difficulty learning from them. All of this ends up being self-perpetuating.

I believe one way to help the logjam is to encourage the development of "open proofs". An "open proof" is software or a system where all of the following are free-libre / open source software (FLOSS):

Something is FLOSS if it gives anyone the freedom to use, study, modify, and redistribute modified and unmodified versions of it, meeting the Free software definition and the open source definition.

Open proofs do not solve every possible problem, of course. I don't expect formal methods technologies to become instantly trivial to use just because a few open proofs show up. And formal methods are always subject to limitations, e.g.: (1) the formal specification might be wrong or incomplete for its purpose; (2) the tools might be incorrect; (3) one or more assumptions might be wrong. But they would still be a big improvement from where we are today. Many formal method approaches have historically not scaled up to larger programs, but open proofs may help counter that by enabling tool developers to work with others. In any case, I believe it's worth trying.

So please take a look at: http://www.openproofs.org. For example, for open proofs to be easily created and maintained, we need for FLOSS formal methods tools to be packaged up for common systems so they can be easily installed and used; the web site has a page on the packaging status of various FLOSS tools. Please feel welcome to join us.

Thu, 22 Jan 2009

Automating DESTDIR for Packaging

Today's users of Linux and Unix systems (including emulation systems like Cygwin) don't want to manually install programs - they want to easily install pre-packaged software. But that means that someone has to create those packages.

When you're creating packages, an annoying step is handling "make install" if the original software developer doesn't support the DESTDIR convention. DESTDIR support is very important, because two of the most common packaging formats - Debian's .deb (used by Debian and Ubuntu) and RPM (used by Fedora, Red Hat, and SuSE/Novell) - both require actions (redirection of writes) that DESTDIR enables. Unfortunately, many software developers don't include support for DESTDIR, and it's sometimes a pain to add DESTDIR support. Indeed, it's often trivial to create packages except for having to make the modifications for DESTDIR support. Yes, adding DESTDIR support isn't hard compared to many other tasks, but since it applies to every program, why not automate this instead?
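For readers who haven't met DESTDIR, the idea is that every path "make install" writes to is prefixed with a staging directory, while the paths recorded for the final system stay unchanged. Here is a minimal Python sketch of that redirection. The helper names and the example file are hypothetical; this only mimics what a DESTDIR-aware makefile does and is not one of the automation techniques from the essay.

```python
import os
import tempfile


def staged_path(destdir, final_path):
    """Prefix an absolute install path with DESTDIR (empty DESTDIR = no change)."""
    if not destdir:
        return final_path
    return os.path.join(destdir, final_path.lstrip("/"))


def install_file(data, final_path, destdir=""):
    """Write data where `make install DESTDIR=...` would have put the file."""
    target = staged_path(destdir, final_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as out:
        out.write(data)
    return target


# Stage a hypothetical program into a temporary packaging root.
with tempfile.TemporaryDirectory() as stage:
    written = install_file(b"#!/bin/sh\necho hi\n", "/usr/local/bin/hello", destdir=stage)
    relative = os.path.relpath(written, stage)
    print(relative)
```

A packaging tool then archives the staging directory, so that the file ends up at its final path only when the package is installed on the user's system.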

So, I've written a little essay about Automating DESTDIR for Packaging. In it, I describe some of the ways I've found for automating DESTDIR. If there are more, great (please let me know)! In any case, I'd love to see more automation, so that software will become easier to package and install.

Here's the link, again: Automating DESTDIR for Packaging.

Mon, 12 Jan 2009

Apple Feedback URL

Oh, quick update - the URL for feedback to Apple is http://www.apple.com/feedback - I gave the wrong URL in my last post. My thanks to Steve Hoelzer, who was the first to send me a correction! Again - please ask them to support Ogg.

Sat, 10 Jan 2009

Ask Apple to Support Ogg on iPod/iTunes

Please ask Apple to support Ogg on their iPods, iPhones, and iTunes! It wouldn't hurt to also sign this petition (and maybe this one), though I don't know how strongly they'd influence Apple. Here's why, as good news and bad news.

Bad news: Some of the most common formats for audio (like MP3 and AAC) are patent-encumbered, and thus not open standards. Because they're patent-encumbered they are harder and more expensive to support. Many organizations like Wikipedia forbid the use of patent-encumbered standards, and they can't be directly implemented in FLOSS products used in the U.S. and some other countries.

Good news: Ogg (as maintained by the Xiph.org foundation) is available! Ogg is a "container format" that can contain audio, video, and related material using one of several encodings. Usually audio is encoded with "Vorbis" (the combination is "Ogg Vorbis"); perfect sound reproductions can be created with FLAC. This format is already the required audio format for Wikipedia, and the next version of Mozilla's Firefox will include Ogg built in. Many people already have huge music collections in Ogg format, and many people report that Ogg is an important requirement for a player. See my older blog entry on playing Ogg Vorbis and Theora for more information.

Bad news: Apple's iPods do not directly support Ogg. That's really unfortunate for iPod users, and it also makes it harder to release files in Ogg. So please, ask Apple to add support for Ogg. People have been asking for this for some time, so it's not true that "no one's asking for it". Some people have even taken radical efforts and rewritten the iPod software - but although that shows there's a real interest, that's an extreme measure that normal people shouldn't have to do. There's already software available to Apple to implement Ogg at no charge, and even the original iPods have enough horsepower to implement Ogg. Thus, it will cost Apple very little to add support for Ogg - and there are people who want it.

Mon, 08 Dec 2008

Use "FLOSS" instead of "FOSS" or "OSS/FS" as Universal term for Open Source Software / Free Software

Below is my brief attempt to untangle some terminology. In short, I suggest using "FLOSS" instead of "FOSS" or "OSS/FS" for software which meets the Free Software Definition and Open Source Definition.

There are many alternative terms for "Free software" in the sense of the Free Software Definition. Examples of such alternatives are "open source software", "libre software", "FOSS" or "F/OSS" (free/open-source software), "OSS/FS" (open source software / free software), "freed software", and "unfettered software". Wikipedia's article on alternative terms for free software discusses alternative terminology further.

For someone (like me) who tries to write about software under these kinds of licenses, having multiple different names is annoying. What's worse, the term "Free software" (the original term) is really misleading; people who hear that term presume that you mean "no cost", which is not related to the intended meaning of "freedom". Yes, I know that the "Free" means "freedom" (aka "Free as in speech" or "Free market"), and I'm well aware that you can charge and pay for Free software. But you have to re-teach everyone who knows English, and you're fighting a losing battle against search engines (which will mix results together when you search for a phrase with two common meanings). Even the FSF admits that the term "Free software" is widely misunderstood. Years ago I suggested to Richard Stallman that he use the term "freed software" instead of "free software", so that the term would be different but the acronyms could stay the same; obviously he didn't accept that suggestion.

The term "open source software" is the most widely used term in English, and the term's creators intentionally tried to include everyone regardless of their motivations. I'm happy to use the term "open source software"; I think it's a reasonable term and it's widely accepted. So, in groups which already use that term, I'll gladly use "open source software" as the "universal" term that covers all such software, regardless of the motivations of the developers.

Unfortunately, many of the developers of such software strongly object to the term "open source software" as the universal term. Their objection is that many people who use the term "open source software" only emphasize engineering or business advantages, while the FSF emphasizes freedom for end-users and objects to a term that doesn't note that possibility somehow.

This objection causes problems for people like me. I'm usually not trying to exclude those who object to the term "open source software". Instead, I often want an inclusive term to describe such software, regardless of the motivations of its developers. Different developers often have different motivations - and even the same developer may have different motivations over time or on different projects. So, is there some term that most can accept?

One inclusive term is "OSS/FS", which you can blame me for. I started writing about open source software / free software (OSS/FS) many years ago, when such writings were much less common. So for example, look at the title of my massive paper "Why open source software / free software (OSS/FS)? Look at the Numbers!" At the time, there wasn't an obvious "universal" term, so I chose to use "OSS/FS", which was an obvious way to combine the two most common terms. "OSS/FS" takes too long to pronounce, though, so it hasn't really caught on.

Among the other terms, "FLOSS" (Free/Libre/Open-Source Software) seems to have won the popularity contest as a "universal" English term that nearly all can accept. Google reports the approximate number of pages a phrase will return, so it's a reasonable way to determine how popular a phrase is. A quick search on Google (using English on 2008-12-08) shows these popularity figures: "FLOSS software" gets 1,570,000 hits, "Libre software" gets 596,000 hits, "FOSS software" gets 595,000 hits, "OSS/FS" gets 193,000 hits, and "F/OSS software" gets 66,200 hits. Note that FLOSS adds the term "libre"; "libre software" or "livre software" is widely used as the universal term in Romance languages, so adding it helps clarify which sense of "free" is meant. The term "FLOSS" also hides the fact that some people prefer the original spelling "open source software", while others prefer to hyphenate it as "open-source software". In context, "FLOSS" is unlikely to be confused with dental floss. If you're an advocate who objects to that similarity, just imagine that "FLOSS cleans the gunk out" :-).

So, when speaking to a group that already uses "open source software" as the universal term for such software, I'll use "open source software". I have no objection to "open source software" as the universal term; it's a reasonable term, and my primary goal is understanding by my hearers. I don't see a problem with using separate terms to describe people's motivations, as opposed to terms for the software that they (co-)develop. Thus, I can glibly say "open source software is co-developed by people and organizations who often have different motivations to do so; those who develop it to promote freedom for end-users typically term their members the 'Free Software Movement', and those who develop it for engineering and business (cost-saving) reasons may be referred to as the 'Open Source Movement'". I think that's perfectly acceptable as terminology; it's certainly clear that different people and organizations can have different motivations and yet can work together. Many, many people use "open source software" as the universal term for such software.

But not all groups accept the term "open source software" as the universal term for such software, and my writings on my website cater to a variety of people. You can't please everyone, but I'd like to avoid unnecessarily alienating people. At the least, I'd rather people object to the substance of my writing instead of my word choice :-).

So: I suggest using "FLOSS" (Free/Libre/Open-source software) instead of "FOSS" or "F/OSS" or "OSS/FS" as the universal term for such software (in English). It's easy to say, inclusive, and it seems to be the most popular of those alternatives. Many people use "open source software" or "Free software" (with the funny capitalization) as the universal term for such software, and that's fine by me. But for material on my website, I intend to slowly migrate towards FLOSS instead of my older term OSS/FS. I'll leave OSS/FS in a few places (including titles) so that people searching for it will still find it, but at this point, I think the OSS/FS acronym is a fossil. I'll tend to leave "open source software" where I use that term; typically those are written for audiences where it's established practice to use that term.

Strictly speaking, "Free software" in the FSF sense is defined by the Free Software Definition (FSD), and "open source software" is defined by the Open Source Definition (OSD). In practice, a software license will typically either meet, or fail to meet, both definitions. This is hardly surprising; different dictionaries have different descriptions of much simpler words. The Free Software Definition is much simpler and easier to understand, and it gives a better understanding of why an end-user might want such a license. So when describing what FLOSS is, I generally prefer to use the simpler and clearer Free Software Definition. If I'm doing a technical analysis of a software license, I require that both definitions be met. The FSD is better at explaining the overall concept, but the OSD has additional information that helps clarify it. In other words, a software license must meet both definitions, or it's not a FLOSS license.

Mon, 01 Dec 2008

Tell the FTC: Eliminate software patents

The U.S. Federal Trade Commission (FTC) has announced the first of a possible series of public hearings to "explore the evolving market for intellectual property (IP)". They'll begin December 5, 2008, in Washington, DC.

Please let the FTC know that the U.S. should eliminate software patents! The U.S. courts made the mistake of rewriting the laws to permit software patents and business method patents, resulting in a stifling of both competition and innovation. Certainly, software patents have been harming free/libre/open-source software (FLOSS), but they've been harmful to proprietary software development too. The U.S. government is asking for comments - so please let them know about the problems software patents cause, so that we can get rid of them!

To help, I've posted a web page titled Eliminate Software Patents!. This page points to many existing papers and organizations that explain why software patents should be eliminated. It also lists some ways to at least reduce the damage of the mistake, if they can't eliminate it.

I think such comments would be very much in line with their series. They state that "The patent system has experienced significant change since the FTC released its first IP Report in October 2003, and more changes are under consideration... [changes include] decisions on injunctive relief, patentability, and licensing issues... [and] there is new learning regarding the operation of the patent system and its contribution to innovation and competition." Even the FTC's original 2003 report noted how many people were opposed to software patents.

Please contact the FTC, and let them know that software patents should be eliminated... and feel free to use any of these resources if you do contact them.

Tue, 04 Nov 2008

Kudos and Kritiques: 'Automated Code Review Tools for Security' - and OSS

The article "Automated Code Review Tools for Security" (by Gary McGraw) has just been released to the web. Officially, it will be published in IEEE Computer's December 2008 edition (though increasingly this kind of reference feels like an anachronism!). The article is basically a brief introduction to automated code review. Here are a few kudos and kritiques (sic) of it, including a long discussion about the meaning of "open source software" (OSS) that I think is important to add.

First, a few kudos. The most important thing about this article is that it exists at all. I believe that software developers need to increasingly use static analysis tools to find security vulnerabilities, and articles like this are needed to get the word out. Yes, the current automated review tools for security have all sorts of problems. These tools typically have many false positives (reports that aren't really vulnerabilities), they have many false negatives (they fail to report vulnerabilities they should), they are sometimes difficult to use, and their results are sometimes difficult to understand. But static analysis tools are still necessary. Software is getting larger, not smaller, because people keep increasing their expectations for software functionality. Security is becoming more important to software, not less, as more critical functions depend on the software and most attackers focus on them. Manual review is often too costly to apply (especially on pre-existing software and in proprietary development), and even when done it can miss 'obvious' problems. So in spite of current static analysis tools' problems, the rising size and development speed of software will force many developers to use static analysis tools. There isn't much of a practical alternative - so let's face that! Some organizations (such as Safecode.org) don't adequately emphasize this need for static analysis tools, so I'm glad to see the need for static analysis tools is being emphasized here and elsewhere.
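To show what false positives and false negatives look like in practice, here is a deliberately naive Python sketch of a lexical scanner for a few risky C library calls. It is a toy written for this illustration, not the method used by any of the tools discussed here; the list of function names and the sample C fragment are my own assumptions.

```python
import re

# Toy rule set: flag textual calls to a few functions often misused in C.
RISKY_CALL = re.compile(r"\b(gets|strcpy|sprintf)\s*\(")


def scan(source):
    """Return (line_number, function_name) for every purely lexical match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in RISKY_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


SAMPLE = "\n".join([
    "/* never call gets() on untrusted input */",      # comment: false positive
    "char *(*copy)(char *, const char *) = strcpy;",   # alias, no call syntax
    "copy(dst, src);",                                 # real risk: false negative
    "strcpy(dst, src);",                               # real risk: reported
])

hits = scan(SAMPLE)
print(hits)
```

The scanner reports the comment on line 1 even though no call happens there, and it misses the call through the function pointer on line 3. Better tools reduce both kinds of error, but the underlying tradeoff remains.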

I'm especially glad for some of the points he makes. He notes that "security is not yet a standard part of the security curriculum", and that "most programming languages were not designed with security in mind.. [leading to] common and often exploited vulnerabilities." Absolutely true, and I'm glad he notes it.

There are a number of issues and details the article doesn't cover. For example, he mentions that "major vendors in this space include Coverity, Fortify, and Ounce Labs"; I would have also included Klocwork (for Insight), and I would have also noted at least the open source software program splint. (Proprietary tools can analyze and improve OSS programs, by the way; Coverity's 'open source quality', and Fortify's 'Java open review' projects specifically work to review many OSS programs.) But since this is a simple introductory article, he has to omit much, so such omissions are understandable. I believe the article's main point was to explain briefly what static analysis tools were, to encourage people to look into them; from that vantage point, it does the job. If you already understand these tools, you already know what's in the article; this is an article for people who are not familiar with them.

Now, a few kritiques. (The standard spelling is "critiques", but I can't resist using a funny spelling to match "kudos".) First, the article conflates "static analysis" with "source code analysis", a problem also noted on Lambda the Ultimate. The text says "static analysis tools - also called source code analyzers" - but this is not true. There are two kinds of static analysis tools: (1) source analysis tools, and (2) binary/bytecode analysis tools. There are already several static analysis tools that work on binary or bytecode, and I expect to see more in the future. Later in the text he notes that binary analysis is possible "theoretically", but it's not theoretical - people do that, right now. Yes, it's true that source code analysis is more mature/common, but don't ignore binary analysis. Binary analysis can be useful; sometimes you don't have the source code, and even when you do, it can be very useful to directly analyze the binary (because it's the binary, not the source, that is actually run). It's unfortunate that this key distinction was muddied. So, be aware that "static analysis tools" covers the analysis of both source and binary/bytecode - and there are advantages to analyzing each.

Second, McGraw says that this is a "relatively young" discipline. That's sort-of correct in a broad sense, but it's more complicated than that, and it's too bad that was glossed over. The basic principles of secure software development are actually quite old; one key paper was Saltzer and Schroeder's 1975 paper which identified key design principles for security. Unfortunately, while security experts often knew how to develop secure software, they tended to not write that information in a form that ordinary application software developers could use. To the best of my knowledge, my book Secure Programming for Linux and Unix HOWTO (1999-2003) was the first book for software developers (not attackers or security specialists) on how to develop application software that can resist attack. And unfortunately, this information is still not taught in high school (when many software developers learn how to write software), nor is it taught in most undergraduate schools.

Third, I would like to correct an error in the article: ITS4 has never been released as open source, even though the article claims it was. The article claims that "When we released ITS4 as an open source tool, our hope was that the world would participate in helping to gather and improve the rule set. Although more than 15,000 people downloaded ITS4 in its first year, we never received even one rule to add to its knowledge base". Is it really true, that open source software doesn't get significant help? Well, we need to know what "open source" means.

What is "open source software"? It's easy to show that the Open Source Initiative (OSI)'s Open Source Definition, or the Free Software Foundation's simpler Free Software Definition, are the usual meanings for the term "open source software". Just search for "open source software" on Google - after all, a Google search will rank how many people point to the site with that term, and how trusted they are. The top ten sites for this term include SourceForge, the Open Source Initiative (including their definition), the Open Source Software Institute, and (surprise!) me (due to my paper Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!). The top sites tend to agree on the definition of "open source software", e.g., SourceForge specifically requires that licenses meet the OSI's open source definition. I tend to use the Free Software Definition, because it's simpler, but I would also argue that any open source software license would need to meet both definitions to be generally acceptable. Other major sites that are universally acknowledged as supporting open source software all agree on this basic definition. For example, the Debian Free Software Guidelines were the basis of the Open Source Definition, and Fedora's licensing and licensing guidelines reference both OSI and FSF (and make clear that open source software must be licensed in a way that lets anyone use it). Google Code accepts only a limited number of licenses, all of which meet these criteria. The U.S. Department of Defense's 2003 policy letter, "Open Source Software in the DoD", essentially uses the Free Software Definition to define the term; it says open source software "provides everyone the rights to use, modify, and redistribute the source code of the software". In short, the phrase "open source software" has a widely-accepted and specific meaning.

So let's look at ITS4's license. As of November 3, 2008, the ITS4 download site's license hasn't changed from its original Feb 17, 2000 license. It turns out that the ITS4 license clearly forbids many commercial uses. In fact, their license is clearly labelled a "NON-COMMERCIAL LICENSE", and it states that Cigital has "exclusive licensing rights for the technology for commercial purposes." That's a tip-off that there is a problem; as I've explained elsewhere, open source software is commercial software. License section 1(b) says "You may not use the program for commercial purposes under some circumstances. Primarily, the program must not be sold commercially as a separate product, as part of a bigger product or project, or used in third party work-for-hire situations... Companies are permitted to use this program as long as it is not used for revenue-generating purposes....". In section 2, "(a) Distribution of the Program or any work based on the Program by a commercial organization to any third party is prohibited if any payment is made in connection with such distribution, whether directly... or indirectly...". Cigital has a legal right to release software in just about any way it wants to; that's not what I'm trying to point out. What I'm trying to make clear is that there are significant limitations on use and distribution of ITS4, due to its license.

This means that ITS4 is clearly not open source software. The Open Source Definition requires that an open source software license have (#5) No Discrimination Against Persons or Groups, and (#6) No Discrimination Against Fields of Endeavor. The detailed text for #6 even says: "The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research." Similarly, the Free Software Definition requires the "freedom to run the program, for any purpose" (freedom 0). ITS4 might be described as a "source available", "shared source", or "open box" program - but it's not open source. Real open source software licenses permit arbitrary commercial use; they are designed to include commercial users, not exclude them.

It's absurd to complain that "no one helped this open source project" when the project was never an open source project. Indeed, even if you release software with an OSS license, there is no guarantee that you'll get support (through collaboration). No one is obligated to collaborate with you if you release OSS - you have to convince others that they should collaborate. And conversely, you have every legal right to not release under an OSS license. But OSS licenses are typically designed to encourage collaboration; having a license that is not an OSS license unsurprisingly discourages such collaboration. As I point out in my essay "Make Your Open Source Software GPL-Compatible. Or Else", if you want OSS collaboration you really need to pick from one of the few standard GPL-compatible licenses, for reasons I explain further in that essay.

This matters. My tool flawfinder does a similar task to ITS4, but it is open source software (released under the world's most popular OSS license, the GPL). And I did get a lot of collaborative help in developing flawfinder, as you can tell from the Flawfinder ChangeLog. Thus, it is possible to have OSS projects that analyze programs and receive many contributions. Here's a partial list of flawfinder contributors: Jon Nelson, Marius Tomaschewski, Dave Aitel, Adam Lazur, Agustin Lopez, Andrew Dalgleish, Joerg Beyer, Jose Pedro Oliveira, Jukka A. Ukkonen, Scott Renfro, Sascha Nitsch, Sebastien Tandel, Steve Kemp (lead of the Debian Security Auditing Project), Jared Robinson, Stefan Kost, and Mike Ruscher. That's at least 17 people co-developing the software! Not all of these contributors added to the vulnerability database, but I know that at least Dave Aitel and Jared Robinson added new rules, Stefan Kost suggested specific new rules (though I don't think he wrote the code for them), and that Agustin Lopez and Christian Biere caused changes in the vulnerability database's reporting information. There may have been more; in a collaborative process, it's sometimes difficult to fully give credit to everyone who deserves it, and I don't have time to go through all of the records to determine the minutiae. It'd be a mistake to think that only database improvements matter, anyway; other user-contributed improvements were useful! These include changes that enabled analysis of patch files (so you can limit reporting to the lines that have changed), made the reports clearer, packaged the software (for easy installation), and fixed various bugs.

Obviously, releasing under a true OSS license helped immensely in getting contributions - especially if ITS4 is our point of comparison. For example, since flawfinder is an OSS program, it was easily incorporated into a variety of OSS Linux distributions, making it easier to use. I also suspect that some of the people most interested in using this kind of program were people paid to evaluate programs - but many of these uses were forbidden by the default ITS4 license.

In short, a non-OSS program didn't have much collaborative help, while a similar OSS program got lots of collaborative help... and that is an important lesson. There is a difference between "really being open source software" and "sort of openish but not really". If you look at ITS4 version 1.1.1, its CHANGES file does list a few external contributions, but they are primarily trivial tweaks or portability changes (to get it to work at all). McGraw himself admits that ITS4 didn't get any new rules in its first year. In contrast, OSS flawfinder added user-created rules within 6 months of its initial release, and over time the OSS program had lots of functionality provided by other co-developers. I don't want McGraw's incorrect comment about "open source" to go unchallenged, when there's an important lesson to be learned instead.

That said, my thanks to McGraw for noting flawfinder (and RATS), as well as ITS4. Indeed, my thanks to John Viega, McGraw, and others for developing ITS4 and the corresponding ACSAC paper - I think ITS4 was an important step in getting people to use and develop static analysis tools for security. I also agree with McGraw that deeper analysis of programs is the way of the future. Tools that focus on simple lexical analysis (like ITS4, flawfinder, and RATS) have their uses, but there is much that they cannot do, which is why tool developers are now focusing on deeper analysis approaches. Static analysis tools that examine a program in more detail (like splint) can do much that simple tools (based on lexical analysis) cannot.
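To make "simple lexical analysis" concrete, here's a minimal Python sketch of the general idea: scan C source text for calls to functions that are often misused. The rule table and the names RISKY_CALLS, CALL_PATTERN, and scan_source are hypothetical illustrations made up for this example; they are not the actual rules or code of ITS4, flawfinder, or RATS.

```python
import re

# Hypothetical mini rule set: function name -> why it deserves a look.
RISKY_CALLS = {
    "gets": "no bounds check on input",
    "strcpy": "no bounds check on destination",
    "sprintf": "output may overflow the buffer",
}

# Match a risky name used as a call, e.g. "strcpy (" but not "my_strcpy(".
CALL_PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan_source(text):
    """Return (line_number, function_name, reason) for each suspicious call."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            name = match.group(1)
            hits.append((number, name, RISKY_CALLS[name]))
    return hits
```

This sketch also matches text inside comments and string literals, and it knows nothing about buffer sizes or data flow. Those are exactly the kinds of limits that deeper analysis is meant to address.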

Most importantly, McGraw and I agree on the conclusion: "Static analysis for security should be applied regularly as part of any modern software development process." We need to get that word out to where software is really developed. Static analysis is not the only thing that people need to do, but it's an important part. Yes, static analysis tools aren't perfect. But static analysis tools can significantly help develop software that has real security.

path: /oss | Current Weblog | permanent link to this entry

Thu, 23 Oct 2008

Solved: Why is ESC so big?

In my post Estimating the Total Development Cost of a Linux Distribution, I noted that one of Fedora 9's largest components was Enterprise Security Client (ESC), and wondered why ESC would be so big. After all, a security client should be small - not large.

I just got the answer from Rahul Sundaram of the Fedora project, who asked internally. It turns out that ESC currently includes its own copy of XULRunner. XULRunner essentially provides a library and infrastructure for running "XUL+XPCOM" applications such as Firefox, Thunderbird, and ESC. You can confirm this using the on-line ESC documentation. This is clearly not optimal; as I noted in a previous blog entry, developers should use system libraries, and not create their own local copies. Rahul says that "the developers are currently working on making it use the system copy[,] which should drop down the size considerably".

So ESC isn't really that big - it's just that ESC creates its own local copy of a massive infrastructure. This is obviously not great for security, since there's a higher risk that bugs fixed in the real XULRunner would not be fixed in ESC's local copy. But this appears to be a temporary issue; once Fedora's version of ESC switches to the system XULRunner, the problem will disappear.

By the way, if you're interested in the whole "measuring Linux's size" thing, you should definitely take a look at the past measurements of Debian. My page on counting Source Lines of Code (SLOC) includes links and summaries of that work. It's neat stuff! My thanks to Jesús M. González-Barahona, Miguel A. Ortuño Pérez, Pedro de las Heras Quirós, José Centeno González, Vicente Matellán Olivera, Juan-José Amor-Iglesias, Gregorio Robles-Martínez, and Israel Herráiz-Tabernero for doing that.


Wed, 22 Oct 2008

Estimating the Total Development Cost of a Linux Distribution

There's a new and interesting paper from the Linux Foundation that estimates the total development cost of a Linux distro. Before looking at it, some background would help...

In 2000 and 2001 I published the first estimates of a GNU/Linux distribution's development costs. The second study (released in 2001, lightly revised in 2002) was titled More than a Gigabuck. That study analyzed Red Hat Linux 7.1 as a representative GNU/Linux distribution, and found that it would cost over $1 billion (over a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). It included over 30 million physical source lines of code (SLOC), and had it been developed using conventional proprietary means, it would have taken 8,000 person-years of development time to create. My later paper Linux Kernel 2.6: It's Worth More! focused on how to estimate the development costs for just the Linux kernel (this was picked up by Groklaw).

The Linux Foundation has just re-performed this analysis with Fedora 9, and released it as "Estimating the Total Development Cost of a Linux Distribution". Here's their press release. I'd like to thank the authors (Amanda McPherson, Brian Proffitt, and Ron Hale-Evans), because they've reported a lot of interesting information.

For example, they found that it would take approximately $10.8 billion to rebuild the Fedora 9 distribution in today's dollars; it would take $1.4 billion to develop just the Linux kernel alone. This isn't the value of the distribution; typically people won't write software unless the software has more value to them than what it costs them (in time and effort) to write it. They state that quite clearly in the paper; they note that these numbers estimate "how much it would cost to develop the software in a Linux distribution today, from scratch. It’s important to note that this estimates the cost but not the value to the greater ecosystem...". To emphasize that point, the authors reference a 2008 IDC study ("The Role of Linux Commercial Servers and Workloads") which claims that Linux represents a $25 billion ecosystem. I think IDC's figure is (in fact) a gross underestimation of the ecosystem value, understandably so (ecosystem value is very hard to measure). Still, the cost to redevelop a system is a plausible lower bound for the value of something (as long as people keep using it). More importantly, it clearly proves that very large and sophisticated systems can be developed as free-libre / open source software (FLOSS).

They make a statement about me that I'd like to expand on: "[Wheeler] concluded—as we did—that Software Lines of Code is the most practical method to determine open source software value since it focuses on the end result and not on per-company or per-developer estimates." That statement is quite true, but please let me explain why. Directly measuring the amount of time and money spent in development would be, by far, the best way of finding those numbers. But few developers would respond to a survey requesting that information, so direct measurement is completely impractical. Thus, using well-known industry models is the best practical approach to doing so, in spite of their limitations.
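To give a feel for what such an industry model looks like, here's a minimal sketch that assumes the Basic COCOMO "organic mode" equation (effort in person-months = 2.4 * KSLOC^1.05). The function names, the component sizes, and the salary and overhead parameters are hypothetical placeholders for illustration; they are not numbers taken from either study, and this is not a reproduction of how either study computed its totals.

```python
def cocomo_effort_person_months(sloc, a=2.4, b=1.05):
    """Basic COCOMO (organic mode): effort = a * KSLOC ** b."""
    return a * (sloc / 1000.0) ** b


def estimated_cost(sloc, annual_salary, overhead):
    """Turn estimated effort into cost; salary and overhead are caller-supplied assumptions."""
    person_years = cocomo_effort_person_months(sloc) / 12.0
    return person_years * annual_salary * overhead


# Hypothetical component sizes in SLOC (illustrative only, not measured data).
components = [2_000_000, 500_000, 50_000]

# Estimate each component separately and add up the efforts...
per_component = sum(cocomo_effort_person_months(s) for s in components)

# ...versus treating everything as one giant project.
as_one_project = cocomo_effort_person_months(sum(components))
```

Because the exponent is greater than 1, treating everything as one project yields more estimated effort than summing the per-component estimates, so how the code is grouped into components affects the result.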

I was delighted with their section on the "Limitations and Advantages to this Study's Approach". All studies have limitations, and I think it's much better to acknowledge them than hide them. They note several reasons why this approach grossly underestimates the real effort in developing a distribution, and I quite agree with them. In particular: (1) collaboration often takes additional time (though it often produces better results because you see all sides); (2) deletions are work yet they are not counted; (3) "bake-offs" to determine the best approach (where only the winner is included) produce great results but the additional efforts for the alternatives aren't included in the estimates. (I noted the bake-off problem in my paper on the Linux kernel.) They note that some drivers aren't often used, but I don't see that as a problem; after all, it still took effort to develop them, so it's valid to include them in an effort estimate. Besides, one challenge to creating an operating system is this very issue - to become useful to many, you must develop a large number of drivers - even though many of the drivers have a relatively small set of users.

This is not a study of "all FLOSS"; many FLOSS programs are not included in Fedora (as they note in their limitations). Others have examined Debian and the Perl CPAN library using my approach (see my page on SLOC), and hopefully someday someone will actually try to measure "all FLOSS" (good luck!!). However, since the Linux Foundation measured a descendent of what I used for my original analysis, it's valid to examine what's happened to the size of this single distribution over time. That's really interesting, because that lets us examine overall trends. So let's take advantage of that! In terms of physical source lines of code (SLOC) we have:

Distribution         Year   SLOC(million)
Red Hat Linux 6.2    2001    17
Red Hat Linux 7.1    2002    30
Fedora 9             2008   204

If Fedora were growing linearly, the first two points estimate a rate of 13 MSLOC/year, and Fedora 9 would have 108 MSLOC (30+6*13). Fedora 9 is almost twice that size, which shows clearly that there's exponential growth. Even if you factored in the month of release (which I haven't done), I believe you'd still have clear evidence of exponential growth. This observation is consistent with "The Total Growth of Open Source" by Amit Deshpande and Dirk Riehle (2008), which found that "both the growth rate as well as the absolute amount of source code is best explained using an exponential model".
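The arithmetic can be checked with a few lines of Python. This is only a sketch using the three data points from the table above; the variable names are made up for this example, and the "annual factor" simply assumes a constant percentage growth between the last two points.

```python
# Data points from the table above: (year, millions of physical SLOC).
points = [(2001, 17), (2002, 30), (2008, 204)]

(y0, s0), (y1, s1), (y2, s2) = points

# Linear model: extend the slope of the first two points out to the third year.
slope = (s1 - s0) / (y1 - y0)               # MSLOC added per year
linear_projection = s1 + slope * (y2 - y1)  # 30 + 13 * 6

# Constant-percentage model: yearly growth factor implied by the last two points.
annual_factor = (s2 / s1) ** (1 / (y2 - y1))

# How far the measured size is above the linear projection.
ratio_to_linear = s2 / linear_projection
```

The linear projection comes out to 108 MSLOC against the measured 204, and the growth factor implied by the last two points is roughly 1.38 per year.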

Another interesting point: Charles Babcock predicted on Oct. 19, 2007, that the Linux kernel would be worth $1 billion in the first 100 days of 2009. He correctly predicted that it would pass $1 billion, but it happened somewhat earlier than he thought: by Oct. 2008 it had already happened, instead of waiting for 2009. I think the reason it happened slightly earlier is that Charles Babcock's rough estimate was based on a linear approximation ("adding 2,000 lines of code a day"). But these studies all seem to indicate that mature FLOSS programs - including the Linux kernel - are currently growing exponentially, not linearly. Since the rate is also increasing, the date of arrival at $1 billion was sooner than Babcock's rough estimate. Babcock's fundamental point - that the Linux kernel keeps adding value at a tremendous pace - is still absolutely correct.

I took a look at some of the detailed data, and some very interesting factors were revealed. By lines of code, here were the largest programs in Fedora 9 (biggest first):

  kernel-2.6.25i686
  OpenOffice.org
  gcc-4.3.0-20080428
  Enterprise Security Client 1.0.1
  eclipse-3.3.2
  Mono-1.9.1
  firefox-3.0
  bigloo3.0b
  gcc-3.4.6-20060404
  ParaView3.2.1

The Linux kernel is no surprise; as I noted in the past, it's full of drivers, and there's a continuous stream of new hardware that needs drivers. The Linux Foundation decided to count both gcc3 and gcc4; since there was a radical change in approach between gcc3 and gcc4, I think that's fair in terms of effort estimation. (My tool ignores duplicate files, which helps counter double-counting of effort.) Firefox wasn't included by name in the Gigabuck study, but Mozilla was, and Firefox is essentially its descendent. It's unsurprising that Firefox is big; it does a lot of things, and trying to make things "look" simple often takes more code (and effort).
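One plausible way to ignore duplicate files is to hash each file's contents and count a given digest only once. The following is a minimal sketch of that idea; the function name unique_files and the choice of SHA-256 are assumptions made for illustration, not a description of how the actual tool is implemented.

```python
import hashlib


def unique_files(files):
    """Keep the first path seen for each distinct file content.

    `files` maps path -> bytes; returns the list of paths worth counting.
    """
    seen = set()
    kept = []
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:  # identical content was not counted yet
            seen.add(digest)
            kept.append(path)
    return kept
```

With this approach, a file copied unchanged between two versions of a package (say, from a gcc3 tree into a gcc4 tree) is counted once, while files that were actually rewritten still count separately.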

What's remarkable is that many of the largest programs in Fedora 9 were not even included in the "Gigabuck" study - these are whole new applications that were added to Fedora since that time. These largest programs not in the Gigabuck study are: OpenOffice.org (an office suite, aka word processor, spreadsheet, presentation, and so on), Enterprise Security Client, eclipse (a development environment), Mono (an implementation of the C# programming language and its underlying ".NET" environment), bigloo (an implementation of the Scheme programming language), and paraview (a data analysis and visualization application for large datasets). OpenOffice.org's size is no surprise; it does a lot. I'm a little concerned that "Enterprise Security Client" is so huge - a security client should be small, not big, so that you can analyze it thoroughly for trustworthiness. Perhaps someone will analyze that program further to see why this is so, and if that's a reason to be concerned.

Anyway, take a look at "Estimating the Total Development Cost of a Linux Distribution". It conclusively shows that large and useful systems can be developed as FLOSS.

An interesting coincidence: Someone else (Heise) almost simultaneously released a study of just the Linux kernel, again using SLOCCount. Kernel Log: More than 10 million lines of Linux source files notes that the Linux kernel version 2.6.27 has 6,399,191 SLOC. "More than half of the lines are part of hardware drivers; the second largest chunk is the arch/ directory which contains the source code of the various architectures supported by Linux." In that code, "96.4 per cent of the code is written in C and 3.3 percent in Assembler". They didn't apply the corrective factors specific to Linux kernels that I discussed in Linux Kernel 2.6: It's Worth More!, but it's still interesting to see. And their conclusion is inarguable: "There is no end in sight for kernel growth which has been ongoing in the Linux 2.6 series for several years - with every new version, the kernel hackers extend the Linux kernel further to include new functions and drivers, improving the hardware support or making it more flexible, better or faster."
