David A. Wheeler's Blog

Thu, 31 Dec 2009

Moglen on Patents and Bilski

Eben Moglen has a very interesting presentation on patents (including comments on Bilski) that was originally presented on Nov. 2, 2009. Software patents and business method patents have been a disaster for the U.S. and world economy, and he has some interesting things to say about how we got here (and how it could be fixed).

One interesting point he made, which I hadn’t heard before, is that there is a fundamental conflict between the patent system and the Administrative Procedure Act of 1946 (aka the APA). Nearly all of the U.S. government must obey the APA before creating new rules and regulations. According to the APA, U.S. agencies must keep the public informed, provide for public participation in the rulemaking process, establish uniform standards for rulemaking and adjudication, and provide for judicial review. In particular, agencies normally have to perform a cost-benefit analysis.

But the patent system pre-existed the APA. Patents, since they are government-created monopolies, can constrain people in the same ways that any other rule or regulation can. However, the government does not follow the APA to determine if each proposed patent should be granted. Instead, the old patent process was essentially grandfathered in as a special exception to the APA. Because the APA is not considered when examining each patent, no one in government asks the normally-required question “How will each proposed patent be publicly reviewed before it is granted?”. Patents on ideas that are patently obvious are routinely granted, in part because there is no public review before they are granted and because the patent office (by policy) ignores most information available to the public. All of this happens because the patent-granting process is not required to enable public participation in the rulemaking process (in this case, the process for granting a patent). Also, when examining a patent to determine if it should be granted, no one asks the normally-obvious questions about what the patent would cost society and whom it would harm.

Because the patent system predates the APA, all potential harms to society from a patent are completely ignored during the patent examination process. If patents were individually considered as new regulations under the APA, such questions would need to be carefully considered. That’s an interesting point Moglen makes.

It’s my hope that the Supreme Court will clearly stop software patents. We shall see.

path: /oss | Current Weblog | permanent link to this entry

Sun, 13 Dec 2009

U.S. research should be open access

The Office of Science and Technology Policy (OSTP) has launched a “public consultation on Public Access Policy”, to see if research funded by U.S. grants should be made available as open access results. I think this is important — I believe publicly-funded unclassified research should actually be made available to the public.

Historically, the U.S. pays a fortune for research, the results are written up as papers for journals, and then various publishers acquire total rights to these papers and charge exorbitant monopoly fees for them. The result: Most U.S. citizens cannot afford to see the research their taxes pay for.

The basic question here is really straightforward: Should publicly-funded research results be made available directly to the public instead? Or, should private companies continue to gain ownership over publicly-funded results, for either nothing or a tiny fraction of the public’s costs?

A small number of journal publishers and societies strongly want to keep things the way they are, of course. It makes sense from their point of view; everybody likes free (or nearly free) money! Historically, this arrangement was created because it can be expensive to publish and manage paper. However, that rationale has become completely obsolete. Few people want the paper any more — they want the research, on-line, without a paywall. And don’t give me nonsense about the “costs” of peer review. Many journals don’t pay their reviewers (the reviewers do it gratis), and even if they did, the total control publishers gain would still be unjustified; the U.S. government spends far more per paper than the publishers spend on review.

The current sequestering of research is not good for science or the country. I’m currently reading the interesting book “Are We Rome?” by Cullen Murphy, and I can’t help but see some parallels. Chapter 3 is all about “when public good meets private opportunity”. Private organizations may pay for private research, and then keep their results private. But when the public pays for research, it should be shocking if it does not get released back to the public. And by “released back”, I mean released back at no fee at all.

So who will pay for the printing, complex peer review, storage, and fancy indexing of these research results? I think the very question shows a failure to understand current technology, but let’s answer it anyway. Most peer review isn’t paid-for anyway, and if it is, it’s a tiny cost compared to the research itself. Storage? Don’t make me laugh; for $100 I can buy storage for all of the U.S. research papers for a year. Indexing? The government shouldn’t be doing serious indexing at all!! Just put it on a government site with a basic form filled out (title, authors, date, keywords, abstract, and a link to the actual paper on the government site). If it’s not behind a paywall, the many commercial search systems will index it for you.

I do think there should be a centralized government repository of such papers. If it’s distributed, then papers could be lost without anyone knowing it. I think they should be freely redistributable, so others can copy what they want, but a centralized repository would make sure that we keep all of them available forever. Also, bandwidth costs can be reduced by scale. Centralization does create a risk that everything gets lost at once, but it also makes it easier for others to mirror everything, since there’s one place to start from. If it’s a complicated site, then they’ve done it wrong; for each paper there should be a simple “summary” page with title, authors, etc., and the actual paper itself.

OSTP cites the experience of NIH; NIH did wonderful work releasing research as open access, and in my mind the real problem is that they didn’t go far enough. First, NIH has a one-year embargo… if I already paid for it (and I did), why should wealthy people and organizations get the results first? Second, NIH only considers the actual papers, not the data and software programs that support the works… yet often those are more important. If they were funded by the public, then the public should get them (unless they’re classified, of course, but then they shouldn’t be released at all). I’m sure there are complications and exceptions, but a “default open access” policy would go a long way.

So please, tell the OSTP that the U.S. should release government-funded research as open access publications, available to anyone on the Internet without a paywall. In short, if “we the people” paid for it, then “we the people” should get it. For more information, see this Request for Information (RFI).

path: /oss | Current Weblog | permanent link to this entry

Sun, 29 Nov 2009

Success on Fully Countering Trusting Trust through Diverse Double-Compiling

My November 23 public defense of Fully Countering Trusting Trust through Diverse Double-Compiling went well. This was my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique.

Most importantly (to me), my PhD committee agreed that I successfully defended my dissertation. Whew! As a result, I’m essentially done with my PhD.

I learned a lot about creating formal proofs using computers by doing this dissertation. I wanted to give the strongest possible evidence that DDC counters the trusting trust attack, and formal proofs are the strongest form of evidence that I know of… which is why I created them. Frankly, creating proofs was kind of fun once I knew what I was doing, but getting there was more painful than it needed to be. Many books focus on the underlying mathematics (e.g., giving you extreme detail about various logic systems)… which is great if you’re a mathematician, but not so helpful if you are simply trying to use the mathematics. Some books explain how to do things by hand, but that is an unnecessary amount of pain; one of my proofs is 30 steps long, and I sure wouldn’t have wanted to create that by hand. Some books seemed to assume that you already knew everything the book covered, which is an odd assumption to me :-).

Here’s a trivial example: Most logic systems can prove anything if you give them inconsistent assumptions. That’s bad! You can get rid of that problem by sending the assumptions to a model-builder like mace4… if it can create a model, then the assumptions are consistent. So, make sure you send your assumptions through a model-builder to check that they really are consistent.
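For instance, a minimal sketch of that check might look like the following (the file name and formulas are made up, and the prover9/mace4 input syntax and command-line options are written from memory, so treat this as an approximation rather than something taken from my dissertation):

  # Write a tiny set of assumptions in prover9/mace4 input format:
  printf '%s\n' \
    'formulas(assumptions).' \
    '  all x (man(x) -> mortal(x)).' \
    '  man(socrates).' \
    'end_of_list.' > assumptions.in

  # mace4 searches for a finite model of the assumptions; if it reports
  # one, the assumptions are consistent.  prover9 reads the same format
  # (plus a goals list) when you want to prove something from them.
  mace4 -f assumptions.in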

I’ve posted detailed data from my dissertation so that people can reproduce my results. I think it’s really important that results be reproducible; otherwise, it’s not science. As part of that data, I’ve included a few files that may help potential proof tool users get started. In particular, I’ve posted prover9 input to prove that Socrates is mortal, a prover9 input to prove that the square root of 2 is irrational, and prover9 input showing how to easily declare that terms in a list are distinct.

The “trusting trust” attack has historically been considered the “uncounterable” attack. Now the attack can be effectively detected — and thus countered.

path: /security | Current Weblog | permanent link to this entry

Fri, 20 Nov 2009

Fully Countering Trusting Trust through Diverse Double-Compiling

A last-minute reminder — my public defense of Fully Countering Trusting Trust through Diverse Double-Compiling is coming up on November 23, 1-3pm. This is my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique.

It will be at George Mason University, Fairfax, Virginia, Innovation Hall, room 105. [campus location] [Google map] Anyone is welcome!

I’ve made a few small tweaks over the last few weeks. I modified proof #2 to reduce its requirements even further (making it even easier to do); I had mentioned in text that this was possible, but now the formal proof shows it. I also used mace4 to show that the assumptions of each proof are consistent. Formal proofs aren’t easy to create, or trivial to read, but the reason I went to that trouble is to show that it’s not just my opinion that I’ve countered the trusting trust attack… I want to show, conclusively, that the trusting trust attack has been countered. I know of no stronger method to show that than a formal proof.

The “trusting trust” attack has historically been considered the “uncounterable” attack. Nuts to that. Now the attack can be effectively detected — and thus countered.

path: /security | Current Weblog | permanent link to this entry

Fri, 13 Nov 2009

Trusting Trust, DDC, and Free-Libre/Open Source Software (FLOSS)

As I noted in my blog, I’ve just released my dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling” (DDC). But what does that mean for Free-Libre/Open Source Software (FLOSS)? In short, it’s fantastic news for FLOSS, but to explain why that’s so, I need to backtrack first.

The “trusting trust” attack is a nasty computer attack that involves creating a subverted compiler in such a way that it even subverts compilers. It was originally reported in a 1974 security evaluation of Multics, but most people heard about it from Ken Thompson’s 1984 Turing Award presentation (Ken Thompson is a creator of Unix). This attack is incredibly nasty, and what’s worse, until now there’s been no effective countermeasure to it. Indeed, some have claimed that it could not ever be countered, making the whole idea of “computer security” a non-starter.

The “trusting trust” attack appears to be especially devastating to FLOSS. The problem is that with the trusting trust attack, the source code that people review does not correspond to the executable that’s actually running, and that seems to completely torpedo the “many eyes” review that FLOSS makes possible. The whole world could carefully review a program’s source code, but it wouldn’t matter if the compiler turns it undetectably into something malicious.

Thankfully, there is an effective countermeasure, which I’ve named Diverse Double-Compiling (DDC). You can see my dissertation which explains what it is, proves that it works, and even demonstrates it with several compilers including GCC. (I will be giving a public defense of it on November 23, 2009, if you’d like to come.) This means that source code review, such as mass review of FLOSS code, can now actually work.

But there’s more, because there’s an interesting catch with DDC. DDC counters the trusting trust attack, but it’s only useful for people who have access to the compiler source code. Fundamentally, DDC is a technique for determining if a compiler executable corresponds with its source code, but only people who have the source code can apply DDC to see if that’s true. What’s more, only people who have access to the source code will find the statement “the source and executable correspond” particularly useful. (You could use trusted intermediaries, but this requires total trust in those intermediaries, making such claims far weaker than claims that anyone can check.) In addition, DDC is actually useful beyond what we normally think of as compilers, because you can redefine “compiler” to include other parts (such as the operating system). In that case, you can even show that the system’s executables all correspond to their source code. But you can only use DDC to counter the trusting trust attack if you have access to the source code.

So we now have a radical change. Now that DDC has been shown to work, we can see that software with available source code (including FLOSS) has a fundamental security advantage over other software. That doesn’t mean that all FLOSS is more secure than all proprietary software, of course. But FLOSS already had a general security advantage because it better meets Saltzer & Schroeder’s “Open design principle” (as explained in their 1974-1975 papers). Now we have an attack — the trusting trust attack — for which FLOSS has a fundamental security advantage. The time of ignoring FLOSS options, because of misplaced notions that FLOSS cannot be as secure as proprietary software, needs to come to an end.

path: /oss | Current Weblog | permanent link to this entry

Mon, 02 Nov 2009

New PhD Dissertation: Fully Countering Trusting Trust through Diverse Double-Compiling

An Air Force evaluation of Multics, and Ken Thompson’s Turing award lecture (“Reflections on Trusting Trust”), showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves. If this “trusting trust” attack goes undetected, even complete analysis of a system’s source code will not find the malicious code that is running. Previously-known countermeasures have been grossly inadequate. If this attack cannot be countered, attackers can quietly subvert entire classes of computer systems, gaining complete control over financial, infrastructure, military, and/or business systems worldwide.

Thankfully, there is a countermeasure to the “trusting trust” attack. In 2005 I wrote a paper on Diverse Double-Compiling (DDC), published by ACSAC, where I explained DDC and why it is an effective countermeasure. But some people still raised concerns. Would DDC really counter the attack? Would DDC scale up to real-world compilers? Also, the ACSAC paper required “self-parenting” compilers — can DDC handle compilers that are not self-parenting?

I’m now releasing Fully Countering Trusting Trust through Diverse Double-Compiling, my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique. This dissertation was accepted by my PhD committee on October 26, 2009.

On November 23, 2009, 1-3pm, I will be giving a public defense of this dissertation. If you’re interested, please come! It will be at George Mason University, Fairfax, Virginia, Innovation Hall, room 105. [campus location] [Google map]

This dissertation’s thesis is that the trusting trust attack can be detected and effectively countered using the “Diverse Double-Compiling” (DDC) technique, as demonstrated by (1) a formal proof that DDC can determine if source code and generated executable code correspond, (2) a demonstration of DDC with four compilers (a small C compiler, a small Lisp compiler, a small maliciously corrupted Lisp compiler, and a large industrial-strength C compiler, GCC), and (3) a description of approaches for applying DDC in various real-world scenarios. In the DDC technique, source code is compiled twice: once with a second (trusted) compiler (using the source code of the compiler’s parent), and then the compiler source code is compiled using the result of the first compilation. If the result is bit-for-bit identical with the untrusted executable, then the source code accurately represents the executable.
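To make the shape of that check concrete, here is a minimal shell sketch of DDC; the file names and compiler invocations are hypothetical, and it glosses over details the dissertation handles carefully (such as ensuring the compilation options and environments match exactly):

  #!/bin/sh
  # Hypothetical DDC sketch.  cT = trusted compiler, sP.c = source of the
  # parent of the compiler under test, sA.c = source of the compiler under
  # test, cA = untrusted executable claimed to be compiled from sA.c.
  set -eu

  ./cT -o stage1 sP.c        # stage 1: trusted compiler builds the parent
  ./stage1 -o stage2 sA.c    # stage 2: that result compiles the compiler source

  # A bit-for-bit match shows the untrusted executable corresponds to its source.
  if cmp -s stage2 cA ; then
    echo "stage2 matches cA: executable corresponds to its claimed source"
  else
    echo "mismatch: possible subversion (or a differing build environment)"
  fi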

Many people commented on my previous 2005 ACSAC paper on the topic. Bruce Schneier wrote an article on ‘Countering “Trusting Trust”’, which I think is one of the best independent articles describing my work on DDC.

This 2009 dissertation significantly extends my previous 2005 ACSAC paper. For example, I now have a formal proof that DDC is effective (the ACSAC paper only had an informal justification). I also have additional demonstrations, including one with GCC (to show that it scales up) and one with a maliciously corrupted compiler (to show that it really does detect them in the real world). The dissertation is also more general; the ACSAC paper only considered the special case of a “self-parenting” compiler, while the dissertation eliminates that assumption.

So if you’re interested in countering the “trusting trust” attack, please take a look at my work on countering trusting trust through diverse double-compiling (DDC).

path: /security | Current Weblog | permanent link to this entry

Wed, 28 Oct 2009

Notes about the DoD and OSS memo

Yesterday I posted about the new 2009 DoD memo about open source software. I’m delighted to see that the word is getting out. Slashdot, Linux Weekly News, and LXer.com all mentioned the new memo and even pointed to my post. Others are noting the new memo too, including CNet’s Matt Asay, InformationWeek’s J. Nicholas Hoover, InformationWeek’s Serdar Yegulalp, NetworkWorld, and The H. Dan Risacher has posted on Slashdot some background and history for this new 2009 DoD memo. He notes, for example, that “The lawyers were by far the biggest delay” in getting this memo released.

There’s some supporting information for this memo at the DoD Free Open Source Software (FOSS) Communities of Interest (COI) site, which posts the memo itself and a supporting DoD Open Source Software Frequently Asked Questions (FAQ) document.

To help potential users, I’ve updated my presentation Open Source Software (OSS) and the U.S. Department of Defense (DoD), which I hope will clarify some things. I should also remind people about the 2003 MITRE study “Use of Free and Open Source Software (FOSS) in the U.S. Department of Defense”, which showed that in 2003 Free/libre/open source software (FLOSS, FOSS, or OSS) was already widely used in the DoD.

path: /oss | Current Weblog | permanent link to this entry

Tue, 27 Oct 2009

New DoD memo on Open Source Software

The U.S. Department of Defense (DoD) has just released Clarifying Guidance Regarding Open Source Software (OSS), a new official memo about open source software (OSS). This 2009 memo should soon be posted on the list of ASD(NII)/DoD CIO memorandums. This 2009 memo is important for anyone who works with the DoD (including contractors) on software and systems that include software… and I suspect it will influence many other organizations as well. Let me explain why this new memo exists, and what it says.

Back in 2003 the DoD released a formal memo titled Open Source Software (OSS) in the Department of Defense. This older memo was supposed to make it clear that it was fine to use and develop OSS in the DoD. Unfortunately, as the new 2009 memo states, “there have been misconceptions and misinterpretations of the existing laws, policies and regulations that deal with software and apply to OSS that have hampered effective DoD use and development of OSS”.

This new 2009 memo simply explains “the implications and meaning of existing laws, policies and regulations”, hopefully eliminating many of those misconceptions and misinterpretations. A lot of the “meat” is in Attachment 2, section 2 (guidance), so let’s walk through that:

But perhaps most important is this memo’s opening statement: “To effectively achieve its missions, the Department of Defense must develop and update its software-based capabilities faster than ever, to anticipate new threats and respond to continuously changing requirements. The use of Open Source Software (OSS) can provide advantages in this regard…”. As with the later part (b), here we have an official government document acknowledging that OSS can have a significant advantage. What’s more, these potential advantages aren’t necessarily just minor cost savings; OSS can in some cases provide a military advantage. Which is a more-than-adequate justification for considering OSS, as I have been advocating for years.

I’m really delighted that this memo has finally been released. I participated in the original brainstorming meeting to create this memo (as did John Scott), and I reviewed many versions of it, but many, many other hands have stirred this pot since it began. It took over 18 months to create it and get it out; getting this coordinated was a very long and drawn-out process. My thanks to everyone who worked to help make this happen. In particular, congrats go to Dan Risacher, who led this project to its successful completion.

By the way, if you’re interested in the issue of open source software in the U.S. military/national defense, you probably should look at Mil-OSS (at least, join their mailing list, and consider going to their upcoming conference; I was a speaker at their last one). If you’re interested in the connection between open source software and the U.S. government (including the military), you might also be interested in the upcoming GOSCON conference on November 5, 2009 (I’m one of the speakers there too).

path: /oss | Current Weblog | permanent link to this entry

Sat, 17 Oct 2009

CVC3 License Changed to BSD

CVC3 is one of the better automated theorem provers. Given certain mathematical assertions, it can in many cases prove that certain claims follow from them. Some tools that can prove properties about programs use CVC3 (and/or similar programs). For example, the Frama-C Jessie plug-in for C and Krakatoa for Java use Why, which can build on one of several programs including CVC3.
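If you haven’t used this kind of tool, here’s roughly what a trivial interaction looks like; the input below is written from memory of CVC3’s presentation language and may not be exactly right, so treat it purely as an illustrative sketch:

  # Write a tiny query (syntax approximated from memory):
  printf '%s\n' \
    'x : INT;' \
    'y : INT;' \
    'ASSERT x > 0;' \
    'ASSERT y > x;' \
    'QUERY y > 1;' > example.cvc

  # CVC3 should report the query is valid: x >= 1 and y > x imply y >= 2.
  cvc3 example.cvc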

The problem is that CVC3’s license has historically been an obstacle. I understand that its authors intended for CVC3 to be Free/Libre/Open Source Software (FLOSS), but unfortunately, it was released with additional license clauses that resulted in yet another non-standard license. This was an unfortunate mistake; as I note in my essay on GPL-compatible licenses, it is absolutely critical to choose a standard FLOSS license when releasing FLOSS. In this case, the big problem was the addition of an “indemnification” clause that was really scary; to some, at least, it seemed to imply that if the CVC3 authors were sued, anyone who used or copied the program was obligated to pay their legal bills. Interpreted that way, no one wanted to touch the program… how could any user possibly know their risks? Fedora eventually ruled that this license was non-free (aka not FLOSS), and thus that the program could not be included in Fedora. There was also a less-serious problem: if you made a change to the program, you had to change its name… and since the program couldn’t even compile without a change (at the time), this meant that you had to change the name almost instantly. There is a reason that people have converged on standard FLOSS licenses; if your lawyer says you need to add non-standard clauses, be wary, because the result may be that few people can use your program.

I’m delighted to report that this has a happy ending. CVC3’s license has just been changed to a straight BSD license - a well-known license that is universally acknowledged as being FLOSS. This means that there are no licensing problems for Linux distributions. Only about a day after finding this out, Jerry James submitted a CVC3 package to Fedora. So, I expect that in a relatively short time we’ll see CVC3 available directly in common Linux distribution repositories.

I think this is a helpful step towards open proofs, which are cases where an implementation, its proofs, and the necessary tools are all FLOSS. Having a good tool like CVC3 to build on makes it easier to develop useful tools. My hope is to help formal methods tools mature so that they can be more scalable, applicable, and effective than they are today. It’s clear that a single little tool cannot possibly do the job; we need suites of tools that can work together. And this is a promising step in that direction.

path: /oss | Current Weblog | permanent link to this entry

Wed, 12 Aug 2009

Auto-DESTDIR released!

I’ve just released Auto-DESTDIR, a software package which helps automate program installation on POSIX/Unix/Linux systems from source code. If you have the problem it solves — automatic support for DESTDIR — you want this!

A little background: Many programs for Unix/Linux are provided as source code. Such programs must be configured, built, and installed, and that last step is normally performed by typing “make install”. The “make install” step normally writes directly to privileged directories like “/usr/bin” to perform the installation. Unfortunately, most modern packaging systems (such as those for .rpm and .deb files) require that files be written to some intermediate directory instead, even though when run they will be in a different filesystem location (because of security issues). This redirection is easy to do if the installation script supports the “DESTDIR convention”; simply set DESTDIR to the intermediate directory’s value and run “make install”. Supporting DESTDIR is a best practice when releasing software. Unfortunately, many source packages don’t support the DESTDIR convention. Auto-DESTDIR causes “make install” to support DESTDIR, even if the provided “makefile” doesn’t support the DESTDIR convention. Auto-DESTDIR is released under the “MIT” license, so it is Free-libre/open source software (FLOSS).
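For readers who haven’t run into the DESTDIR convention before, here is what a staged install looks like in practice (assuming an autoconf-style package; the package name and staging path are made up):

  # Stage the install into a scratch directory instead of the live system:
  PKGROOT=/tmp/mypkg-root
  ./configure --prefix=/usr
  make
  make DESTDIR="$PKGROOT" install
  # Files land under $PKGROOT/usr/..., ready for packaging, even though
  # at run time they will live under /usr/...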

Auto-DESTDIR is implemented using a set of bash shell scripts that wrap typical install commands (such as install, cp, ln, and mkdir). These wrappers are placed in a special directory. The run-redir command modifies the PATH so that the directory with these scripts is listed first, and then runs the given command. The make-redir command invokes “make” using run-redir, along with some extra settings to simplify things. For more information on this approach, and why this is a good way to automate DESTDIR, see the paper Automating DESTDIR, especially its section on wrappers.
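To give a feel for the wrapper idea, here is a deliberately simplified sketch I wrote for this post (it is not the actual Auto-DESTDIR code, which has to handle option flags, multiple sources, and directory targets): a script named after a standard command sits in a directory placed at the front of PATH, prefixes $DESTDIR onto the destination, and calls the real command.

  #!/bin/sh
  # Simplified sketch of a "cp" wrapper (NOT the real Auto-DESTDIR code).
  # Assumes exactly two arguments (source, destination) and DESTDIR set.
  src="$1"
  dst="$2"
  # Call the real cp directly so the wrapper doesn't invoke itself.
  exec /bin/cp "$src" "${DESTDIR}${dst}"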

So please take a look at the Auto-DESTDIR software package, if you have the problem it solves.

path: /oss | Current Weblog | permanent link to this entry

Sun, 26 Jul 2009

Limiting Unix/Linux/POSIX filenames simplifies things: Lowercasing filenames

My essay Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems argues that adding some limitations on legal Unix/Linux/POSIX filenames would be an improvement. In particular, a few minor limitations (which most people assume anyway) would eliminate certain kinds of bugs, some of which end up being security vulnerabilities. Forbidding crazy things (like control characters in filenames) simplifies creating programs that work all the time.

Here’s a little example of this. I wanted to convert all the filenames inside a directory tree to all lowercase letters. I didn’t want to lose any files without checking on them first, so I wanted it to ask before doing a rename in a way that would eliminate a file (i.e., I wanted to use mv -i). I didn’t find such a program built into my distro, so I wrote a short script to do it (which is just as well, because it makes a nice simple example). I wanted it to be portable, since I might need it again later.

So how do we write this? A simple glob like “*” won’t work, because it needs to recursively descend through a tree of directories, and simple globs will skip hidden filesystem objects too (and I want to include them). I could write a more complex glob that included hidden files and directories, and recursed down through subdirectories, but the naive way of recursing down subdirectories can have many problems (e.g., it could get stuck in endless loops created by symbolic links). If we need to handle a tree recursively, there’s a better tool designed for the purpose — find.

Unfortunately, an ordinary find . has an interesting problem — it will pick the upper-level names first, and if we rename the upper-level names first, find will fail when it tries to enter them (since they will no longer exist). No problem — if we are manipulating the tree structure (including renames), we can use the -depth option of find, which will process each directory’s contents before the directory itself. We can then rename just the basename of what find returns, so we won’t change anything before find descends into it.

Now, if we could assume that newlines and tabs cannot be in filenames, as recommended in Fixing Unix/Linux/POSIX Filenames…, then we can do a simple for loop around the results of find. My shell script mklowercase renames filenames to lowercase letters recursively. Here is its essence:

  #!/bin/sh
  # mklowercase - change all filenames to lowercase recursively from "." down.
  # Will prompt if there's an existing file of that name (mv -i)
  # Presumes that filenames don't include newline or tab.

  set -eu
  IFS=`printf '\n\t'`
  
  for file in `find . -depth` ; do
    [ "." = "$file" ] && continue                  # Skip "." entry.
    dir=`dirname "$file"`
    base=`basename "$file"`
    oldname="$dir/$base"
    newbase=`printf "%s" "$base" | tr A-Z a-z`
    newname="$dir/$newbase"
    if [ "$oldname" != "$newname" ] ; then
      mv -i "$file" "$newname"
    fi
  done

This script skips “.”, which is not strictly necessary, but I thought it would be a good idea to point out that you may need to skip “.” sometimes.

Yes, this could be modified to handle literally all possible Unix/Linux/POSIX filenames, but those modifications make it more complicated and uglier. One approach (sketched below) would be to use find … -exec to invoke a separate script that does the renaming, but then you have to maintain two scripts and keep them in sync. You could instead embed the command in the find invocation itself, but then the find command becomes hideously complicated.
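That two-script variant would look roughly like this, where rename-one.sh is a hypothetical helper that lowercases the single name it is given (doing what the loop body above does):

  find . -depth -exec ./rename-one.sh {} \;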

Another solution to handling all filenames would be to change the loop to:

  find . -depth -print0 |
  while IFS="" read -r -d '' file ; do ...

However, this requires non-standard GNU extensions to find (-print0) and bash (read -d), as well as being uglier and more complicated. Also, if “mv” is implemented as required by the Single Unix Standard, then the “mv -i” will fail badly if it tries to rename a file into an existing name. That’s because when it tries to get an answer, it will send a prompt to stderr, but it will expect a RESPONSE from stdin… and yet, stdin is where it gets the list of filenames!!

And it’s all silly anyway. If you put newlines in filenames, lots of scripts fail. It’s simply too much of a pain to deal with them “correctly”. Which is the point of Fixing Unix/Linux/POSIX Filenames — adding some limitations on legal Unix/Linux/POSIX filenames would be an improvement. At the least, by default let’s forbid control characters (so simple “find” and filename display is safe), forbid leading dash characters (so simple globbing is safe), require that all filenames be UTF-8 (so displaying filenames always works), and perhaps forbid trailing spaces (since these are dangerously misleading to end-users). I would like to see kernels build in the mechanisms to forbid certain kinds of filenames, so that administrators can then specify the specific “bad filename” policy they would like to use.

So please take a look at: Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems. I’ve made a few recent additions, thanks to some interesting comments people have sent, but the basic message is the same.

path: /security | Current Weblog | permanent link to this entry

Fri, 05 Jun 2009

SPARK released as FLOSS (Free/ Libre / Open Source Software)!

The SPARK toolsuite has just been released as FLOSS (Free/ Libre / Open Source Software) by Praxis (its creator). This is great news for those who want to make software safer, more reliable, and more secure. In particular, this means that Tokeneer is now an open proof. If you haven’t been following this, here’s some background.

Software is now a part of really critical systems (ones that need “high assurance”), yet often that software is not as safe, reliable, or secure as it needs to be. I believe that in the long term, we will need to start proving that our very important programs are correct. Testing by itself isn’t enough; completely testing the trivial “add three 64-bit integers” program would take far longer than the age of the universe (it would take about 2x10^39 years). The basic idea of using mathematics to prove that programs are correct — aka “formal methods” — has been around for decades. There are a number of cases where formal methods have been applied successfully, and I’m glad about that. And yet, applying formal methods is still relatively rare. There are many reasons for this, such as inadequate maturation and capabilities of many formal methods tools, and the fact that relatively few people know how to apply formal methods when developing real programs. But what, in turn, is causing those problems? It’s true that applying formal methods is a hard problem that hasn’t received the level of funding it needs, but still, it’s been decades!

I believe one problem hindering the maturation and spread of formal methods is a “culture of secrecy”. Details of formal method use are often unpublished (e.g., because the implementations are proprietary or classified). Similarly, details about formal methods tools are often unshared and lost (or have to be constantly re-invented). Biere’s “The Evolution from LIMMAT to NANOSAT” (Apr 2004) gives an example: “From the publications alone, without access to the source code, various details were still unclear… Only [when CHAFF’s source code became available did] our unfortunate design decision [become] clear… The lesson learned is, that important details are often omitted in publications and can only be extracted from source code. It can be argued, that making source code of SAT solvers available is as important to the advancement of the field as publications.”

This “culture of secrecy” means that researchers/toolmakers often don’t receive adequate feedback, researchers/toolmakers waste time and money rebuilding tools, educators have difficulty explaining formal methods (they have no examples to show!), developers don’t understand how to apply it (and it has an uncertain value to them), and evaluators/end-users don’t know what to look for.

I believe that a way to break through this “culture of secrecy” is to develop “open proofs”. But what are they? An “open proof” is software or a system where the implementation, its proofs, and the required tools are all free-libre / open source software (FLOSS).

Something is FLOSS if it gives anyone the freedom to use, study, modify, and redistribute modified and unmodified versions of it, meeting the Free software definition and the open source definition.

Imagine if we had a number of open proofs available. There could be small open proofs that could be used for learning (e.g., as examples and use in class exercises). There could be proofs of various useful functions and small applications, so developers could see how to scale up these techniques, directly reuse them as components, or use them as starting points but add additional (proven) capabilities to them. When problems come up (and they will!), toolmakers and developers could work together to find ways to mature the tools and technology so that they’d be easier to use (e.g., so more could be automated). In short, imagine there was a working ecosystem where researchers/toolmakers/educators, developers of implementations to be proved, and evaluators/end-users could work together by sharing information. I believe that would greatly speed up the maturing of formal methods, resulting in more reliable and secure software.

In this context, Praxis has just released the SPARK GPL Edition. This is their SPARK toolsuite (a formal methods tool) released under the GNU General Public License aka GPL (the most common FLOSS license). So, what’s that?

SPARK is a variant of the Ada programming language, designed to enable proofs about programs (by adding and removing some features of Ada). The additions are in special comments, so SPARK programs can be compiled by a normal Ada compiler like GNAT (which is part of gcc). The Open Proofs page on SPARK has some information on SPARK. The page What is Special About SPARK Contracts? gives a nice quick introduction to SPARK, which I will quote here. It points out that the Ada line:

        procedure Inc (X : in out Integer);
just says there is some procedure “Inc” that may read a value X, and may write it out, but that’s it. In SPARK, you can add much more precise information, and the SPARK tools can then check to see if they are true. For example, if you say this using SPARK:
        procedure Inc (X : in out Integer);
        --# global in out CallCount;
        --# pre  X < Integer'Last and
        --#      CallCount < Integer'Last;
        --# post X = X~ + 1 and 
        --#      CallCount = CallCount~ + 1;
then the SPARK tools will ensure at compile-time (not run-time) that the procedure reads and writes only X and the global CallCount, that every caller establishes the stated precondition, and that the body establishes the stated postcondition (both X and CallCount are incremented by exactly one).

You can learn more about SPARK from the book “High Integrity Software: The SPARK Approach to Safety and Security” by John Barnes. Sample text of Barnes’ book is available online. The open proofs page on SPARK has more information.

This means that the “Tokeneer” program is now an open proof. Remember, to be an open proof, a program’s implementation, proofs, and required tools have to be open source software. Tokeneer was a sample program written to show how to apply these kinds of techniques to actual systems (instead of trivial 5-line programs). The Tokeneer program itself, and its proofs, have already been released as open source software. Many of the tools it required are already FLOSS (e.g., fuzz and LaTeX for its formal specifications, and an Ada compiler to compile it). Now that SPARK has been released as FLOSS, people can examine this entire stack of software to make improvements in all the technologies, as well as learn from them and create improved implementations. No, this doesn’t suddenly make it trivial to make proofs about complex programs, but it’s a step forward.

If you are interested in making future software better, please help the open proofs project. You don’t need to be a math whiz. For example, if you know how to do shell scripting, please help us package some promising formal methods tools (like SPARK) so they are easy to install. It’s hard to get people to try out these tools (and give feedback) if they’re too hard to install. If you know of formal methods software that is rotting in some warehouse, try to get it released as FLOSS. I think all government-funded unclassified research software should be released as FLOSS by default, since “we the people” paid for it! If you’re interested in the latest software technology, try out a few of these formal methods tools, and release as FLOSS any small programs and proofs you develop with them. Send the toolmakers feedback, or write down their strengths and weaknesses to help others understand them. SPARK is a tool that can be used, right now, in certain circumstances. I have no illusions that today’s formal methods tools are ready for arbitrary 20 million line programs. But if we want future software to be better than today’s, we need to figure out how to mature formal methods technology and make it better understood so that it can scale. I think making top-to-bottom worked examples and starting points can help us get there.

path: /oss | Current Weblog | permanent link to this entry

Thu, 28 May 2009

Parchment: Running the Z-machine

I just learned of a fun web application called Parchment. Parchment lets you play interactive fiction (I.F., aka “text adventure games”) using just your web browser. It only works with I.F. in “Z-machine” format, but that’s a very common format.

So go to the parchment site and try out something from their long list of interactive fiction… now you don’t need to install anything! That includes my small replayable puzzle “Accuse” (my Accuse source code is already available).

If you want more information about it, here’s a brief post about Parchment by its author, Atul Varma. Atul built this based on an existing program, Thomas Thurman’s Gnusto. Both are open source software (using the GPLv2 license). Once again, this demonstrates the neat thing about community-developed software; one person developed a program for one circumstance, and another extended it for a different circumstance.

There are several tools available for creating interactive fiction. I’ve been watching Inform 7 for a while, with interest, because it takes a radically different approach to writing code. Inform 7 is a natural-language programming language that tries to actively exploit features of natural language to make developing these kinds of things easier. You can see a brief Inform 7 tutorial if you’re curious, as well as the full Writing with Inform documentation. Inform 7 isn’t itself OSS, though significant portions are; Inform 6 (a key substrate) and many other portions including the Inform 7 standard rules are released under the Artistic License 2.0. The extensions are released under the “Attribution Creative Commons licence”; that’s not normally a license used for software, but I think it’d meet the criteria for OSS, and Fedora approves of this license for content. I hope that someday the rest will be released as OSS as well. The logic behind Inform 7 is described in “Natural Language, Semantic Analysis and Interactive Fiction” by Graham Nelson. If you’re interested in some of the technical stuff behind it, the text of the Standard Rules, the text of the extensions, Inform 7 for programmers, and the Chart of Rules can tell you more.

path: /oss | Current Weblog | permanent link to this entry

Sat, 23 May 2009

Wikipedia changes its license

The Wikimedia Foundation (WMF) will change the licensing terms on all its materials — including Wikipedia. Now, all of its existing material will be released under the Creative Commons Attribution-ShareAlike (CC-BY-SA) license in addition to the current GNU Free Documentation License (GFDL). The WMF says “This change is meant to advance the WMF’s mission by increasing the compatibility and availability of free content.” This means that Wikipedia material can now be combined with the vast amount of CC-BY-SA licensed material, and Wikipedia can now include the volumes of CC-BY-SA material (that material will just be CC-BY-SA). It also makes it easier to use Wikipedia material (and other material from the Wikimedia Foundation).

I think this is a good thing overall. Incompatible licenses are a real scourge on community-developed works. Past experience shows that license incompatibility can be a real problem for free-libre/ open source software (FLOSS or OSS), in particular. Bruce Perens warned about FLOSS license incompatibility back in 1999! As I argue in Make Your Open Source Software GPL-Compatible. Or Else, you should release free-libre/ open source software (FLOSS) using a GPL-compatible license. You don’t need to use the GPL, but using a GPL-compatible license (like the MIT, BSD-new, LGPL, or GPL) means that people can combine your software with other software to create larger works. I show how this works in The Free-Libre / Open Source Software (FLOSS) License Slide, which has a simple graph showing how common FLOSS licenses can work together. Wikipedia articles aren’t software, but the principles still apply - licenses need to enable community-developed works, not disable them.

Now, nothing is perfect. One nice benefit of the GNU Free Documentation License (GFDL) is that it requires that readers be able to get editable versions whose format specification is available to the public (for details, see its text on “transparent” copies). This is a really nice feature of the GFDL; it counters some of the problems of proprietary formats.

The GFDL has many problems, though, when used for short works like Wikipedia articles or images. Most obviously, it requires that you include the entire text of the license with each work (see GFDL 1.3 section 2). That’s no problem for large manuals, which is what the GFDL was designed for, but it’s a big problem for short works. Nobody likes having a license longer than the article it’s attached to! This is one reason why CC-BY-SA is so widely used for short works - and since Wikipedia is primarily a large set of short works, it makes sense. Which is why I (and many others) voted to approve this change.

Now it’s certainly true that people also complain that the GFDL allows the addition of unmodifiable sections. But many GFDL items don’t have them, and Debian determined through a formal vote that “GFDL-licensed works without unmodifiable sections are free [as in freedom]”.

I should also give credit to the Wikimedia Foundation (WMF), Richard Stallman of the FSF, and Lawrence Lessig, who worked together to make this possible.

For more on the Wikimedia license modification, you can see Wikimedia license FAQ, Lawrence Lessig’s post on GFDL 1.3, GFDL 1.3: Wikipedia’s exit permit, FDL 1.3 FAQ, and An open response to Chris Frey regarding GFDL 1.3.

path: /oss | Current Weblog | permanent link to this entry

Fri, 22 May 2009

Government-developed Unclassified Software: Default release as Open Source Software

I’d like to see this idea seriously considered and discussed: By default, unclassified software which the government paid to develop should be released to the public as open source software (unless there’s a good reason not to).

Why? Well, if “we the people” paid to develop it, then “we the people” should get it! I think this idea fits into the good government ideal of data transparency; after all, software is data. Currently, we have a lot of waste and unnecessary costs due to loss, re-development, and/or government-created monopolies. The government is not a venture capitalist (VC); people who need a VC should go to a VC.

Let me focus specifically on the United States. I think this idea easily fits into the broader ideas of transparency and open government, including the Memorandum on Transparency and Open Government. Look at all the excitement over data.gov; indeed, Apps for America is having a contest to develop software that uses data from data.gov.

Indeed, there’s a long history of U.S. laws specifically set up to make data available. Most obviously, Freedom of Information Act (FOIA) requests make it possible to extract information from the U.S. government. 17 USC 105 and 17 USC 101 prevent the U.S. government from claiming U.S. copyright on a work “prepared by an officer or employee of the United States Government as part of that person’s official duties”. So this idea would be an extension of what’s already gone on.

Let me focus on research, and how this idea could help advance technology. Think of all the advantages if software developed by U.S.-funded research could be reused by other research projects and commercial firms. For example, imagine if other researchers could simply extend previous work by modifying previously-developed software, instead of re-building yet another version from scratch. Anyone could commercialize the research, making it more likely that it would be commercialized instead of being lost in the archives shown at the end of Raiders of the Lost Ark. Some argue that giving sole rights is the only path to commercialization, but that’s just not true; open source software is commercial software, so this is simply a different and fairer path to commercialization. In contrast, the current system inhibits all kinds of technical progress; Biere’s “The Evolution from LIMMAT to NANOSAT” (Apr 2004) found that “important details are often omitted in [research] publications and can only be extracted from source code… [Making source code available] is as important to the advancement of the field as publications”. Originally I thought of this idea for research software, and it’s not hard to see why. But when I started thinking about the reasons for doing this — especially “if ‘we the people’ paid to develop it, then ‘we the people’ should get it” — then I realized that this principle applies much more broadly.

An open government directive isn’t out yet, but they’re clearly working on it. Please submit this - and other ideas like it - to them. I think there’s a lot of promise, but they can only enact and refine ideas that they’ve heard of. If you like this idea, please vote for it.

If this happened, I envision a two-stage process: (1) release of the software as an archive (so it can be downloaded), and (2) some of it will get picked up and used to start an active OSS project. The second stage might not happen for many years after the first, and that’s okay. Some will ask “how will people find it”, but I think that’s the wrong question. There are many commercial search engines that can find code, but they can only find stuff that’s web-accessible; let’s give them something to find.

Perhaps this should be done in stages. For example, perhaps it’d be best to start with software developed by research. Researchers are supposed to share their results anyway (under most cases), and the lack of software release often inhibits research (e.g., it’s harder to check or repeat results). You could then broaden this to other types of software.

I’m sure there will need to be exceptions. There would need to be some sort of guidelines to figure out when to grant those exceptions, and those guidelines should be developed though lively discussion. Most obviously, if it’s a special ingredient necessary for national security, then it should be classified and not revealed in any form. I would not expect weapon systems or intelligence software to be released (though sometimes generic functions developed in them could be released). Export controls would still apply. But the exceptions should be that: Exceptions.

path: /oss | Current Weblog | permanent link to this entry

Mon, 11 May 2009

Wikipedia for childrens’ schools

Wikipedia is a cool project. But if you want to hand an encyclopedia to younger children or to schools, Wikipedia is not a great choice. Wikipedia is not “child-safe”, nor is it intended to be; it includes a lot of “adult” content. Also, Wikipedia constantly suffers vandalism; the vandalism is often repaired quickly, but that’s little comfort to parents and teachers. There’s also the problem of Internet access; schools typically employ blocking software, and blocking software is fundamentally not smart. Since Wikipedia mixes material that’s okay for children with stuff that is not, Wikipedia often gets blocked by schools for children. Some schools for children just don’t have Internet access at all, for a variety of reasons. All of this makes it hard for such schools to directly use Wikipedia.

Wikipedia for schools is a cool project that compensates for this. It’s a free, hand-checked, non-commercial selection from Wikipedia, targeted around the UK National Curriculum and useful for much of the English-speaking world. The current version has about 5500 articles (as many as can fit on a DVD with good-size images) and is “about the size of a twenty volume encyclopaedia (34,000 images and 20 million words)”. It was developed by carefully selecting content, which was then checked for vandalism and suitability by “SOS Children volunteers”. You can download it for free from the website, or get it as a free 3.5GB DVD.

I also see this as a future model for Wikipedia — allow people to edit, but have a separate vetting process that identifies particular versions of an article as vetted. Then, people can choose if they want to see the latest version or the most recent vetted version. To some, this is very controversial, but I don’t see it that way. A vetting process doesn’t prevent future edits, and it creates a way for people to get what they want… material that they can have increased confidence in. The trick is to develop a good-enough vetting process (or perhaps multiple vetting/rating processes for different purposes). This didn’t make sense back when Wikipedia was first starting (the problem was to get articles written at all!), but now that Wikipedia is more mature, it shouldn’t be surprising that there’s a new need to identify vetted articles. Yes, you have to worry about countries to whom “democracy” is a dirty word, but I think such problems can be resolved. This is hardly a new idea; see Wikimedia’s article on article validation, Wikipedia’s pushing to 1.0, WikiQA by Eloquence, and FlaggedRevs. I am sure that a vetting/validation process will take time to develop, and it will be imperfect… but that doesn’t make it a bad idea.

So anyway, if you know or have younger kids, check out Wikipedia for schools. This is a project that more people should know about.

path: /oss | Current Weblog | permanent link to this entry

Thu, 07 May 2009

FLOSS doubles every 14 months!

I just took a look at Red Hat’s 2009 brief to the European Patent Office on why software patents should not be allowed. It’s a nice brief, noting that software patents hinder software innovation, and that there is a sound legal basis not to expand availability of such patents in Europe. (Here’s Red Hat’s press release, and Glyn Moody’s comments (ComputerWorld UK) on it).

Their brief points to another paper with very interesting results: “The Total Growth of Open Source” by Amit Deshpande and Dirk Riehle (Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Page 197-209). In this paper, they analyze the growth of more than 5000 open source software projects, and show that “the total amount of source code as well as the total number of open source projects is growing at an exponential rate.” In their conclusion they state that the “total amount of source code and the total number of projects double about every 14 months.”

That is an extraordinary rate of growth. Exponential growth can start small, but when it continues it will completely flatten anything that is not growing exponentially (or not growing as fast). This result is consistent with my earlier work, More than a Gigabuck: Estimating GNU/Linux’s Size, which also found very rapid growth in free/libre/open source software (FLOSS).
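To put “doubles about every 14 months” in perspective, here is a quick back-of-the-envelope calculation (mine, not from the paper):

  # Growth factors implied by doubling every 14 months:
  awk 'BEGIN { printf "per year: %.2fx   per decade: %.0fx\n", 2^(12/14), 2^(120/14) }'
  # prints roughly:  per year: 1.81x   per decade: 380x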

So if you’re interested in software trends, take a look at “The Total Growth of Open Source” and Red Hat’s brief to the EPO on software patents. I think they’re both worth reading.

path: /oss | Current Weblog | permanent link to this entry

Sat, 02 May 2009

Own your own site!

Geocities, a web hosting site sponsored by Yahoo, is shutting down. Which means that, barring lots of work by others, all of its information will be disappearing forever. Jason Scott is trying to coordinate efforts to archive GeoCities’ information, but it’s not easy. He estimates they’re archiving about 2 Gigabytes/hour, pulling in about 5 Geocities sites per second… and they don’t know if it’ll be enough. What’s more, the group has yet to figure out how to serve it: “It is more important to me to grab the data than to figure out how to serve it later…. I don’t see how the final collection won’t end up online, but how is elusive…”

This sort of thing happens all the time, sadly. Some company provides a free service for your site / blog / whatever… and so you take advantage of it. That’s fine, but if you care about your site, make sure you own your data sufficiently so that you can move somewhere else… because you may have to. Yahoo is a big, well-known company, which paid $3.5 billion for Geocities… and now it’s going away.

Please own your own site — both its domain name and its content — if it’s important to you. I’ve seen way too many people have trouble with their sites because they didn’t really own them. Too many scams are based on folks who “register” your domain for you, but actually register it in their own names… and then hold your site hostage. Similarly, many organizations provide wonderful software that is unique to their site for managing your data… but then you either can’t get your own data, or you can’t use your data because you can’t separately get and re-install the software to use it. Using open standards and/or open source software can help reduce vendor lock-in — that way, if the software vendor/website disappears or stops supporting the product/service, you can still use the software or a replacement for it. And of course, continuously back up your data offsite, so if the hosting service disappears without notice, you still have your data and you can get back online.

I practice what I preach. My personal site, www.dwheeler.com, has moved several times, without problems. I needed to switch my web hosting service (again) earlier in 2009, and it was essentially no problem. I just used “rsync” to copy the files to my new hosting service, changed the domain information so people would use the new hosting service instead, and I was up and running. I’ve switched web servers several times, but since I emphasize using ordinary standards like HTTP, HTML, and so on, I haven’t had any trouble. The key is to (1) own the domain name, and (2) make sure that you have your data (via backups) in a format that lets you switch to another provider or vendor. Do that, and you’ll save yourself a lot of agony later.
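
If it helps, here’s the kind of command I mean; a minimal sketch, with placeholder names rather than my real account or host:

 # Copy the local master copy of the site to the new hosting service;
 # -a preserves times/permissions, and only changed files are sent:
 rsync -a ~/website/ myaccount@newhost.example.com:public_html/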

path: /misc | Current Weblog | permanent link to this entry

Wed, 22 Apr 2009

Why copyright damage limits don’t hurt FLOSS

There’s a move afoot to argue that copyright infringement penalties should bear a rational relationship to the value of what was infringed. You might think that this could harm Free/Libre/Open Source Software (FLOSS), but I don’t think so. Here’s why.

First: This is all being brought to a head by the current file-sharing lawsuit against Boston University graduate student Joel Tenenbaum, which raises a number of interesting questions. One issue that I find particularly interesting is the issue of statutory damages: Are fines of $750 to $150,000 per song (each worth at most $1), shared non-commercially and without permission, even legal under the US Constitution? Or, are these fines so excessive that they are unconstitutional? Ars Technica gives a brief summary of the case, if you haven’t been following it. The Free Software Foundation (FSF)’s Amicus Brief in Connection with defendant’s motion to dismiss on grounds of unconstitutionality of copyright act statutory damages as applied to infringement of single MP3 files argues that these penalties grossly exceed the crime; the FSF argues that the “State Farm/Gore due process test applicable to punitive damage awards is likewise applicable to statutory damages, and in particular bars the suggestion that each infringement of an MP3 file having a retail value of 99 cents or less may be punishable by statutory damages of from $750 to $150,000 — or from 2,100 to 425,000 times the actual damages”.

Frankly, I think the FSF and Tenenbaum have a reasonable argument on this point. People who shoplift a CD from a store would definitely pay penalties when caught, but those penalties would bear some relationship to the value of the property stolen, and would be far smaller than the penalties faced by a file-sharer. This notion that the “punishment should fit the crime” is certainly not new; Proverbs 6:30-31 talks about thieves paying sevenfold if they are caught. That doesn’t make such actions right - but unjust penalties aren’t right either. I think a lot of the problem is that copyright laws were originally written when only rich people with printing presses could really make and distribute many copies of material. Today, 8-year-olds can distribute as much information as the New York Times, and the law hasn’t caught up.

But does the FSF risk subverting Free/Libre/Open Source Software (FLOSS) by making this argument? After all, FLOSS developers also depend on copyright law to enforce certain conditions, and often charge $0 for copies of their software. If the penalties would be limited to “7 times the original cost”, would that make FLOSS development impossible?

I don’t think there’s any problem, but for some people that may not be obvious. The difference is that in a typical music copyright infringement case, the filesharer could purchase the right to do what they’re doing for a relatively low price, something typically not true for FLOSS. For example, under normal circumstances it’s perfectly legal to buy a song for $1, and then transfer that song to someone else (as long as you destroy your own copies), so sharing that song with 10 people is legal after paying $10.

In contrast, violations of FLOSS licenses often can’t be made legal by simply buying the rights. If you violate the revised BSD license by removing all credits to the original author, there’s typically no “alternative” legal version available for sale without the author credits. (Indeed, under legal systems with strict “moral rights” it may not even be possible.) Similarly, if you violate the GPL by releasing binary software yet refusing to release its source code, there’s often no way to pay additional money to the original authors for that privilege. In some cases, GPL’ed software is released via a dual-use license (e.g., “GPL or proprietary”), with the proprietary version costing additional money; in those cases you do have a value that you can compare against. In cases where there is a value you can compare against, then you should use that value to help determine the penalty. Otherwise, a much stiffer penalty is justified, because there is no method for the infringer to “buy” his or her way out, and their actions risk making functional products (not just entertainment) unsupportable. As noted in the United States Court of Appeals for the Federal Circuit case 2008-1001, JACOBSEN v. KATZER, the court essentially found that failing to obey the conditions of an open source software license led to copyright infringement. (For more on this particular case, see New Open Source Legal Decision: Jacobsen & Katzer and How Model Train Software Will Have an Important Effect on Open Source Licensing.)

So I think that it does make sense to limit copyright penalties based on the value of the original infringed item… but that doing so does not (necessarily) put FLOSS development processes at risk.

path: /oss | Current Weblog | permanent link to this entry

Mon, 20 Apr 2009

Microsoft loses to Open Source Approaches (Encarta vs. Wikipedia)

The competition is over. On one side, we have Microsoft, a company with a market value of about $166 billion (according to a 2009-04-20 NASDAQ quote). On the other side, we have some volunteers who work together and share their results on the web using open source approaches.

And Microsoft lost.

As pointed out by Chris Dawson (ZDNet), Mike Jennings (PC Pro), Naomi Alderman (the Guardian), Noam Cohen (NY Times), Adam Ostrow (Mashable), and many others, Microsoft Encarta (Microsoft’s encyclopedia project) has folded, having failed to compete with Wikipedia. It’s not even hard to see why:

  1. Wikipedia is cheaper than Encarta (no-cost vs. cost)
  2. Wikipedia is easier to start using. If you have a web browser, you have Wikipedia. In contrast, you have to specially install Encarta, and it does not work on all platforms.
  3. Wikipedia is more up-to-date than Encarta. It often took years before Encarta entries got updated, even on trivially obvious issues such as death dates.
  4. Wikipedia has far more material. Wikipedia has far more articles, and generally it has far more material in each article.
  5. Wikipedia’s material has fewer legal restrictions, so users are allowed to do more with Wikipedia results. Creating mash-ups and reposting portions is part of today’s world.

One lesson to be learned here is that it sometimes doesn’t matter how large a company is; changes in technology may mean that they abandon something in the future. Plan that the future will change, even if a company seems invincible. It’s easy to pick on Microsoft here, but the same can be said of IBM, or Oracle, or anyone else. Tying yourself completely to any one company is, in the long term, a mistake. Thus, you need to have a reasonable escape plan if a company folds or stops supporting a product that you depend on.

Another lesson to be learned here is that proprietary approaches can be beaten by open source approaches. That doesn’t mean it must happen every time, of course. But clearly open source approaches can, at least sometimes, dominate their proprietary competition.

In the long term, it simply doesn’t matter if a company has more money if an open-sourced competitor can produce a better product, make it available at a lower cost, and can sustain that process indefinitely. Given those three factors, proprietary vendors will lose to an open-sourced competitor unless there’s a key differentiator that is sufficiently valuable to users. In such cases, having more money is just an opportunity to lose more money; it gives no benefit of scale. Microsoft’s Encarta team tried to compete by adding special materials (like fancy graphics and sound). I’m sure that Encarta managers convinced themselves that because they were spending money to develop these materials, users would pay for Encarta instead. They were wrong. In the end, users were more interested in good, timely information than in fancy graphics, and Encarta simply didn’t have a chance. Open source approaches were simply better at providing the encyclopedia people wanted than proprietary approaches were.

The obvious question to me is, are there any lessons that apply to software too? Wikipedia uses free / libre / open source software (FLOSS) principles, but Wikipedia is an encyclopedia not a FLOSS program. Indeed, software is different than encyclopedias in many ways, for example, people can easily switch encyclopedias (while the lock-in and network effects of software are well-known), and far more people can participate in encyclopedia development than in software development. But I still think there are lessons to be learned here. This Encarta vs. Wikipedia battle should make it clear that no proprietary company — no matter how well-resourced it is — is invulnerable to open source competition. Developers of products with FLOSS-like licenses give up some privileges that the law permits them to have, and in return, they can often drastically reduce their development costs and increase the breadth of the result (because the development efforts can be shared among many developers). At a certain point, FLOSS-like projects can end up like a snowball rolling down the hill; they gain so much momentum that even large sums of money — or being the first — aren’t enough to counter them. As a result, even proprietary companies with massive cash resources do not always win. In summary, it doesn’t matter if you have lots of money; if your product costs more and does less (from the user’s point of view), you must change that circumstance, eliminate all competition, or suffer failure of the product.

path: /oss | Current Weblog | permanent link to this entry

Mon, 13 Apr 2009

Releasing FLOSS Software

If you’ve written (or started to write) some Free/Libre/Open Source Software (FLOSS), please follow the time-tested community standards for releasing FLOSS software when you want people to be able to install it from source code. Unfortunately, a lot of people don’t seem to be aware of what these conventions are. This really hit me in my recent OpenProofs work; we’re trying to make it easy to install programs by pre-packaging them, and we’ve found that some programs are a nightmare to package or install because their developers did not follow the standard conventions.

So I’ve released a brief article: Releasing Free/Libre/Open Source Software (FLOSS) for Source Installation, to help people learn about them. For the details, I point to the GNU Coding Standards (especially the release process chapter) and the Software Release Practice HOWTO. I also point out some of the most important conventions that will make building and installing your software much easier for your users:

  1. Pick a good, simple, Google-able name.
  2. Identify the version (using simple version numbers or ISO dates), and include that in the release filename as NAME-VERSION.FORMAT.
  3. Use a standard, widely-used, GPL-compatible FLOSS license — and say so.
  4. Follow good distribution-making practice, in particular, make sure tarballs always unpack into a single new directory named NAME-VERSION.
  5. Use the standard invocation to configure, build, and install it: ./configure; make; make install (see the sketch after this list).
  6. Support the standard ./configure options like --prefix, --exec-prefix, --bindir, --libdir, and so on.
  7. Create a makefile that can rebuild everything and uses makefile variables (including applicable standard makefile variable names and targets).
  8. Have “make install” support DESTDIR.
  9. Document the external tools/libraries needed for building and running, and make it easy to separate/reuse them.
  10. If you patch an external library/tool, get the patch upstream.
  11. Use standard user interfaces. For command line tools, use “-” single-letter options, “--” long-name options, and “--” by itself to signal “no more options”. For GUI tools, provide a .desktop file.

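Here is roughly what those conventions add up to for someone installing from source; a minimal sketch using a made-up project name and version:

 # By convention the tarball is NAME-VERSION.FORMAT and unpacks into
 # a single new directory named NAME-VERSION:
 tar xzf frobnicator-1.2.3.tar.gz
 cd frobnicator-1.2.3
 # The standard invocation, using a standard ./configure option:
 ./configure --prefix="$HOME/local"
 make
 make install
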
To learn more, see the whole article: Releasing Free/Libre/Open Source Software (FLOSS) for Source Installation.

path: /oss | Current Weblog | permanent link to this entry

Tue, 24 Mar 2009

Fixing Unix/Linux/POSIX Filenames

Traditionally, Unix/Linux/POSIX filenames can be almost any sequence of bytes, and their meaning is unassigned. The only real rules are that “/” is always the directory separator, and that filenames can’t contain byte 0 (because this is the terminator). Although this is flexible, it creates many unnecessary problems. In particular, this lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws), makes it impossible to consistently and accurately display filenames, and confuses users.

So for those of you who understand Unix/Linux/POSIX, I’ve just released a new technical article, Fixing Unix/Linux/POSIX Filenames.

This article will try to convince you that adding some limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations - so it’d be better to make these (reasonable) assumptions true in the first place. The article discusses, in particular, the problems of control characters in filenames, leading dashes in filenames, the lack of a standard encoding scheme (vs. UTF-8), and special metacharacters in filenames. Spaces in filenames are probably hopeless in general, but resolving some of the other issues will simplify their handling too. This article will then briefly discuss some methods for solving this long-term, though that’s not easy - if I’ve convinced you that this needs improving, I’d like your help figuring out how to do it!
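
As a tiny, hypothetical illustration of just one of these problems (the leading-dash case; the filenames are invented for this example):

 mkdir /tmp/demo && cd /tmp/demo
 touch -- -rf notes.txt   # create two files, one literally named "-rf"
 rm *                     # the glob expands to: rm -rf notes.txt
                          # so notes.txt is forcibly removed, and the
                          # "-rf" file itself is left behind
 rm -- -rf                # "--" ends option parsing, so this works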

So - take a peek at Fixing Unix/Linux/POSIX Filenames. If you have ideas on how to help, I’d love to know.

path: /oss | Current Weblog | permanent link to this entry

Thu, 26 Feb 2009

2009 UK Action Plan for Open Source Software

A new report from the UK titled Open Source, Open Standards and Re–Use: Government Action Plan is in the news; it’s been reported by the BBC, Times Online, and Ars Technica (among many others).

Here’s the first paragraph of its foreword: “Open Source has been one of the most significant cultural developments in IT and beyond over the last two decades: it has shown that individuals, working together over the Internet, can create products that rival and sometimes beat those of giant corporations; it has shown how giant corporations themselves, and Governments, can become more innovative, more agile and more cost-effective by building on the fruits of community work; and from its IT base the Open Source movement has given leadership to new thinking about intellectual property rights and the availability of information for re–use by others.”

In the policy section, it says that (note the last point):

Remarkable stuff.

path: /oss | Current Weblog | permanent link to this entry

Wed, 11 Feb 2009

Open Proofs: New site and why we need them

There’s a new website in town: http://www.openproofs.org. This site exists to define the term “open proofs” and encourage their development. What are open proofs, you ask? Well, let’s back up a little…

The world needs secure, accurate, and reliable software - but most software isn’t. Testing can find some problems, but testing by itself is inadequate. In fact, it’s completely impractical to fully test real programs. For example, completely testing a trivial program that only adds three 64-bit numbers, using a trillion superfast computers, would take about 49,700,000,000,000,000,000,000,000,000 years! Real programs, of course, are far more complex.
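
That figure is easy to sanity-check. Here’s a rough version of the arithmetic, under my own assumptions of 10^12 machines each testing 4 billion cases per second:

 # Three 64-bit inputs means 2^192 test cases; divide by the total test
 # rate, then convert seconds to years:
 echo '2^192 / (10^12 * 4*10^9) / (60*60*24*365)' | bc
 # => roughly 5 * 10^28 years, consistent with the figure above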

There is actually an old, well-known family of approaches that can give much more confidence that some software will do what it’s supposed to do. These are often called “formal methods”; they apply mathematical proof techniques to software. These approaches can produce verified software, where you can prove (given certain assumptions) that the software will (or won’t) do something. There’s been progress over the last several decades, but formal methods are not widely used, even where it might make sense to use them. If there’s a need, and a technology, why hasn’t it matured faster and become more common?

There are many reasons, but I believe that one key problem is that there are relatively few fully-public examples of verified software. Instead, verified software is often highly classified, sensitive, and/or proprietary. Many of the other reasons are actually perpetuated by this. Existing formal methods tools need more maturing, true, but it’s ridiculously hard for tool developers to mature the tools when few people can show or share meaningful examples. Similarly, software developers who have never used them do not believe such approaches can be used in “real software development” (since there are few examples) and/or can’t figure out how to apply them. In addition, they don’t have existing verified software that they can build on or modify to fit their needs. Teachers have difficulty explaining them, and students have difficulty learning from them. All of this ends up being self-perpetuating.

I believe one way to help the logjam is to encourage the development of “open proofs”. An “open proof” is software or a system where all of the following are free-libre / open source software (FLOSS):

Something is FLOSS if it gives anyone the freedom to use, study, modify, and redistribute modified and unmodified versions of it, meeting the Free software definition and the open source definition.

Open proofs do not solve every possible problem, of course. I don’t expect formal methods technologies to become instantly trivial to use just because a few open proofs show up. And formal methods are always subject to limitations, e.g.: (1) the formal specification might be wrong or incomplete for its purpose; (2) the tools might be incorrect; (3) one or more assumptions might be wrong. But they would still be a big improvement over where we are today. Many formal method approaches have historically not scaled up to larger programs, but open proofs may help counter that by enabling tool developers to work with others. In any case, I believe it’s worth trying.

So please take a look at: http://www.openproofs.org. For example, for open proofs to be easily created and maintained, we need FLOSS formal methods tools to be packaged up for common systems so they can be easily installed and used; the web site has a page on the packaging status of various FLOSS tools. Please feel welcome to join us.

path: /oss | Current Weblog | permanent link to this entry

Thu, 22 Jan 2009

Automating DESTDIR for Packaging

Today’s users of Linux and Unix systems (including emulation systems like Cygwin) don’t want to manually install programs - they want to easily install pre-packaged software. But that means that someone has to create those packages.

When you’re creating packages, an annoying step is handling “make install” if the original software developer doesn’t support the DESTDIR convention. DESTDIR support is very important, because two of the most common packaging formats - Debian’s .deb (used by Debian and Ubuntu) and RPM (used by Fedora, Red Hat, and SuSE/Novell) - both require actions (redirection of writes) that DESTDIR enables. Unfortunately, many software developers don’t include support for DESTDIR, and it’s sometimes a pain to add DESTDIR support. Indeed, it’s often trivial to create packages except for having to make the modifications for DESTDIR support. Yes, adding DESTDIR support isn’t hard compared to many other tasks, but since it applies to every program, why not automate this instead?
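
In case it isn’t obvious why packagers care, here’s a minimal sketch of what DESTDIR buys them (the /tmp/pkgroot path is just an example):

 # With DESTDIR support, "make install" can be redirected into a scratch
 # root instead of writing into the live filesystem:
 make install DESTDIR=/tmp/pkgroot
 # The package payload is then simply whatever landed under /tmp/pkgroot:
 find /tmp/pkgroot -type f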

So, I’ve written a little essay about Automating DESTDIR for Packaging. In it, I describe some of the ways I’ve identified for automating DESTDIR. If there are more - great! (Please let me know!). In any case, I’d love to see more automation, so that software will become easier to package and install.

Here’s the link, again: Automating DESTDIR for Packaging.

path: /oss | Current Weblog | permanent link to this entry

Mon, 12 Jan 2009

Apple Feedback URL

Oh, quick update - the URL for feedback to Apple is http://www.apple.com/feedback - I gave the wrong URL in my last post. My thanks to Steve Hoelzer, who was the first to send me a correction! Again - please ask them to support Ogg.

path: /oss | Current Weblog | permanent link to this entry

Sat, 10 Jan 2009

Ask Apple to Support Ogg on iPod/iTunes

Please ask Apple to support Ogg on their iPods, iPhones, and iTunes! It wouldn’t hurt to also sign this petition (and maybe this one), though I don’t know how strongly they’d influence Apple. Here’s why, as good news and bad news.

Bad news: Some of the most common formats for audio (like MP3 and AAC) are patent-encumbered, and thus not open standards. Because they’re patent-encumbered they are harder and more expensive to support. Many organizations like Wikipedia forbid the use of patent-encumbered standards, and they can’t be directly implemented in FLOSS products used in the U.S. and some other countries.

Good news: Ogg (as maintained by the Xiph.org foundation) is available! Ogg is a “container format” that can contain audio, video, and related material using one of several encodings. Usually audio is encoded with “Vorbis” (the combination is “Ogg Vorbis”); perfect sound reproductions can be created with FLAC. This format is already the required audio format for Wikipedia, and the next version of Mozilla’s Firefox will include Ogg built in. Many people already have huge music collections in Ogg format, and many people report that Ogg support is an important requirement when choosing a player. See my older blog entry on playing Ogg Vorbis and Theora for more information.
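
If you want to try the format yourself, the Xiph tools make it easy. A quick sketch, assuming the flac and vorbis-tools packages are installed and “song.wav” stands in for your own recording:

 flac song.wav           # lossless: produces song.flac
 oggenc -q 6 song.wav    # lossy Vorbis in an Ogg container: song.ogg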

Bad news: Apple’s iPods do not directly support Ogg. That’s really unfortunate for iPod users, and it also makes it harder to release files in Ogg. So please, ask Apple to add support for Ogg. People have been asking for this for some time, so it’s not true that “no one’s asking for it”. Some people have even gone to radical lengths and rewritten the iPod software - but although that shows there’s real interest, that’s an extreme measure that normal people shouldn’t have to take. There’s already software available to Apple to implement Ogg at no charge, and even the original iPods have enough horsepower to implement Ogg. Thus, it will cost Apple very little to add support for Ogg - and there are people who want it.

path: /oss | Current Weblog | permanent link to this entry

Thu, 08 Jan 2009

Updating cheap websites with rsync+ssh

I’ve figured out how to run and update cheap, simple websites using rsync and ssh and Linux. I thought I’d share that info here, in case you want to copy my approach.

My site (www.dwheeler.com) is an intentionally simple website. It’s simply a bunch of directories with static files; those files may contain JavaScript and animated GIFs, but site visitors aren’t supposed to cause them to change. Programs to manage my site (other than the web server) are run before the files are sent to the server. Most of today’s sites can’t be run this way… but when you can do this, the site is much easier to secure and manage. It’s also really efficient (and thus fast). Even if you can’t run a whole site this way, if you can run a big part of it this way, you can save yourself a lot of security, management, and performance problems.

This means that I can make arbitrary changes to a local copy of the website, and then use rsync+ssh to upload just those changes. rsync is a wonderful program, originally created by Andrew Tridgell, that can copy a directory tree to and from remote directory trees, but only send the changes. The result is that rsync is a great bandwidth-saver.

This approach is easy to secure, too. Rsync uses ssh to create the connection, so people can’t normally snoop on the transfer, and redirecting DNS will be immediately noticed. If the website is compromised, just reset it and re-send a copy; as long as you retain a local copy, no data can be permanently lost. I’ve been doing this for years, and been happy with this approach.

On a full-capability hosting service, using rsync is easy. Just install rsync on the remote system (typically using yum or apt-get), and run:

 rsync -a LOCALDIR REMOTENAME@REMOTESITE:REMOTEDIR

Unfortunately, at least some of the cheap hosting services available today don’t make this quite so easy. The cheapest hosting services are “shared” sites that share resources between many users without using full operating system or hardware virtualization. I’ve been looking at a lot of the cheap Linux web hosting services such as WebhostGIANT, Hostmonster, Hostgator, and Bluehost. It appears that at least some of these hosting companies improve their security by greatly limiting the access granted to you via the ssh/shell interface. I know that WebhostGIANT is an example, but I believe there are many such examples. So, even if you have ssh access on a Linux system, you may only get a few commands you can run like “mv” and “cp” (and not “tar” or “rsync”). You could always ask the hosting company to install programs, but they’re often reluctant to add new ones. But… it turns out that you can use rsync and other such programs, without asking them to install anything, at least in some cases. I’m looking for new hosting providers, and realized (1) I can still use this approach without asking them to install anything, but (2) it requires some technical “magic” that others might not know. So, here’s how to do this, in case this information/example helps others.

Warning: Complicated technical info ahead.

I needed to install some executables, and rather than recompiling my own, I grabbed pre-compiled executables. To do this, I found out the Linux distribution used by the hosting service (in the case of WebhostGIANT, it’s CentOS 5, so all my examples will be RPM-based). On my local Fedora Linux machine I downloaded the DVD “.iso” image of that distro, and did a “loopback mount” as root so that I could directly view its contents:

 cd /var/www     # Or wherever you want to put the ISO.
 wget ...mirror location.../CentOS-5.2-i386-bin-DVD.iso
 mkdir /mnt/centos-5.2
 mount CentOS-5.2-i386-bin-DVD.iso /mnt/centos-5.2 -o loop
 # Get ready to extract some stuff from the ISO.
 cd
 mkdir mytemp
 cd mytemp

Now let’s say I want the program “nice”. On a CentOS or Fedora machine you can determine the package that “nice” is in using this command:

 rpm -qif `which nice`
This will show that nice is in the “coreutils” package. You can extract “nice” from its package by doing this:
 rpm2cpio /mnt/centos-5.2/CentOS/coreutils-5.97-14.el5.i386.rpm | \
   cpio --extract --make-directories
Now you can copy it to your remote site. Presuming that you want the program to go into the remote directory “/private/”, you can do this:
 scp -p ./usr/bin/nice MY_USERID@MY_REMOTE_SITE:/private/

Now you can run /private/nice, and it works as you’d expect. But what about rsync? Well, if you extract and copy rsync the same way and then run it, it will complain with an error message. The error message says that rsync can’t find another library (libpopt in this case). The issue is that cheap web hosting services often don’t provide a lot of libraries, and they won’t let you install new libraries in the “normal” places. Are we out of luck? Not at all! We could just recompile the program statically, so that the library is embedded in the file, but we don’t even have to do that. We just need to upload the needed library to a different place, and tell the remote site where to find the library. It turns out that the program “/lib/ld-linux.so” has an option called “--library-path” that is specially designed for this purpose. ld-linux.so is the loader (the “program for running programs”), which you don’t normally invoke directly, but if you need to add library paths, it’s a reasonable way to do it. (Another way is to use LD_LIBRARY_PATH, but that requires that the string be interpreted by a shell, which doesn’t always happen.) So, here’s what I did (more or less).

First, I extracted the rsync program and necessary library (popt) on the local system, and copied them to the remote system (to “/private”, again):

 rpm2cpio /mnt/centos-5.2/CentOS/rsync-2.6.8-3.1.i386.rpm | \
   cpio --extract --make-directories
 # rsync requires popt:
 rpm2cpio /mnt/centos-5.2/CentOS/popt-1.10.2-48.el5.i386.rpm | \
   cpio --extract --make-directories
 scp -p ./usr/bin/rsync ./usr/lib/libpopt.so.0.0.0 \
        MY_USERID@MY_REMOTE_SITE:/private/
Then, I logged into the remote system using ssh, and added symbolic links as required by the normal Unix/Linux library conventions:
 ssh MY_USERID@MY_REMOTE_SITE
 cd /private
 ln -s libpopt.so.0.0.0 libpopt.so 
 ln -s libpopt.so.0.0.0 libpopt.so.0

Now we’re ready to use rsync! The trick is to tell the local rsync where the remote rsync is, using “--rsync-path”. That option’s contents must invoke ld-linux.so to tell the remote system where the additional library path (for libpopt) is. So here’s an example, which copies files from the directory LOCAL_HTTPDIR to the directory REMOTE_HTTPDIR:

rsync -a \
 --rsync-path="/lib/ld-linux.so.2 --library-path /private /private/rsync" \
 LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR

There are a few ways we can make this nicer for everyday production use. If the remote server is a cheap shared system, we want to be very kind to its CPU and bandwidth use (or we’ll get thrown off it!). The “nice” command (installed by the steps above) will reduce CPU use on the remote web server when running rsync. There are several rsync options that can help, too. The “--bwlimit=KBPS” option will limit the bandwidth used. The “--fuzzy” option will reduce bandwidth use if there’s a similar file already on the remote side. The “--delete” option is probably a good idea; this means that files deleted locally are also deleted remotely. I also suggest “--update” (this will avoid updating remote files if they have a newer timestamp) and “--progress” (so you can see what’s happening). Rsync is able to copy hard links (using “-H”), but that takes more CPU power; I suggest using symbolic links and then not invoking that option. You can enable compression too, but that’s a trade-off; compression will decrease bandwidth but increase CPU use. So our final command looks like this:

rsync -a --bwlimit=100 --fuzzy --delete --update --progress \
 --rsync-path="/private/nice /lib/ld-linux.so.2 --library-path /private /private/rsync" \
 LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR

Voila! Put that command in a small script stored in some easily-run place. Now you can easily update your website locally and push it to the actual webserver, even on a cheap hosting service, with very little bandwidth and CPU use. That’s a win-win for everyone.

path: /misc | Current Weblog | permanent link to this entry

Moving hosting service at end of January 2009

I will be moving to a new hosting service at the end of January 2009. (I haven’t determined which hosting service yet.) In theory, there should be very little downtime, but it’s possible the site will be off for a little while. But if that happens, it will be very temporary - I’ll get the site back up as soon as I can.

path: /website | Current Weblog | permanent link to this entry