David A. Wheeler's Blog



Sun, 15 Aug 2010

Geek Video Franchises

I have a new web page on a silly game I call Geek Video Franchises. The goal of this game is to interconnect as many geek video franchises as possible via common actors. In this game, you’re only allowed to use video franchises that geeks tend to like.

For example: The Matrix connects to The Lord of the Rings via Hugo Weaving (Agent Smith/Elrond), which connects to Indiana Jones via John Rhys-Davies (Gimli/Sallah), which connects to Star Wars via Harrison Ford (Indiana Jones/Han Solo). The Lord of the Rings directly connects to Star Wars via Christopher Lee (Saruman/Count Dooku). Of course, Lord of the Rings also connects to X-men via Ian McKellen (Gandalf/Magneto), which connects to Star Trek via Patrick Stewart (Professor Xavier / Captain Jean-Luc Picard). Star Trek connects to Dr. Who via Simon Pegg (JJ Abrams’ Montgomery Scott/The Editor), which connects to Harry Potter via David Tennant (Dr. Who #10/Barty Crouch Jr.), which connects to Monty Python via John Cleese (Nearly Headless Nick/Lancelot, etc.).

So if you’re curious, check out Geek Video Franchises.

path: /misc | Current Weblog | permanent link to this entry

Sat, 03 Jul 2010

Opening files and URLs from the command line

Nearly all operating systems have a simple command to open up a file, directory, or URL from the command line. This is useful when you’re using the command line, e.g., xdg-open . will pop up a window in the current directory on most Unix/Linux systems. This capability is also handy when you’re writing a program, because these are easy to invoke from almost any language. You can then pass it a filename (to open that file using the default application for that file type), a directory name to start navigating in that directory (use “.” for the current directory), or a URL like “http://www.dwheeler.com” to open a browser at that URL.

Unfortunately, the command to do this is different on different platforms.

My new essay How to easily open files and URLs from the command line shows how to do this.

For example, on Unix/Linux systems, you should use xdg-open (not gnome-open or kde-open), because that opens the right application given the user’s current environment. On MacOS, the command is “open”. On Windows you should use start (not explorer, because invoking explorer directly will ignore the user’s default browser setting), while on Cygwin, the command is “cygstart”. More details are in the essay, including some gotchas and warnings.
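As a rough illustration of how a program might pick the right command, here is a minimal Python sketch. The function names `opener_command` and `open_target` are made up for this example, and the platform checks rely on the values Python reports in `sys.platform`. On Windows, `start` is built into cmd.exe rather than being a separate program, so the sketch runs it through `cmd /c`. It does not try to handle the gotchas (for instance, special characters in URLs on Windows).

```python
import subprocess
import sys


def opener_command(target, platform=None):
    """Return the argv list that opens `target` with the user's default app.

    `opener_command` is a hypothetical helper for this sketch, not a standard API.
    """
    platform = platform or sys.platform
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin, so it has to run through cmd; the empty
        # string is the window-title argument that `start` expects first.
        return ["cmd", "/c", "start", "", target]
    if platform == "cygwin":
        return ["cygstart", target]
    if platform == "darwin":
        return ["open", target]
    # Assumption: anything else is a Unix/Linux desktop that has xdg-open.
    return ["xdg-open", target]


def open_target(target):
    """Open a file, directory, or URL; raises if the opener reports failure."""
    subprocess.run(opener_command(target), check=True)
```

Calling `open_target(".")` should then pop up a window on the current directory, and `open_target("http://www.dwheeler.com")` should open the default browser.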

Anyway, take a look at: How to easily open files and URLs from the command line

path: /misc | Current Weblog | permanent link to this entry

Thu, 20 May 2010

Stop Worrying and Love the Internet

Back in 1999 Douglas Adams wrote “How to Stop Worrying and Learn to Love the Internet”. It’s a wonderful essay that is still a good read today. In particular, I think it’s an important article to read if you’re still struggling with understanding where the Internet is going, or if you’re trying to figure out how to address the trustworthiness of group-developed information like Wikipedia, open source software, or the blogosphere. Adams said:

“Because the Internet is so new we still don’t really understand what it is. We mistake it for a type of publishing or broadcasting, because that’s what we’re used to. So people complain that there’s a lot of rubbish online, or that it’s dominated by Americans, or that you can’t necessarily trust what you read on the web. Imagine trying to apply any of those criticisms to what you hear on the telephone. Of course you can’t ‘trust’ what people tell you on the web anymore than you can ‘trust’ what people tell you on megaphones, postcards or in restaurants. Working out the social politics of who you can trust and why is, quite literally, what a very large part of our brain has evolved to do. For some batty reason we turn off this natural scepticism when we see things in any medium which require a lot of work or resources to work in, or in which we can’t easily answer back — like newspapers, television or granite. Hence ‘carved in stone.’ What should concern us is not that we can’t take what we read on the internet on trust — of course you can’t, it’s just people talking — but that we ever got into the dangerous habit of believing what we read in the newspapers or saw on the TV [emphasis mine] — a mistake that no one who has met an actual journalist would ever make... Interactivity. Many-to-many communications. Pervasive networking. These are cumbersome new terms for elements in our lives so fundamental that, before we lost them, we didn’t even know to have names for them.”

My thanks to Andrew Sullivan for reminding me of this important piece.

path: /oss | Current Weblog | permanent link to this entry

Thu, 22 Apr 2010

Filenames and Pathnames in Shell - Doing it Correctly

Traditionally, Unix/Linux/POSIX filenames and pathnames can be almost any sequence of bytes. Unfortunately, most developers and users of Bourne shells (including bash, dash, ash, and ksh) don’t handle filenames and pathnames correctly. Even good textbooks on shell programming get filename and pathname processing completely wrong. Thus, many shell scripts are buggy, leading to surprising failures. In fact, mis-handling of filenames is a significant source of security vulnerabilities.

So I’ve created a short essay on how to correctly process filenames in Bourne shells as used in Unix, Linux, and various POSIX systems. It presumes that you already know how to write Bourne shell scripts.

The essay is: Filenames and Pathnames in Shell: How to do it correctly. Please, take a look!

Frankly, it would be better if filenames weren’t so permissive. In particular, filenames with control characters, leading dash (“-”), and non-UTF-8 encoding cause a lot of grief. To see more about that, please see my essay Fixing Unix/Linux/POSIX Filenames. If filenames weren’t so permissive, correct programs would be much easier to write.
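To make those three trouble spots concrete, here is a small Python sketch (not taken from either essay) that flags filenames with a leading dash, control characters, or bytes that are not valid UTF-8. The names `filename_problems` and `scan` are invented for this illustration, and it works on raw bytes because that is what the filesystem actually stores.

```python
import os


def filename_problems(name: bytes) -> list:
    """List the troublesome properties of one filename (a single path component)."""
    problems = []
    if name.startswith(b"-"):
        problems.append("leading dash")
    # Bytes below 0x20 and the byte 0x7F (DEL) are ASCII control characters.
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("control character")
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    return problems


def scan(directory="."):
    """Yield (name, problems) for each entry in `directory` that has a problem."""
    # Passing bytes to os.listdir makes it return the raw byte names.
    for name in os.listdir(os.fsencode(directory)):
        problems = filename_problems(name)
        if problems:
            yield name, problems
```

For example, `filename_problems(b"-rf")` reports a leading dash, and a name containing a newline is reported as having a control character.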

So, Filenames and Pathnames in Shell: How to do it correctly explains how to handle filenames properly in shell programs, given the current situation. Please take a look; I hope you find it useful.

path: /oss | Current Weblog | permanent link to this entry

Fri, 02 Apr 2010

The new face of journalism: PJ, Groklaw, and SCO

The jury in the District Court of Utah trial between SCO Group and Novell has issued a verdict, and SCO lost big. SCO had been threatening and trying to extract money from many innocent people and organizations, including the developers and users of Linux, IBM, Red Hat, and Novell. But the jury found that the copyrights for Unix did not go from Novell to SCO, so many of SCO's claims against these innocent people have collapsed. It'll take many years for the rest of the cases to wind down, but their other cases were even less probable.

Perhaps the happiest part of this sorry tale is the rise of Groklaw, established and run by PJ. Carla Schroder's "Groklaw: How One Person Can Do Big Deeds. Thanks PJ." and Brian Proffitt's "SCO, Novell: Grokking Where Credit is Due" wisely point out the important role that Groklaw has played in this saga.

It's hard to know if Groklaw changed the outcome of this case, but Groklaw clearly changed what people knew about the case. Traditional journalists completely failed the public in the SCO cases. Even though this had the potential to seriously harm the most important development in information technology (IT) — the rise of open source software — almost no IT journalists looked into it. The few that did tended to spend little time looking at (or for) evidence. If journalists are simply reorganizing press releases, there's really no need for journalism, is there?

Groklaw was vastly different. Groklaw is more than a website or blog; it is a community of people who gathered evidence, analyzed it, and helped other people get the true picture. Traditional journalists may bemoan the loss of local newspapers, but why should people pay for rehashed press releases when the blogs are a more accurate and broader source of information? In short, if you wanted full and accurate public information related to SCO, Groklaw had it; traditional sources didn't.

While Groklaw is a community, PJ was and is a key part of it. She had the idea of setting up Groklaw, and made it work. In short, she established an environment, and made it possible for the rest of the world to see what was going on.

So hats off to Groklaw, and to PJ in particular. Journalism will never be the same again.

path: /oss | Current Weblog | permanent link to this entry

Sun, 21 Mar 2010

Using Wikipedia for research

Some teachers seem to lose their minds when asked about Wikipedia, and make absurd rules like “I forbid students from using Wikipedia”. A 2008 article states that Wikipedia is the encyclopedia “that most universities forbid students to use”.

But the professors don’t need to be such Luddites; it turns out that college students tend to use Wikipedia quite appropriately. A research paper titled How today’s college students use Wikipedia for course-related research examines Wikipedia use among college students; it found that Wikipedia use was widespread, and that the primary reason they used Wikipedia was to obtain background information or a summary about a topic. Most respondents reported using Wikipedia at the beginning of the research process; very few used Wikipedia near or at the end. In focus group sessions, students described Wikipedia as “the very beginning of the very beginning for me” or “a .5 step in my research process”, and that it helps primarily in the beginning because it provided a “simple narrative that gives you a grasp”. Another focus group participant called Wikipedia “my presearch tool”. Presearch, as the participant defined it, was “the stage of research where students initially figure out a topic, find out about it, and delineate it”.

Now, it’s perfectly reasonable to say that Wikipedia should not be cited as an original source; I have no trouble with professors making that rule. Wikipedia itself has a rule that Wikipedia does not publish original research or original thought. Indeed, the same is true for Encyclopedia Britannica or any other encyclopedia; encyclopedias are supposed to be summaries of knowledge gained elsewhere. You would expect that college work would normally not have many citations of any encyclopedia, be it Wikipedia or Encyclopedia Britannica, simply because encyclopedias are not original sources.

Rather than running in fear from new materials and technologies, teachers should be helping students understand how to use them appropriately, helping them consider the strengths and weaknesses of their information sources. Wikipedia should not be the end of any serious research, but it’s a reasonable place to start. You should supplement it with other material, for the simple reason that you should always examine multiple sources no matter where you start, but that doesn’t make Wikipedia less valuable. For younger students, there are reasonable concerns about inappropriate material (e.g., due to Wikipedia vandalism and because Wikipedia covers topics not appropriate for much younger readers), but the derivative “Wikipedia Selection for Schools” is a good solution for that problem. I’m delighted that so much information is available to people everywhere; we need to help people use these resources instead of ignoring them.

And speaking of which, if you like Wikipedia, please help! With a little effort, you can make it better for everyone. In particular, Wikipedia needs more video; please help the Video on Wikipedia folks get more videos on Wikipedia. This also helps the cause of open video, ensuring that the Internet continues to be open to innovation.

path: /misc | Current Weblog | permanent link to this entry

Sat, 06 Mar 2010

Robocopy

If you use Microsoft Windows (XP or some later version), and don't have an allergic reaction to the command line, you should know about Robocopy. Robocopy ("robust file copy") is a command-line program from Microsoft that copies collections of files from one place to another in an efficient way. Robocopy is included in Windows Vista, Windows 7, and Windows Server 2008. Windows XP and Windows Server 2003 users can download Robocopy for free from Microsoft as part of the Windows Server 2003 "Resource Kit Tools".

Robocopy copies files, like the COPY command, but Robocopy will only copy a file if the source and destination have different time stamps or different file sizes. Robocopy is nowhere near as capable as the Unix/Linux "rsync" command, but for some tasks it suffices. Robocopy will not copy files that are currently open (by default it will repeatedly retry copying them), it can only do one-way mirroring (not bi-directional synchronization), it can only copy mounted filesystems, and it's foolish about how it copies across a network (it copies the whole file, not just the changed parts). Anyway, you invoke it at the command line like this:

ROBOCOPY Source Destination OPTIONS

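So, here's an example that mirrors everything from "c:\data" to "u:\data" (/MIR mirrors the whole directory tree, /NDL leaves directory names out of the log, and /R:20 retries a failed copy up to 20 times):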

 robocopy c:\data u:\data /MIR /NDL /R:20
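To show the copy rule described above (copy a file only when the time stamp or size differs), here is a rough Python sketch. It is an illustration of the rule, not how Robocopy is implemented; `needs_copy` and `copy_changed` are names invented for this example, and unlike /MIR the sketch never deletes anything from the destination.

```python
import os
import shutil


def needs_copy(src, dst):
    """True if dst is missing or differs from src in size or modification time."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns


def copy_changed(src_dir, dst_dir):
    """One-way copy of new or changed files; returns the relative paths copied."""
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dst_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_root, name)
            if needs_copy(src, dst):
                # copy2 also copies the modification time, so an unchanged
                # file is skipped on the next run.
                shutil.copy2(src, dst)
                copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

Running `copy_changed` twice in a row on an unchanged tree copies nothing the second time, which is the whole point of the rule. In practice, the robocopy command above is still the thing to run.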

To do this on an automated schedule in Windows XP, put your commands into a text file with a name ending in ".bat" and select Control Panel-> Scheduled Tasks-> Add Scheduled Task. Select your text file to run, have it run "daily". You would think that you can't run it more than once a day this way, but that's actually not true. Click on "Open advanced properties for this task when I click Finish" and then press Finish. Now select the "Schedule" tab. Set it to start at some time when you're probably using the computer, click on "Advanced", and set "repeat task" so it will run (say, every hour with a duration of 2 hours). Then click on "Show multiple schedules", click "new", and then select "At system startup". Now it will make copies on startup AND every hour. You may want to go to the "Settings" tab and tweak it further. You can use Control Panel-> Scheduled tasks to change the schedule or other settings.

A GUI for Robocopy is available. An alternative to Robocopy is SyncToy; SyncToy has a GUI, but Microsoft won't support it, I've had reliability and speed problems with it, and SyncToy has a nasty bug in its "Echo" mode... so I don't use it. I suspect the Windows Vista and Windows 7 synchronization tools might make Robocopy less useful, but I find that the Windows XP synchronization tools are terrible... making Robocopy a better approach. There is a boatload of applications out there that do one-way or two-way mirroring, including ports of rsync, but getting them installed in some security-conscious organizations can be difficult.

Of course, if you're using Unix/Linux, then use rsync and be happy. Rsync usually comes with Unix/Linux, and rsync is leaps-and-bounds better than robocopy. But not everyone has that option.

path: /misc | Current Weblog | permanent link to this entry

Sun, 28 Feb 2010

Open government: Default release as OSS and Open Access

U.S. government agencies are soliciting ideas on how to make them more transparent, participatory, collaborative and innovative.

Please support proposals to release government-funded works by default as open access (for research papers) or as open source software (for software). An example is the proposal to the National Science Foundation (NSF) called Public funding = Public viewing. This proposal recommends that publicly funded projects must be published as open access and all data and code shared as open source software. Please vote for this, make helpful comments, and so on. Similarly, please vote for and/or add similar proposals for other agencies where they apply.

If "we the people" pay for research and development, then "we the people" should normally get the results. I can see the need for exceptions — particularly for classified works — but those should be exceptions. In short, I think this kind of proposal makes sense.

As I've commented before, Government-developed Unclassified Software should by default be released as Open Source Software, and research papers produced from U.S. government funding should be open access. So please make sure that U.S. agencies know this. Thanks.

path: /oss | Current Weblog | permanent link to this entry

Mon, 22 Feb 2010

Free/Libre/Open Source Software's big win: Jacobsen/JMRI v. Katzer

There’s been a major legal victory for Free/Libre/Open Source Software (FLOSS): Jacobsen v. Katzer. Articles like Bruce Perens’ “Inside Open Source’s Historic Victory” and A Big Victory for F/OSS: Jacobsen v. Katzer is Settled give many of the specifics; here is a quick summary.

Bob Jacobsen is a high-energy physicist who developed (as a hobby) the Java Model Railroad Interface (JMRI) Project. JMRI is a set of FLOSS Java tools for configuring and controlling model railroad trains. Matthew Katzer used loopholes in the law to patent ideas that Jacobsen and others had created and publicly discussed first, domain-squatted, tried to embarrass Jacobsen to Jacobsen’s employer, and used part of Jacobsen’s JMRI software in Katzer’s own product without complying with the JMRI license (by not providing the required credit). The JMRI project has a short summary of this unpleasant fight, as well as lots of details and court papers.

What’s impressive was that Bob Jacobsen stuck through a very hard series of events. At first the court didn’t seem to understand FLOSS at all, and Jacobsen was handed some very unpleasant defeats. At one point, Jacobsen had to pay over $30,000 of his own money.

But Jacobsen persevered, and won critical rulings and a final settlement that is really a complete victory for him. In 2008 the United States Court of Appeals for the Federal Circuit vacated the district court’s ruling and held that the terms of the Artistic License (a FLOSS license) are enforceable. The court said, “Open source licensing has become a widely used method of creative collaboration that serves to advance the arts and sciences in a manner and at a pace that few could have imagined just a few decades ago”. On February 18, 2010, the parties finally settled. Among other terms, Jacobsen has won $100,000, Katzer is forbidden to use Jacobsen’s software, and the two patents at issue have been disclaimed. What’s more, the rulings stemming from this case have created a precedent that FLOSS licenses are legally enforceable, eliminating a lot of uncertainty, and because there is a final settlement it is not possible to appeal the case. Strictly speaking, the precedents do not automatically apply everywhere in the U.S., but even where they do not strictly apply, they will still have a strong weight.

This result is critically important to FLOSS. If FLOSS developers could not enforce their licenses, the probable result would be that a lot of such software would never be written. The Amici Curiae brief by Creative Commons Corporation et al. and the Software Freedom Law Center Amicus Brief in Jacobsen v. Katzer both do a nice job explaining why getting this ruling right was so important.

So, my hat’s off to Bob Jacobsen. Through his persistence, he’s made the world better for all of us.

path: /oss | Current Weblog | permanent link to this entry

Sat, 09 Jan 2010

California: Open Source Software is Okay!

The California state government has officially declared that it’s okay to use open source software inside the California state government. On January 7, 2010, the California Office of the State Chief Information Officer (OCIO) released Information Technology Policy Letter (ITPL) 10-01, titled “Open Source Software Policy”. A key purpose of ITPL 10-01 is to “formally establish the use of Open Source Software (OSS) in California state government as an acceptable practice”, and the first sentence of its policy statement is that “The OCIO permits the use of OSS”. It even includes the ten-point open source definition (OSD) as promulgated by the Open Source Initiative, to make sure that there’s no misunderstanding.

I think this is a big deal. Officially saying “it’s okay to use free/libre/open source software (FLOSS)” is really important before FLOSS can get widespread use in governments. Most technologists already understand the potential advantages of FLOSS, but they encounter a lot of resistance when they try to use or develop FLOSS in large organizations like governments. Far too many middle managers are instinctively afraid of change from “the way we’ve always done it”. For example, they may be afraid of unseen problems, or afraid their bosses will rake them over the coals later. Far too often the middle managers have misunderstandings about FLOSS, too. For example, many managers still believe the myth that “you can’t get support” and are unaware of the many companies that do provide such support. Companies that make competing proprietary products are delighted (of course) when governments don’t consider their competition... but in an era of tight budgets, it doesn’t make sense for governments to ignore competing (and often less expensive) products. When top officials give official “top cover” permission to consider FLOSS, then the technologists and middle managers are far more likely to fairly and honestly consider them.

Also, the fact that it’s California matters. The economy of California is larger than that of most countries (if it were a country, it would rank somewhere from third through tenth in the world, depending on how you measure it). Anything the state of California does can influence other states and countries; acts like this further legitimize the use of Free/Libre/Open Source Software (FLOSS).

Of course, the state of California isn’t the only government organization to release a memo officially declaring that it’s okay to use free/libre/open source software (FLOSS). Just looking inside the U.S., the U.S. DoD did this in 2003, the Office of Management and Budget (OMB) released a somewhat similar memo in 2004 that applied to the entire U.S. federal government, the U.S. Navy did this in 2007, and the U.S. DoD released clarifying guidance in 2009 re-emphasizing this point. And that’s only a few examples from U.S. government organizations; the examples from around the world are legion. It’s really difficult to get people to change what they do... as you can tell from the number of times that various U.S. federal government organizations have had to state and re-state it. Still, these statements really do have an effect. Official policy statements that FLOSS may be used, such as the one California just released, are a necessary first step to changing things from “the way we’ve always done things”.

path: /oss | Current Weblog | permanent link to this entry

Thu, 31 Dec 2009

Moglen on Patents and Bilski

Eben Moglen has a very interesting presentation on patents (including comments on Bilski) that was originally presented on Nov. 2, 2009. Software patents and business method patents have been a disaster for the U.S. and world economy, and he has some interesting things to say about how we got here (and how it could be fixed).

One interesting point he made, which I hadn’t heard before, is that there is a fundamental conflict between the patent system and the Administrative Procedure Act of 1946 (aka the APA). Nearly all of the U.S. government must obey the APA before creating new rules and regulations. According to the APA, U.S. agencies must keep the public informed, provide for public participation in the rulemaking process, establish uniform standards for rulemaking and adjudication, and provide for judicial review. In particular, agencies normally have to perform a cost-benefit analysis.

But the patent system pre-existed the APA. Patents, since they are government-created monopolies, can constrain people in the same ways that any other rule or regulation can. However, the government does not follow the APA to determine if each proposed patent should be granted. Instead, the old patent process was essentially grandfathered in as a special exception to the APA. Because the APA is not considered when examining each patent, no one in government asks the normally-required question “How will each proposed patent be publicly reviewed before it is granted?”. Patents on ideas that are patently obvious are routinely granted, in part because there is no public review before they are granted and because the patent office (by policy) ignores most information available to the public. All of this happens because the patent-granting process is not required to enable public participation in the rulemaking process (in this case, the process for permitting the granting of a patent). Also, when examining a patent to determine if it should be granted, no one asks normally-obvious questions like:

Because the patent system predates the APA, all potential harms to society from a patent are completely ignored during the patent examination process. If patents were individually considered as new regulations under the APA, such questions would need to be carefully considered. That’s an interesting point Moglen makes.

It’s my hope that the Supreme Court will clearly stop software patents. We shall see.

path: /oss | Current Weblog | permanent link to this entry

Sun, 13 Dec 2009

U.S. research should be open access

The Office of Science and Technology Policy (OSTP) has launched a “public consultation on Public Access Policy”, to see if research funded by U.S. grants should be made available as open access results. I think this is important — I believe publicly-funded unclassified research should actually be made available to the public.

Historically, the U.S. pays a fortune for research, the results are written up as papers for journals, and then various publishers acquire total rights to these papers and charge exorbitant monopoly fees for them. The result: Most U.S. citizens cannot afford to see the research their taxes pay for.

The basic question here is really straightforward: Should publicly-funded research results be made available directly to the public instead? Or, should private companies continue to gain ownership over publicly-funded results, for either nothing or a tiny fraction of the public’s costs?

A small number of journal publishers and societies strongly want to keep things the way they are, of course. It makes sense from their point of view; everybody likes free (or nearly free) money! Historically, this arrangement was created because it can be expensive to publish and manage paper. However, that rationale has become completely obsolete. Few people want the paper any more — they want the research, on-line, without a paywall. And don’t give me nonsense about the “costs” of peer review. Many journals don’t pay their reviewers (the reviewers do it gratis), and even if they did, the total control they gain is still unjustified; the U.S. government spends far more per paper than they do for review.

The current sequestering of research is not good for science or the country. I’m currently reading the interesting book “Are We Rome?” by Cullen Murphy, and I can’t help but see some parallels. Chapter 3 is all about “when public good meets private opportunity”. Private organizations may pay for private research, and then keep their results private. But when the public pays for research, it should be shocking if it does not get released back to the public. And by “released back”, I mean released back at no fee at all.

So who will pay for the printing, complex peer review, storage, and fancy indexing of these research results? I think the very question shows a failure to understand current technology, but let’s answer it anyway. Most peer review isn’t paid-for anyway, and if it is, it’s a tiny cost compared to the research itself. Storage? Don’t make me laugh; for $100 I can buy storage for all of the U.S. research papers for a year. Indexing? The government shouldn’t be doing serious indexing at all!! Just put it on a government site with a basic form filled out (title, authors, date, keywords, abstract, and a link to the actual paper on the government site). If it’s not behind a paywall, the many commercial search systems will index it for you.

I do think there should be a centralized government repository of such papers. If it’s distributed, then papers could be lost without anyone knowing it. I think they should be freely redistributable, so others can copy what they want, but a centralized repository would make sure that we keep all of them available forever. Also, bandwidth costs can be reduced by scale. There’s a risk that they all get lost at once, but it’s easier to copy everything if there’s one place to start from. If it’s a complicated site, then they’ve done it wrong.... for each paper there should be a simple “summary” page with title, authors, etc., and the actual paper itself.

OSTP cites the experience of NIH; NIH did wonderful work in releasing research as open access, and in my mind the real problem is that they didn’t go far enough. First, NIH has a one-year embargo... if I already paid for it (and I did), why should wealthy people and organizations get the results first? Second, NIH only considers the actual papers, not the data and software programs that support the works... yet often those are more important. If they were funded by the public, then the public should get them (unless they’re classified, of course, but then they shouldn’t be released at all). I’m sure there are complications and exceptions, but a “default open access” policy would go a long way.

So please, tell the OSTP that the U.S. should release government-funded research as open access publications, available to anyone on the Internet without a paywall. In short, if “we the people” paid for it, then “we the people” should get it. For more information, see this Request for Information (RFI).

path: /oss | Current Weblog | permanent link to this entry

Sun, 29 Nov 2009

Success on Fully Countering Trusting Trust through Diverse Double-Compiling

My November 23 public defense of Fully Countering Trusting Trust through Diverse Double-Compiling went well. This was my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique.

Most importantly (to me), my PhD committee agreed that I successfully defended my dissertation. Whew! As a result, I'm essentially done with my PhD.

I learned a lot about creating formal proofs using computers by doing this dissertation. I wanted to give the strongest possible evidence that DDC counters the trusting trust attack, and formal proofs are the strongest form of proof that I know of... which is why I created them. Frankly, creating proofs was kind of fun once I knew what I was doing, but getting there was more painful than it needed to be. Many books are on the underlying mathematics (e.g., giving you extreme detail about various logic systems)... which is great if you're a mathematician, but not so helpful if you are simply trying to use the mathematics. Some books explain how to do things by hand, but that is an unnecessary amount of pain; one of my proofs is 30 steps long, and I sure wouldn't have wanted to create that by hand. Some books seemed to assume that you already knew everything the book covered, which is an odd assumption to me :-).

Here's a trivial example: Most logic systems can prove anything if you give them inconsistent assumptions. That's bad! You can get rid of that problem by sending the assumptions to a model-builder like mace4... if it can create a model, then the assumptions are consistent. So, make sure you send your assumptions through a model-builder to see if your assumptions are consistent.
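To show the idea in miniature, here is a toy Python sketch. It is not mace4 and handles only simple true/false statements (propositional logic), whereas mace4 searches for finite models of first-order formulas. The principle is the same, though: if some assignment satisfies every assumption, the assumptions are consistent. The name `find_model` and the sample assumptions are invented for this illustration.

```python
from itertools import product


def find_model(variables, assumptions):
    """Return a truth assignment satisfying every assumption, or None."""
    # Brute force: try every combination of True/False for the variables.
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(assumption(model) for assumption in assumptions):
            return model
    return None


# Toy assumptions: "Socrates is a man" and "if man, then mortal".
consistent = [
    lambda m: m["man"],
    lambda m: (not m["man"]) or m["mortal"],
]
# Adding "Socrates is not mortal" makes the set contradictory.
inconsistent = consistent + [lambda m: not m["mortal"]]

print(find_model(["man", "mortal"], consistent))    # {'man': True, 'mortal': True}
print(find_model(["man", "mortal"], inconsistent))  # None
```

The second call finds no model, which is the warning sign: with those assumptions a prover could "prove" anything.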

I've posted detailed data from my dissertation so that people can reproduce my results. I think it's really important that results be reproducible; otherwise, it's not science. As part of that data, I've included a few files that may help potential proof tool users get started. In particular, I've posted prover9 input to prove that Socrates is mortal, a prover9 input to prove that the square root of 2 is irrational, and prover9 input showing how to easily declare that terms in a list are distinct.

The "trusting trust" attack has historically been considered the "uncounterable" attack. Now the attack can be effectively detected — and thus countered.

path: /security | Current Weblog | permanent link to this entry

Fri, 20 Nov 2009

Fully Countering Trusting Trust through Diverse Double-Compiling

A last-minute reminder — my public defense of Fully Countering Trusting Trust through Diverse Double-Compiling is coming up on November 23, 1-3pm. This is my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique.

It will be at George Mason University, Fairfax, Virginia, Innovation Hall, room 105. [campus location] [Google map] Anyone is welcome!

I've made a few small tweaks over the last few weeks. I modified proof #2 to reduce its requirements even further (making it even easier to do); I had mentioned in text that this was possible, but now the formal proof shows it. I also used mace4 to show that the assumptions of each proof are consistent. Formal proofs aren't easy to create, or trivial to read, but the reason I went to that trouble is to show that it's not just my opinion that I've countered the trusting trust attack... I want to show, conclusively, that the trusting trust attack has been countered. I know of no stronger method to show that than a formal proof.

The "trusting trust" attack has historically been considered the "uncounterable" attack. Nuts to that. Now the attack can be effectively detected — and thus countered.

path: /security | Current Weblog | permanent link to this entry

Fri, 13 Nov 2009

Trusting Trust, DDC, and Free-Libre/Open Source Software (FLOSS)

As I noted in my blog, I’ve just released my dissertation “Fully Countering Trusting Trust through Diverse Double-Compiling (DDC)”. But what does that mean for Free-Libre/Open Source Software (FLOSS)? In short, it’s fantastic news for FLOSS, but to explain why that’s so, I need to backtrack first.

The “trusting trust” attack is a nasty computer attack that involves creating a subverted compiler in such a way that it subverts the programs it compiles, including compilers themselves. It was originally reported in a 1974 security evaluation of Multics, but most people heard about it from Ken Thompson’s 1984 Turing Award presentation (Ken Thompson is a creator of Unix). This attack is incredibly nasty, and what’s worse, until now there’s been no effective countermeasure to it. Indeed, some have claimed that it could not ever be countered, making the whole idea of “computer security” a non-starter.

The “trusting trust” attack appears to be especially devastating to FLOSS. The problem is that with the trusting trust attack, the source code that people review does not correspond to the executable that’s actually running, and that seems to completely torpedo the “many eyes” review that FLOSS makes possible. The whole world could carefully review a program’s source code, but it wouldn’t matter if the compiler turns it undetectably into something malicious.

Thankfully, there is an effective countermeasure, which I’ve named Diverse Double-Compiling (DDC). You can see my dissertation which explains what it is, proves that it works, and even demonstrates it with several compilers including GCC. (I will be giving a public defense of it on November 23, 2009, if you’d like to come.) This means that source code review, such as mass review of FLOSS code, can now actually work.

But there’s more, because there’s an interesting catch with DDC. DDC counters the trusting trust attack, but it’s only useful for people who have access to the compiler source code. Fundamentally, DDC is a technique for determining if a compiler executable corresponds with its source code, but only people who have the source code can apply DDC to see if that’s true. What’s more, only people who have access to the source code will find the statement “the source and executable correspond” particularly useful. (You could use trusted intermediaries, but this requires total trust in those intermediaries, making such claims far weaker than claims that anyone can check.) DDC is also useful beyond what we normally think of as compilers, because you can redefine “compiler” as including other parts (such as the operating system). In that case, you can even show that the system’s executables all correspond to their source code. But you can only use DDC to counter the trusting trust attack if you have access to the source code.

So we now have a radical change. Now that DDC has been shown to work, we can see that software with available source code (including FLOSS) has a fundamental security advantage over other software. That doesn’t mean that all FLOSS is more secure than all proprietary software, of course. But FLOSS already had a general security advantage because it better meets Saltzer & Schroeder’s “Open design principle” (as explained in their 1974-1975 papers). Now we have an attack — the trusting trust attack — for which FLOSS has a fundamental security advantage. The time of ignoring FLOSS options, because of misplaced notions that FLOSS cannot be as secure as proprietary software, needs to come to an end.

path: /oss | Current Weblog | permanent link to this entry

Mon, 02 Nov 2009

New PhD Dissertation: Fully Countering Trusting Trust through Diverse Double-Compiling

An Air Force evaluation of Multics, and Ken Thompson’s Turing award lecture (“Reflections on Trusting Trust”), showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves. If this “trusting trust” attack goes undetected, even complete analysis of a system’s source code will not find the malicious code that is running. Previously-known countermeasures have been grossly inadequate. If this attack cannot be countered, attackers can quietly subvert entire classes of computer systems, gaining complete control over financial, infrastructure, military, and/or business system infrastructures worldwide.

Thankfully, there is a countermeasure to the “trusting trust” attack. In 2005 I wrote a paper on Diverse Double-Compiling (DDC), published by ACSAC, where I explained DDC and why it is an effective countermeasure. But some people still raised concerns. Would DDC really counter the attack? Would DDC scale up to real-world compilers? Also, the ACSAC paper required “self-parenting” compilers — can DDC handle compilers that are not self-parenting?

I’m now releasing Fully Countering Trusting Trust through Diverse Double-Compiling, my 2009 PhD dissertation that expands on how to counter the “trusting trust” attack by using the “Diverse Double-Compiling” (DDC) technique. This dissertation was accepted by my PhD committee on October 26, 2009.

On November 23, 2009, 1-3pm, I will be giving a public defense of this dissertation. If you’re interested, please come! It will be at George Mason University, Fairfax, Virginia, Innovation Hall, room 105. [campus location] [Google map]

This dissertation’s thesis is that the trusting trust attack can be detected and effectively countered using the “Diverse Double-Compiling” (DDC) technique, as demonstrated by (1) a formal proof that DDC can determine if source code and generated executable code correspond, (2) a demonstration of DDC with four compilers (a small C compiler, a small Lisp compiler, a small maliciously corrupted Lisp compiler, and a large industrial-strength C compiler, GCC), and (3) a description of approaches for applying DDC in various real-world scenarios. In the DDC technique, source code is compiled twice: once with a second (trusted) compiler (using the source code of the compiler’s parent), and then the compiler source code is compiled using the result of the first compilation. If the result is bit-for-bit identical with the untrusted executable, then the source code accurately represents the executable.
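Here's a toy Python simulation of that two-step check, for the simple case where the compiler is its own parent. Everything in it is a made-up stand-in: "executables" are just byte strings of the form style:program, run_compiler pretends to run a compiler executable, and the "+trojan" tag marks a subverted compiler that reproduces itself whenever it compiles the compiler's source. It's a sketch of the idea, not the procedure or tooling used in the dissertation.

```python
import hashlib

def program_id(source: str) -> str:
    """Short stand-in for 'what this source code means'."""
    return hashlib.sha256(source.encode()).hexdigest()[:12]

# Toy sources. Each compiler's source fixes the code-generation style it emits.
SOURCE_A = "compiler A source (codegen style A)"          # compiler under test
SOURCE_T = "trusted compiler T source (codegen style T)"  # second, trusted compiler
CODEGEN_STYLE = {program_id(SOURCE_A): "A", program_id(SOURCE_T): "T"}

def make_exe(style: str, source: str, trojan: bool = False) -> bytes:
    """Executable bytes: the style it was generated in, plus what it does."""
    tag = "+trojan" if trojan else ""
    return f"{style}:{program_id(source)}{tag}".encode()

def run_compiler(exe: bytes, source: str) -> bytes:
    """Pretend to run compiler executable `exe` on `source`."""
    _built_with, _, behaviour = exe.decode().partition(":")
    trojan = behaviour.endswith("+trojan")
    own_id = behaviour.replace("+trojan", "")
    out_style = CODEGEN_STYLE[own_id]
    if trojan and source == SOURCE_A:
        # Trusting trust attack: re-insert the trojan into the new compiler.
        return make_exe(out_style, source, trojan=True)
    return make_exe(out_style, source)

def ddc(trusted_exe, parent_source, compiler_source, exe_under_test) -> bool:
    stage1 = run_compiler(trusted_exe, parent_source)   # first compilation
    stage2 = run_compiler(stage1, compiler_source)      # second compilation
    return stage2 == exe_under_test                     # bit-for-bit comparison

trusted_cT = make_exe("X", SOURCE_T)            # built elsewhere, style "X"
honest_cA = make_exe("A", SOURCE_A)
subverted_cA = make_exe("A", SOURCE_A, trojan=True)

honest_ok = ddc(trusted_cT, SOURCE_A, SOURCE_A, honest_cA)
subverted_ok = ddc(trusted_cT, SOURCE_A, SOURCE_A, subverted_cA)
print(honest_ok, subverted_ok)
```

Notice that the stage 1 result is not bit-identical to the honest compiler (it was generated in the trusted compiler's style), yet the stage 2 result is. Notice also that simply recompiling the source with the subverted compiler regenerates the subverted compiler, which is why recompiling with the suspect compiler alone tells you nothing.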

Many people commented on my previous 2005 ACSAC paper on the topic. Bruce Schneier wrote an article on ‘Countering “Trusting Trust”’, which I think is one of the best independent articles describing my work on DDC.

This 2009 dissertation significantly extends my previous 2005 ACSAC paper. For example, I now have a formal proof that DDC is effective (the ACSAC paper only had an informal justification). I also have additional demonstrations, including one with GCC (to show that it scales up) and one with a maliciously corrupted compiler (to show that it really does detect them in the real world). The dissertation is also more general; the ACSAC paper only considered the special case of a “self-parenting” compiler, while the dissertation eliminates that assumption.

So if you’re interested in countering the “trusting trust” attack, please take a look at my work on countering trusting trust through diverse double-compiling (DDC).

path: /security | Current Weblog | permanent link to this entry

Wed, 28 Oct 2009

Notes about the DoD and OSS memo

Yesterday I posted about the new 2009 DoD memo about open source software. I'm delighted to see that the word is getting out. Slashdot, Linux Weekly News, and LXer.com all mentioned the new memo and even pointed to my post. Others are noting the new memo too, including CNet's Matt Asay, InformationWeek's J. Nicholas Hoover, InformationWeek's Serdar Yegulalp, NetworkWorld, and The H. Dan Risacher has posted on Slashdot some background and history for this new 2009 DoD memo. He notes, for example, that "The lawyers were by far the biggest delay" in getting this memo released.

There's some supporting information for this memo at the DoD Free Open Source Software (FOSS) Communities of Interest (COI) site, which posts the memo itself and a supporting DoD Open Source Software Frequently Asked Questions (FAQ) document.

To help potential users, I've updated my presentation Open Source Software (OSS) and the U.S. Department of Defense (DoD), which I hope will clarify some things. I should also remind people about the 2003 MITRE study "Use of Free and Open Source Software (FOSS) in the U.S. Department of Defense", which showed that in 2003 Free/libre/open source software (FLOSS, FOSS, or OSS) was already widely used in the DoD.

path: /oss | Current Weblog | permanent link to this entry

Tue, 27 Oct 2009

New DoD memo on Open Source Software

The U.S. Department of Defense (DoD) has just released Clarifying Guidance Regarding Open Source Software (OSS), a new official memo about open source software (OSS). This 2009 memo should soon be posted on the list of ASD(NII)/DoD CIO memorandums. This 2009 memo is important for anyone who works with the DoD (including contractors) on software and systems that include software... and I suspect it will influence many other organizations as well. Let me explain why this new memo exists, and what it says.

Back in 2003 the DoD released a formal memo titled Open Source Software (OSS) in the Department of Defense. This older memo was supposed to make it clear that it was fine to use and develop OSS in the DoD. Unfortunately, as the new 2009 memo states, "there have been misconceptions and misinterpretations of the existing laws, policies and regulations that deal with software and apply to OSS that have hampered effective DoD use and development of OSS".

This new 2009 memo simply explains "the implications and meaning of existing laws, policies and regulations", hopefully eliminating many of those misconceptions and misinterpretations. A lot of the "meat" is in Attachment 2, section 2 (guidance), so let's walk through that:

But perhaps most important is this memo's opening statement: "To effectively achieve its missions, the Department of Defense must develop and update its software-based capabilities faster than ever, to anticipate new threats and respond to continuously changing requirements. The use of Open Source Software (OSS) can provide advantages in this regard...". As with the later part (b), here we have an official government document acknowledging that OSS can have a significant advantage. What's more, these potential advantages aren't necessarily just minor cost savings; OSS can in some cases provide a military advantage. Which is a more-than-adequate justification for considering OSS, as I have been advocating for years.

I'm really delighted that this memo has finally been released. I participated in the original brainstorming meeting to create this memo (as did John Scott), and I reviewed many versions of it, but many, many other hands have stirred this pot since it began. It took over 18 months to create it and get it out; getting this coordinated was a very long and drawn-out process. My thanks to everyone who worked to help make this happen. In particular, congrats go to Dan Risacher, who led this project to its successful completion.

By the way, if you're interested in the issue of open source software in the U.S. military/national defense, you probably should look at Mil-OSS (at least, join their mailing list, and consider going to their upcoming conference; I was a speaker at their last one). If you're interested in the connection between open source software and the U.S. government (including the military), you might also be interested in the upcoming GOSCON conference on November 5, 2009 (I'm one of the speakers there too).

path: /oss | Current Weblog | permanent link to this entry

Sat, 17 Oct 2009

CVC3 License Changed to BSD

CVC3 is one of the better automated theorem provers. Given certain mathematical assertions, it can in many cases prove that certain claims follow from them. Some tools that can prove properties about programs use CVC3 (and/or similar programs). For example, the Frama-C Jessie plug-in for C and Krakatoa for Java use Why, which can build on one of several programs including CVC3.

The trouble is, CVC3's license has historically been a problem. I understand that its authors intended for CVC3 to be Free/Libre/Open Source Software (FLOSS), but unfortunately, it was released with additional license clauses that resulted in yet another non-standard license. This was an unfortunate mistake; as I note in my essay on GPL-compatible licenses, it is absolutely critical to choose a standard FLOSS license when releasing FLOSS. In this case, the big problem was the addition of an "indemnification" clause that was really scary; to some, at least, it seemed to imply that if the CVC3 authors were sued, anyone who used or copied the program was obligated to pay their legal bills. Interpreted that way, no one wanted to touch the program... how could any user possibly know their risks? Fedora eventually ruled that this license was non-free (aka not FLOSS), and thus could not be included in Fedora. There was a less-serious problem that if you made a change to the program, you had to change the name... since the program couldn't even compile without a change (at the time), this meant that you had to change the name almost instantly. There is a reason that people have converged on standard FLOSS licenses; if your lawyer says you need to add non-standard clauses, be wary, because the result may be that few people can use your program.

I'm delighted to report that this has a happy ending. CVC3's license has just been changed to a straight BSD license - a well-known license that is universally acknowledged as being FLOSS. This means that there are no licensing problems for Linux distributions. Only about a day after he found this out, Jerry James submitted a CVC3 package to Fedora. So, I expect that in a relatively short time we'll see CVC3 available directly in common Linux distribution repositories.

I think this is a helpful step towards open proofs, which are cases where an implementation, its proofs, and the necessary tools are all FLOSS. Having a good tool like CVC3 to build on makes it easier to develop useful tools. My hope is to mature formal methods tools so that they can be more scalable, applicable, and effective than they are today. It's clear that a single little tool cannot possibly do the job; we need suites of tools that can work together. And this is a promising step in that direction.

path: /oss | Current Weblog | permanent link to this entry

Wed, 12 Aug 2009

Auto-DESTDIR released!

I’ve just released Auto-DESTDIR, a software package which helps automate program installation on POSIX/Unix/Linux systems from source code. If you have the problem it solves — automatic support for DESTDIR — you want this!

A little background: Many programs for Unix/Linux are provided as source code. Such programs must be configured, built, and installed, and that last step is normally performed by typing “make install”. The “make install” step normally writes directly to privileged directories like “/usr/bin” to perform the installation. Unfortunately, most modern packaging systems (such as those for .rpm and .deb files) require that files be written to some intermediate directory instead, even though when run they will be in a different filesystem location (because of security issues). This redirection is easy to do if the installation script supports the “DESTDIR convention”; simply set DESTDIR to the intermediate directory’s value and run “make install”. Supporting DESTDIR is a best practice when releasing software. Unfortunately, many source packages don’t support the DESTDIR convention. Auto-DESTDIR causes “make install” to support DESTDIR, even if the provided “makefile” doesn’t support the DESTDIR convention. Auto-DESTDIR is released under the “MIT” license, so it is Free-libre/open source software (FLOSS).
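If the DESTDIR convention is new to you, here's a small Python sketch of what it amounts to. The helper names staged_path and install_file, and the sample file, are invented for illustration; real makefiles do the same thing by writing to "$(DESTDIR)$(bindir)" and the like. The file's final run-time location stays "/usr/bin/hello", but during packaging it is written underneath the intermediate directory.

```python
import os
import tempfile

def staged_path(final_path: str, destdir: str = "") -> str:
    """Where to write a file whose run-time location is final_path."""
    if not destdir:
        return final_path            # no DESTDIR set: install for real
    # Plain concatenation, like "$(DESTDIR)$(bindir)" in a makefile.
    return destdir.rstrip("/") + final_path

def install_file(content: str, final_path: str, destdir: str = "") -> str:
    """Write content to the staged location, creating directories as needed."""
    target = staged_path(final_path, destdir)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(content)
    return target

# Stage an "install" into a throwaway directory instead of the real /usr/bin.
with tempfile.TemporaryDirectory() as destdir:
    target = install_file("#!/bin/sh\necho hello\n", "/usr/bin/hello", destdir)
    staged_ok = os.path.isfile(os.path.join(destdir, "usr", "bin", "hello"))
    print(target, staged_ok)
```

A packaging tool then collects everything under the intermediate directory, and no privileged directory is touched during the build.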

Auto-DESTDIR is implemented using a set of bash shell scripts that wrap typical install commands (such as install, cp, ln, and mkdir). These wrappers are placed in a special directory. The run-redir command modifies the PATH so that the directory with these scripts is listed first, and then runs the given command. The make-redir command invokes “make” using run-redir, along with some extra settings to simplify things. For more information on this approach, and why this is a good way to automate DESTDIR, see the paper Automating DESTDIR, especially its section on wrappers.
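The PATH trick is easy to see in miniature. The sketch below is Python, not Auto-DESTDIR's actual bash code, and it assumes a POSIX system; the wrapper it creates is an empty placeholder, and the name path_with_wrappers_first is made up. It only shows that once the wrapper directory is listed first in PATH, a lookup of "cp" finds the wrapper rather than the system command.

```python
import os
import shutil
import stat
import tempfile

def path_with_wrappers_first(wrapper_dir: str, original_path: str) -> str:
    """Build a PATH value in which wrapper_dir is searched before anything else."""
    return wrapper_dir + os.pathsep + original_path

with tempfile.TemporaryDirectory() as wrapper_dir:
    # Placeholder wrapper; a real one would redirect destinations under $DESTDIR.
    wrapper = os.path.join(wrapper_dir, "cp")
    with open(wrapper, "w") as f:
        f.write("#!/bin/sh\n# hypothetical wrapper for cp\n")
    os.chmod(wrapper, os.stat(wrapper).st_mode | stat.S_IXUSR)

    new_path = path_with_wrappers_first(wrapper_dir, os.environ.get("PATH", ""))

    # shutil.which searches the given PATH the way a shell would.
    resolved = shutil.which("cp", path=new_path)
    wrapper_wins = resolved is not None and os.path.samefile(resolved, wrapper)
    print(resolved, wrapper_wins)
```

Because the makefile just runs "cp" or "install" by name, it needs no changes at all; whatever is first in PATH gets run.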

So please take a look at the Auto-DESTDIR software package, if you have the problem it solves.

path: /oss | Current Weblog | permanent link to this entry