David A. Wheeler's Blog

Tue, 04 Nov 2008

Kudos and Kritiques: ‘Automated Code Review Tools for Security’ - and OSS

The article “Automated Code Review Tools for Security” (by Gary McGraw) has just been released to the web. Officially, it will be published in IEEE Computer’s December 2008 edition (though increasingly this kind of reference feels like an anachronism!). The article is basically a brief introduction to automated code review. Here are a few kudos and kritiques (sic) of it, including a long discussion about the meaning of “open source software” (OSS) that I think is important to add.

First, a few kudos. The most important thing about this article is that it exists at all. I believe that software developers need to increasingly use static analysis tools to find security vulnerabilities, and articles like this are needed to get the word out. Yes, the current automated review tools for security have all sorts of problems. These tools typically have many false positives (reports that aren’t really vulnerabilities), they have many false negatives (they fail to report vulnerabilities they should), they are sometimes difficult to use, and their results are sometimes difficult to understand. But static analysis tools are still necessary. Software is getting larger, not smaller, because people keep increasing their expectations for software functionality. Security is becoming more important to software, not less, as more critical functions depend on the software and most attackers focus on them. Manual review is often too costly to apply (especially on pre-existing software and in proprietary development), and even when done it can miss ‘obvious’ problems. So in spite of current static analysis tools’ problems, the rising size and development speed of software will force many developers to use static analysis tools. There isn’t much of a practical alternative - so let’s face that! Some organizations (such as Safecode.org) don’t adequately emphasize this need for static analysis tools, so I’m glad to see the need for static analysis tools is being emphasized here and elsewhere.

I’m especially glad for some of the points he makes. He notes that “security is not yet a standard part of the security curriculum”, and that “most programming languages were not designed with security in mind… [leading to] common and often exploited vulnerabilities.” Absolutely true, and I’m glad he notes it.

There are a number of issues and details the article doesn’t cover. For example, he mentions that “major vendors in this space include Coverity, Fortify, and Ounce Labs”; I would have also included Klocwork (for Insight), and I would have also noted at least the open source software program splint. (Proprietary tools can analyze and improve OSS programs, by the way; Coverity’s ‘open source quality’, and Fortify’s ‘Java open review’ projects specifically work to review many OSS programs.) But since this is a simple introductory article, he has to omit much, so such omissions are understandable. I believe the article’s main point was to explain briefly what static analysis tools were, to encourage people to look into them; from that vantage point, it does the job. If you already understand these tools, you already know what’s in the article; this is an article for people who are not familiar with them.

Now, a few kritiques. (The standard spelling is “critiques”, but I can’t resist using a funny spelling to match “kudos”.) First, the article conflates “static analysis” with “source code analysis”, a problem also noted on Lambda the Ultimate. The text says “static analysis tools - also called source code analyzers” - but this is not true. There are two kinds of static analysis tools: (1) source analysis tools, and (2) binary/bytecode analysis tools. There are already several static analysis tools that work on binary or bytecode, and I expect to see more in the future. Later in the text he notes that binary analysis is possible “theoretically”, but it’s not theoretical - people do that, right now. Yes, it’s true that source code analysis is more mature/common, but don’t ignore binary analysis. Binary analysis can be useful; sometimes you don’t have the source code, and even when you do, it can be very useful to directly analyze the binary (because it’s the binary, not the source, that is actually run). It’s unfortunate that this key distinction was muddied. So, be aware that “static analysis tools” covers the analysis of both source and binary/bytecode - and there are advantages to analyzing each.
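To make the source-versus-bytecode distinction concrete, here is a minimal sketch in Python (chosen only for illustration; it is not taken from the article or from any of the tools named here). It looks for uses of `eval` in the same snippet twice: once by walking the syntax tree built from the source text, and once by inspecting only the compiled bytecode. The sample and the function names are hypothetical.

```python
import ast
import dis
import types

SAMPLE = '''
def run(user_input):
    return eval(user_input)
'''

def find_eval_in_source(source):
    """Source analysis: return line numbers of eval() calls found in the AST."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "eval"):
            hits.append(node.lineno)
    return sorted(hits)

def find_eval_in_bytecode(code):
    """Bytecode analysis: return True if any code object loads the name 'eval'."""
    for instr in dis.get_instructions(code):
        if instr.opname in ("LOAD_NAME", "LOAD_GLOBAL") and instr.argval == "eval":
            return True
    # Nested functions are stored as code objects among the constants.
    return any(
        find_eval_in_bytecode(const)
        for const in code.co_consts
        if isinstance(const, types.CodeType)
    )

compiled = compile(SAMPLE, "<sample>", "exec")
print(find_eval_in_source(SAMPLE))      # works from the source text
print(find_eval_in_bytecode(compiled))  # works from the compiled code object only
```

The second function never sees the source text, which is the situation a binary or bytecode analyzer is in when the source is unavailable.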

Second, McGraw says that this is a “relatively young” discipline. That’s sort-of correct in a broad sense, but it’s more complicated than that, and it’s too bad that was glossed over. The basic principles of secure software development are actually quite old; one key paper was Saltzer and Schroeder’s 1975 paper which identified key design principles for security. Unfortunately, while security experts often knew how to develop secure software, they tended to not write that information in a form that ordinary application software developers could use. To the best of my knowledge, my book Secure Programming for Linux and Unix HOWTO (1999-2003) was the first book for software developers (not attackers or security specialists) on how to develop application software that can resist attack. And unfortunately, this information is still not taught in high school (when many software developers learn how to write software), nor is it taught in most undergraduate schools.

Third, I would like to correct an error in the article: ITS4 has never been released as open source, even though the article claims it was. The article claims that “When we released ITS4 as an open source tool, our hope was that the world would participate in helping to gather and improve the rule set. Although more than 15,000 people downloaded ITS4 in its first year, we never received even one rule to add to its knowledge base”. Is it really true, that open source software doesn’t get significant help? Well, we need to know what “open source” means.

What is “open source software”? It’s easy to show that the Open Source Initiative (OSI)’s Open Source Definition, or the Free Software Foundation’s simpler Free Software Definition, are the usual meanings for the term “open source software”. Just search for “open source software” on Google - after all, a Google search will rank how many people point to the site with that term, and how trusted they are. The top ten sites for this term include SourceForge, the Open Source Initiative (including their definition), the Open Source Software Institute, and (surprise!) me (due to my paper Why Open Source Software / Free Software (OSS/FS, FLOSS, or FOSS)? Look at the Numbers!). The top sites tend to agree on the definition of “open source software”, e.g., SourceForge specifically requires that licenses meet the OSI’s open source definition. I tend to use the Free Software Definition, because it’s simpler, but I would also argue that any open source software license would need to meet both definitions to be generally acceptable. Other major sites that are universally acknowledged as supporting open source software all agree on this basic definition. For example, the Debian Free Software Guidelines were the basis of the Open Source Definition, and Fedora’s licensing and licensing guidelines reference both OSI and FSF (and make clear that open source software must be licensed in a way that lets anyone use it). Google Code accepts only a limited number of licenses, all of which meet these criteria. The U.S. Department of Defense’s 2003 policy letter, “Open Source Software in the DoD”, essentially uses the Free Software Definition to define the term; it says open source software “provides everyone the rights to use, modify, and redistribute the source code of the software”. In short, the phrase “open source software” has a widely-accepted and specific meaning.

So let’s look at ITS4’s license. As of November 3, 2008, the ITS4 download site’s license hasn’t changed from its original Feb 17, 2000 license. It turns out that the ITS4 license clearly forbids many commercial uses. In fact, their license is clearly labelled a “NON-COMMERCIAL LICENSE”, and it states that Cigital has “exclusive licensing rights for the technology for commercial purposes.” That’s a tip-off that there is a problem; as I’ve explained elsewhere, open source software is commercial software. License section 1(b) says “You may not use the program for commercial purposes under some circumstances. Primarily, the program must not be sold commercially as a separate product, as part of a bigger product or project, or used in third party work-for-hire situations… Companies are permitted to use this program as long as it is not used for revenue-generating purposes….”. In section 2, “(a) Distribution of the Program or any work based on the Program by a commercial organization to any third party is prohibited if any payment is made in connection with such distribution, whether directly… or indirectly…”. Cigital has a legal right to release software in just about any way it wants to; that’s not what I’m trying to point out. What I’m trying to make clear is that there are significant limitations on use and distribution of ITS4, due to its license.

This means that ITS4 is clearly not open source software. The Open Source Definition requires that an open source software license have (#5) No Discrimination Against Persons or Groups, and (#6) No Discrimination Against Fields of Endeavor. The detailed text for #6 even says: “The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.” Similarly, the Free Software Definition requires the “freedom to run the program, for any purpose” (freedom 0). ITS4 might be described as a “source available”, “shared source”, or “open box” program - but it’s not open source. Real open source software licenses permit arbitrary commercial use; they are designed to include commercial users, not exclude them.

It’s absurd to complain that “no one helped this open source project” when the project was never an open source project. Indeed, even if you release software with an OSS license, there is no guarantee that you’ll get support (through collaboration). No one is obligated to collaborate with you if you release OSS - you have to convince others that they should collaborate. And conversely, you have every legal right to not release under an OSS license. But OSS licenses are typically designed to encourage collaboration; having a license that is not an OSS license unsurprisingly discourages such collaboration. As I point out in my essay “Make Your Open Source Software GPL-Compatible. Or Else”, if you want OSS collaboration you really need to pick from one of the few standard GPL-compatible licenses, for reasons I explain further in that essay.

This matters. My tool flawfinder does a similar task to ITS4, but it is open source software (released under the world’s most popular OSS license, the GPL). And I did get a lot of collaborative help in developing flawfinder, as you can tell from the Flawfinder ChangeLog. Thus, it is possible to have OSS projects that analyze programs and receive many contributions. Here’s a partial list of flawfinder contributors: Jon Nelson, Marius Tomaschewski, Dave Aitel, Adam Lazur, Agustin Lopez, Andrew Dalgleish, Joerg Beyer, Jose Pedro Oliveira, Jukka A. Ukkonen, Scott Renfro, Sascha Nitsch, Sebastien Tandel, Steve Kemp (lead of the Debian Security Auditing Project), Jared Robinson, Stefan Kost, and Mike Ruscher. That’s at least 17 people co-developing the software! Not all of these contributors added to the vulnerability database, but I know that at least Dave Aitel and Jared Robinson added new rules, Stefan Kost suggested specific new rules (though I don’t think he wrote the code for them), and that Agustin Lopez and Christian Biere caused changes in the vulnerability database’s reporting information. There may have been more; in a collaborative process, it’s sometimes difficult to fully give credit to everyone who deserves it, and I don’t have time to go through all of the records to determine the minutiae. It’d be a mistake to think that only database improvements matter, anyway; other user-contributed improvements were useful! These include changes that enabled analysis of patch files (so you can limit reporting to the lines that have changed), made the reports clearer, packaged the software (for easy installation), and fixed various bugs.
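The patch-file idea is simple to illustrate: work out which lines of the new file a patch adds or changes, then report only the findings on those lines. Here is a minimal sketch in Python, assuming the patch is a unified diff; the function names and the sample findings are hypothetical, and this is not flawfinder’s actual code.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,4 +12,6 @@".
# Groups: old line count, new start line, new line count (counts default to 1).
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def added_lines(unified_diff):
    """Return the set of new-file line numbers that the diff adds or changes."""
    changed = set()
    old_left = new_left = 0   # lines still expected in the current hunk
    new_lineno = 0
    for line in unified_diff.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next hunk header.
            header = HUNK_RE.match(line)
            if header:
                old_left = int(header.group(1) or 1)
                new_lineno = int(header.group(2))
                new_left = int(header.group(3) or 1)
            continue
        if line.startswith("+"):
            changed.add(new_lineno)
            new_lineno += 1
            new_left -= 1
        elif line.startswith("-"):
            old_left -= 1      # removed lines do not exist in the new file
        elif line.startswith("\\"):
            pass               # "\ No newline at end of file" marker
        else:
            new_lineno += 1    # unchanged context line
            old_left -= 1
            new_left -= 1
    return changed

def filter_findings(findings, changed):
    """Keep only (line number, message) findings that fall on changed lines."""
    return [f for f in findings if f[0] in changed]

PATCH = """\
--- a/demo.c
+++ b/demo.c
@@ -1,3 +1,4 @@
 #include <string.h>
 char buf[16];
-strncpy(buf, src, sizeof buf);
+strcpy(buf, src);
+puts(buf);
"""

findings = [(2, "static buffer"), (3, "strcpy: no bounds check")]
print(filter_findings(findings, added_lines(PATCH)))
```

Counting lines from the hunk header, rather than guessing from `+++` and `---` prefixes, keeps a removed line that happens to start with `--` from being mistaken for a file header.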

Obviously, releasing under a true OSS license helped immensely in getting contributions - especially if ITS4 is our point of comparison. For example, since flawfinder is an OSS program, it was easily incorporated into a variety of OSS Linux distributions, making it easier to use. I also suspect that some of the people most interested in using this kind of program were people paid to evaluate programs - but many of these uses were forbidden by the default ITS4 license.

In short, a non-OSS program didn’t have much collaborative help, while a similar OSS program got lots of collaborative help… and that is an important lesson. There is a difference between “really being open source software” and “sort of openish but not really”. If you look at ITS4 version 1.1.1, its CHANGES file does list a few external contributions, but they are primarily trivial tweaks or portability changes (to get it to work at all). McGraw himself admits that ITS4 didn’t get any new rules in its first year. In contrast, OSS flawfinder added user-created rules within 6 months of its initial release, and over time the OSS program had lots of functionality provided by other co-developers. I don’t want McGraw’s incorrect comment about “open source” to go unchallenged, when there’s an important lesson to be learned instead.

That said, my thanks to McGraw for noting flawfinder (and RATS), as well as ITS4. Indeed, my thanks to John Viega, McGraw, and others for developing ITS4 and the corresponding ACSAC paper - I think ITS4 was an important step in getting people to use and develop static analysis tools for security. I also agree with McGraw that deeper analysis of programs is the way of the future. Tools that focus on simple lexical analysis (like ITS4, flawfinder, and RATS) have their uses, but there is much that they cannot do, which is why tool developers are now focusing on deeper analysis approaches. Static analysis tools that examine a program in more detail (like splint) can do much that simple tools (based on lexical analysis) cannot.
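A rough sketch shows what “simple lexical analysis” amounts to, and why it has limits. The Python below scans C source text for the names of a few risky functions. The rule table is hypothetical and far smaller than any real tool’s database, and this is not how ITS4, flawfinder, or RATS is actually implemented.

```python
import re

# Hypothetical, tiny rule set: risky C function name -> short warning.
RULES = {
    "gets": "no bounds check on input",
    "strcpy": "no bounds check on destination",
    "sprintf": "no bounds check on output buffer",
}

# Match a rule name as a whole identifier followed by an opening parenthesis.
CALL_RE = re.compile(r"\b(" + "|".join(map(re.escape, RULES)) + r")\s*\(")

def scan(c_source):
    """Return (line number, function name, warning) for each textual match."""
    findings = []
    for lineno, line in enumerate(c_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            findings.append((lineno, name, RULES[name]))
    return findings

C_SAMPLE = """\
char buf[16];
strcpy(buf, argv[1]);          /* real risk: attacker-controlled length */
strcpy(buf, "ok");             /* fits in buf, but is still reported */
memcpy(buf, argv[1], len);     /* risky if len > 16, but no rule matches */
"""

for finding in scan(C_SAMPLE):
    print(finding)
```

The scanner reports both `strcpy` calls even though the second one is harmless, and it says nothing about the `memcpy`, because it matches names rather than reasoning about buffer sizes or where data comes from. Those are the gaps that deeper analysis aims to close.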

Most importantly, McGraw and I agree on the conclusion: “Static analysis for security should be applied regularly as part of any modern software development process.” We need to get that word out to where software is really developed. Static analysis is not the only thing that people need to do, but it’s an important part. Yes, static analysis tools aren’t perfect. But static analysis tools can significantly help develop software that has real security.
