Counting Source Lines of Code (SLOC)

The paper ``More than a Gigabuck: Estimating GNU/Linux's Size'' presents my latest GNU/Linux size estimates, approach, and analysis.

My latest size-estimation paper is More than a Gigabuck: Estimating GNU/Linux's Size (June 2001). Here are a few interesting facts quoting from the paper (which measures Red Hat Linux 7.1):

  1. It would cost over $1 billion (a Gigabuck) to develop this Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars).
  2. It includes over 30 million physical source lines of code (SLOC).
  3. It would have required about 8,000 person-years of development time, as determined using the widely-used basic COCOMO model.
  4. Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2 (which was released about one year earlier).
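As a rough illustration of how a basic COCOMO estimate turns SLOC into effort and cost, here is a minimal Python sketch. The coefficients 2.4 and 1.05 are the standard basic COCOMO organic-mode values. The salary and overhead defaults, the package sizes, and the function names are assumptions chosen for illustration; they are not taken from this page, and the sketch is not a reproduction of the paper's calculation.

```python
def basic_cocomo_effort_pm(sloc, a=2.4, b=1.05):
    """Effort in person-months under basic COCOMO (organic-mode coefficients)."""
    ksloc = sloc / 1000.0
    return a * ksloc ** b


def estimate(package_sloc, salary=56286.0, overhead=2.4):
    """Sum per-package effort, then convert to person-years and dollars.

    salary (dollars per year) and overhead (multiplier) are assumed values.
    """
    person_months = sum(basic_cocomo_effort_pm(s) for s in package_sloc)
    person_years = person_months / 12.0
    cost = person_years * salary * overhead
    return person_years, cost


# Hypothetical package sizes in physical SLOC (not real measurements).
packages = [10_000, 50_000, 250_000]
person_years, cost = estimate(packages)
print(f"{person_years:.1f} person-years, ${cost:,.0f}")
```

Because the exponent is greater than 1, estimating one large project yields more effort than estimating the same code as several smaller packages and summing the results, so the total depends on how the code is divided.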

Many other interesting statistics emerge; here are a few:

You can get:

  1. ``More than a Gigabuck: Estimating GNU/Linux's Size'', my latest SLOC analysis paper which analyzes Red Hat Linux 7.1. You can also get some of the supporting information (intended for those who want to do further analysis), such as the complete summary, summary SLOC analysis of the Linux 2.4 kernel, map of build directories to RPM spec files, spec summaries, counts of files, and detailed file-by-file SLOC counts. You can also get version 1.0, version 1.01, version 1.02, version 1.03, version 1.04 or version 1.05 of the paper.

  2. ``Estimating Linux's Size,'' the previous paper which analyzes Red Hat Linux 6.2. Various background files and previous editions are also available. You can see the ChangeLog, along with older versions of the paper (original paper (version 1.0), version 1.01, version 1.02, version 1.03, and version 1.04). You can also see some of the summary data: SLOC sorted by size, filecounts, unsorted SLOC counts, unsorted SLOC counts with long lines, and SLOC counts formatted for computer processing (tab-separated data). For license information, you can see the licenses allocated to each build directory. If you want to know what a particular package does, you can find out briefly by looking at the package (specification file) descriptions.
  3. Linux Kernel 2.6: It's Worth More! does a deeper analysis of effort of just the Linux kernel.

When referring to this information, please refer to the URL http://www.dwheeler.com/sloc. Some of the other URLs may change, and I may add more measurements later.

If you want to get the tools I used, they're available. I call the set SLOCCount, and you can get SLOCCount at http://www.dwheeler.com/sloccount.
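To give a feel for what counting physical SLOC involves, here is a minimal Python sketch. It assumes a physical SLOC is a line containing at least one non-whitespace character that is not part of a comment, and it only understands a single line-comment prefix. The function name and the sample text are hypothetical, and this is not how SLOCCount itself is implemented; real counters also have to handle block comments, strings containing comment markers, and per-language rules.

```python
def count_physical_sloc(text, comment_prefix="#"):
    """Count lines that are neither blank nor pure line comments.

    A line with code followed by a trailing comment still counts as one SLOC.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line
        if stripped.startswith(comment_prefix):
            continue  # comment-only line
        count += 1
    return count


# Hypothetical sample: two code lines, two comment-only lines, two blank lines.
sample = (
    "# header comment\n"
    "\n"
    "import os\n"
    "\n"
    "print(os.getcwd())  # trailing comment\n"
    "    # indented comment\n"
)
print(count_physical_sloc(sample))
```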

Here are some testimonials:

Others have been inspired by my paper More than a Gigabuck: Estimating GNU/Linux's Size to do more analysis, which is great:

  1. One group did an analysis of the Debian GNU/Linux distribution, using my tool sloccount. You can see their very interesting paper Counting Potatoes: The size of Debian 2.2 at http://people.debian.org/~jgb/debian-counting, or you can see an older version of it in Upgrade. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.
  2. In 2005 they measured Debian again, and reported results in Measuring Libre Software Using Debian 3.1 (Sarge) as A Case Study: Preliminary Results. Debian 3.1 ("Sarge") had grown to about 230 million source lines of code, with an estimated 60,000 person-years and $8 billion USD redevelopment cost. This was contained in 8,600 source packages, generating about 15,300 binary packages. Top languages were C (57%), C++ (16.8%), Shell (9%), LISP (3%), Perl (2.8%), Python (1.8%), Java (1.6%), FORTRAN (1.2%), PHP (0.93%), Pascal (0.62%), and Ada (0.61%). The largest programs (in order of size) were OpenOffice.org (1.1.3, mostly C++), the Linux kernel (2.6.8, mostly C), the web authoring system NVU (0.80, mostly C), internet suite Mozilla (1.7.7, mostly C++), compiler suite GCC (3.4.3, mostly C but significant amounts of Ada and C++), TrueType font server XFS-XTT (1.4.1, mostly C), and XFree86 (4.3.0, mostly C).
  3. Another person analyzed Perl's CPAN library and determined it would have cost $677 million to develop; this CPAN analysis was a Slashdot article on July 30, 2004.

Comparative numbers are hard to find. Gary McGraw (of Cigital) has searched public information to find Windows SLOC size. According to his sources, Windows NT 5.0 (in 2000) was 20M SLOC, Windows 2000 (in 2001) was 35M SLOC, and Windows XP (in 2002) was 40M SLOC. (This information is from his briefing Building Secure Software: How to avoid security problems the right way.)

Palle Pedersen has done a rough-order-of-magnitude analysis of all Free-libre / open source software, starting with some extremely simplifying assumptions. "Assuming an average open source project is 35,000 lines of code and the average cost of a software developer is $30/hour (~$60,000/year), a simple COCOMO II calculator tells us that the average open source project costs $630,000 to develop. This cost translates into $18 per line of code. Extrapolating that to 1.7 billion lines of code gives us an estimated value of $30.6 billion/year... if the open source community was a country with a GDP of $30.6 billion, it would rank 77 right between Bulgaria and Lithuania... putting the open source community ahead of most countries in the world... Such an economic force should not be underestimated, and this is yet another indication that open source has become a significant part [of] the technology world." The specific number may be significantly off (no one knows), but I think the conclusion (OSS has become a significant part) is spot-on.
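The arithmetic behind the quoted figures can be retraced in a few lines of Python. This sketch uses only the numbers given in the quotation; the variable names are illustrative, and the $630,000 per-project cost is taken as given rather than recomputed with a COCOMO II calculator.

```python
# Figures as stated in the quotation above.
avg_project_sloc = 35_000          # lines of code per average project
avg_project_cost = 630_000         # dollars per average project
hourly_rate = 30                   # dollars per developer hour
annual_salary = 60_000             # dollars per developer year
total_sloc = 1_700_000_000         # total open source lines of code

# Cost per line implied by the per-project figures.
cost_per_line = avg_project_cost / avg_project_sloc

# Working hours per year implied by the hourly and annual rates.
hours_per_year = annual_salary / hourly_rate

# Developer-years of salary represented by one average project.
person_years_per_project = avg_project_cost / annual_salary

# Extrapolation to all of the counted code.
total_value = cost_per_line * total_sloc

print(f"${cost_per_line:.0f} per line")
print(f"{hours_per_year:.0f} hours per year")
print(f"{person_years_per_project:.1f} person-years per project")
print(f"${total_value / 1e9:.1f} billion total")
```

The extrapolation is linear in the line count, so any error in the assumed average project size or cost carries through proportionally to the total.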

Remember, there's more to a program than how many lines of code it has, as the August 26, 2003 Dilbert strip shows.

You can also view my home page (http://www.dwheeler.com), or related pages such as my pages on "Why open source software / free software (OSS/FS)? Look at the Numbers!", my open source software / free software references, and how to write secure programs.

This site is hosted by Webframe.org.