Software development is often supported by specialized programs called "Software Configuration Management" (SCM) tools. SCM tools often control who can read and modify the source code of a program, keep history information (so that people can find out what changed between versions, and who changed them), and generally help developers work together to improve a program under development.
Problem is, the people who develop SCM tools often don't think about what kind of security requirements they need to support. This mini-paper describes briefly the kinds of security requirements an SCM tool should support. Not every project may need everything, but it's easy to not notice some important requirements if you don't think about them. There are two basic types of SCM tools, "centralized" and "distributed"; the basic security needs are the same, but how these needs can be handled differs between the two types. I'm primarily concentrating on basic SCM tools (like CVS, Subversion, GNU Arch, Bitkeeper, Perforce, and so on). Clearly related tools include build tools, automated (regression) test tools, bug tracking tools, static analysis tools, process automation tools, software development tools (such as editors, compilers, and IDEs), and so on.
Fundamentally, there are some basic (potential) security requirements that any system needs to consider. These are:
An SCM has several assets to protect. It needs to protect "current" versions of software, but it must do much more. It needs to make sure that it can recall any previous version of software, correctly, as well as the audit trail of exactly who made which change and when. In particular, an SCM has to keep the history immutable - once a change is made, it needs to stay recorded. You can undo the change, but the undoing needs to be recorded separately. Very old history may need to be removed and archived, but that's different than simply allowing history to be deleted.
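To make the "immutable history" idea concrete, here is a minimal sketch of an append-only change log in which an undo is itself a new recorded entry. The `History` class, its field names, and the choice of chaining entries with SHA-256 are my illustrative assumptions, not a description of how any particular SCM tool stores its data.

```python
import hashlib
import json


class History:
    """Append-only change log; each entry commits to the one before it."""

    def __init__(self):
        self._entries = []

    def _digest(self, body):
        # Hash a canonical JSON form of the entry (hypothetical format).
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, author, when, action, content):
        prev = self._entries[-1]["id"] if self._entries else None
        body = {"prev": prev, "author": author, "when": when,
                "action": action, "content": content}
        entry = dict(body, id=self._digest(body))
        self._entries.append(entry)
        return entry["id"]

    def undo(self, author, when, target_id):
        # An undo is a new entry that points at the change it reverses;
        # the original entry is never removed or rewritten.
        return self.record(author, when, "undo", target_id)

    def verify(self):
        # Recompute every digest and check each link to its predecessor.
        prev = None
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "id"}
            if entry["prev"] != prev or entry["id"] != self._digest(body):
                return False
            prev = entry["id"]
        return True
```

Because every entry includes the identifier of its predecessor, rewriting an old entry in place changes its digest and `verify()` fails. That only helps if someone outside the repository remembers a recent identifier to compare against.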
Okay, so what are the potential threats? These vary, and not all projects will worry about all threats. Nevertheless, it's easier to provide a list of threats and the counter-measures an SCM should support.
Individual projects may choose to not employ a given counter-measure, since they may decide that's not a threat for them. For example, open source software (OSS) projects may decide that there's no "threat" of unauthorized reading of software, since the code is open to reading by all. However, that may not always be true - many OSS projects hide changes that reveal security vulnerabilities until the new version is ready for deployment. Thus, it's difficult to make simple statements like "projects of type X never need to worry about threat Y". Instead, it's simpler to list some potential threats, and then projects can decide which ones apply to them (and configure their SCM system to counter them).
An outsider (not a developer or administrator) may try to read or modify assets (software source code or history information) when they're not authorized to do so. SCM systems should support authorization (like login systems), and support a definition of what unauthorized users can do. An SCM system should support configurations that allow anonymous reading of a project and/or its history, since there are many cases where that's useful. However, SCMs should also support forbidding anonymous read access. That's even true for OSS projects, since as I noted above, sometimes OSS projects want to hide security fixes until they're ready for deployment.
Normally unauthorized users shouldn't be allowed to modify a source repository, so an SCM should support that (and should make that the default). In rare cases, it's possible to imagine that even this constraint isn't true, especially if the SCM tool is designed to be used for resources other than source code. Most Wiki systems such as Wikipedia allow anonymous changes; they work instead by protecting the history of changes so that everyone will know exactly what's changed, instead of preventing writing of the primary data. Such approaches are rare for software code; for example, the Wikipedia software itself (as stored in its trusted repository) can only be changed by a few privileged developers. However, it is conceivable that software documentation and code would be maintained by the same SCM software, and perhaps a few projects would allow anyone to update the documentation as long as all changes were tracked and could be easily reversed.
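As a rough sketch of the configuration choices just described, the following shows a policy object where anonymous reading is optional and anonymous writing is off unless someone deliberately turns it on. `RepoPolicy` and `is_allowed` are hypothetical names for illustration; making anonymous reading off by default is my assumption, since a project could reasonably choose the other default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoPolicy:
    # Hypothetical settings; both default to the conservative choice.
    anonymous_read: bool = False
    anonymous_write: bool = False


def is_allowed(policy, user, action):
    """Decide a request; `user` is None for an unauthenticated request."""
    if action not in ("read", "write"):
        raise ValueError(f"unknown action: {action}")
    if user is not None:
        # Finer-grained checks for authenticated users belong elsewhere.
        return True
    return policy.anonymous_read if action == "read" else policy.anonymous_write
```

A public OSS repository would typically use `RepoPolicy(anonymous_read=True)`, while a private branch holding an unreleased security fix would keep both settings off.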
The underlying identification and authentication system (the login system) can use intrusion detection systems to detect likely attempts to forge privileges (e.g., by detecting password guessing attacks, or detecting improbable locations of a login). The underlying login system could also support enabling limits (e.g., delays after X login attempts, or only permitting logins from certain Internet Protocol address ranges for certain developers). However, these mechanisms must not themselves open the door to a denial-of-service attack; otherwise, an attacker might try to forge logins not to actually log in, but to prevent legitimate users from doing so.
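One way to get delays without handing attackers a lockout tool is to track failures per (account, source address) pair and to cap the delay. The sketch below assumes exactly that design; `LoginThrottle`, `address_permitted`, and all the thresholds are hypothetical, and the caller supplies the current time so the logic is easy to test.

```python
import ipaddress


class LoginThrottle:
    """Growing, capped delays tracked per (account, source address)."""

    def __init__(self, free_attempts=3, base_delay=1.0, max_delay=60.0):
        self.free_attempts = free_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._failures = {}  # (user, addr) -> (failure count, time of last failure)

    def wait_required(self, user, addr, now):
        # Seconds the caller must still wait before another attempt.
        count, last = self._failures.get((user, addr), (0, 0.0))
        if count < self.free_attempts:
            return 0.0
        delay = min(self.max_delay,
                    self.base_delay * 2 ** (count - self.free_attempts))
        return max(0.0, last + delay - now)

    def record_failure(self, user, addr, now):
        count, _ = self._failures.get((user, addr), (0, 0.0))
        self._failures[(user, addr)] = (count + 1, now)

    def record_success(self, user, addr):
        self._failures.pop((user, addr), None)


def address_permitted(addr, allowed_networks):
    # True if addr falls inside any of the developer's permitted ranges.
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)
```

Since failures from one address never delay logins from another, an attacker guessing passwords cannot lock the real developer out. The cost is that guessing spread across many addresses is slowed less, which is where intrusion detection has to help.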
An SCM system should support protected logins (e.g., if it uses passwords, it should protect passwords during transit and while they're stored). Once users are authenticated, an SCM system should be able to limit what users can do based on the authorization that's implied.
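For the "while they're stored" half of that requirement, a minimal sketch is to store only a salted, deliberately slow hash of each password. The example uses PBKDF2 from Python's standard library; the function names and the iteration count are my assumptions, and protection in transit (e.g., running the login over TLS or SSH) is a separate matter not shown here.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # assumed work factor; tune for the deployment


def hash_password(password, iterations=ITERATIONS):
    # A fresh random salt per password defeats precomputed tables.
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, iterations, digest


def verify_password(password, record):
    salt, iterations, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

The repository then stores the `(salt, iterations, digest)` record rather than the password, so an attacker who copies the account database still has to guess each password individually.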
SCM systems could usefully limit reading to particular projects, say. Limiting reading of specific files inside a project can be useful, but it often isn't as useful inside a branch that developers must access, because developers often need the entire set of files to develop (e.g., to recompile something). But limiting who can read changes in certain branches could be vital for some projects. For example, it is common for security vulnerabilities to be reported to a smaller group of people than the entire development staff, and for the patch to be developed by specially trusted developers without the knowledge of the full development team. This is particularly true for open source software projects, but it's also sometimes true for other projects. This kind of functionality can also be important for projects such as military projects with varying degrees of confidentiality; most of the program may be "unclassified", but with a poor or stubbed algorithm; there may be a better classified algorithm, but it will need to be maintained separately. Ideally, the SCM should be trustworthy enough to protect that data, though in practice such trust is rarely granted; an SCM should instead gracefully handle importing the "unclassified" version and automatically merging the "classified" data on equipment trusted to do so.
Limiting writing of specific files inside a project can be much more useful, since in some projects some users "own" certain files. In many situations it doesn't make sense either, but an SCM system should still support limiting which developers can make which changes.
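A simple way to express file "ownership" is a list of path patterns, each with the set of developers allowed to change matching files. The sketch below is illustrative only: the rule format, the "first match wins" ordering, and the names `WRITE_RULES`, `may_write`, and `rejected_paths` are all assumptions of mine.

```python
from fnmatch import fnmatchcase

# Hypothetical rules; the first matching pattern decides.
WRITE_RULES = [
    ("crypto/*", {"alice"}),
    ("docs/*", {"alice", "bob", "carol"}),
    ("*", {"alice", "bob"}),
]


def may_write(user, path, rules=WRITE_RULES):
    for pattern, owners in rules:
        if fnmatchcase(path, pattern):
            return user in owners
    return False  # no rule matched: deny by default


def rejected_paths(user, changed_paths, rules=WRITE_RULES):
    # Paths in a proposed change that this user may not modify.
    return [p for p in changed_paths if not may_write(user, p, rules)]
```

A server-side check would call `rejected_paths` on every incoming change and refuse the whole change if the result is non-empty, so that a partially authorized commit is never recorded.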
An area often forgotten by SCM systems is handling malicious developers. You know, the ones who intentionally insert Trojan horses into programs. Denying they exist doesn't help; they do exist. And even if they didn't, there's no easy way for an SCM to tell the difference between an authorized malicious developer and an attacker who's acquired an authorized developer's credentials.
A malicious developer might even try to make it appear that some other developer has done a malicious deed (or at least make it untraceable). They can use their existing privileges to try to gain more privileges. A malicious developer might try to modify the data used by an SCM system so that it looks like someone else made the change (e.g., provide someone else's name in a ChangeLog entry). A malicious developer might try to modify an SCM "hook" to make it appear that some other developer has inserted malicious code (perhaps to avoid blame or frame the other developer). A malicious developer might modify the build process, e.g., so that when another developer builds the software, the build system attempts to steal credentials or harm the developer.
Since developers have the privileges to read and change data, malicious developers (and attackers with their credentials) are harder to counter. But there are counter-measures that can be used against them. Here are some reasonable measures:
On April 11, 2004, Dr. Carsten Bormann from the University of Bremen sent me an email about a specialized attack that he terms the "encumbrance pollution attack". In an encumbrance pollution attack, the attacker inserts material that cannot be legally included. To understand it, first imagine an SCM with perfectly indestructible history. The attacker steals developer credentials, or is himself a malicious developer, and checks in a change that contains some encumbered material. "Encumbered" material is simply material which cannot be legally included. Examples include child pornography, slanderous/libelous statements, or code which has copyright or patent encumbrances. This could be very advantageous to an attacker. For example, a company might hire a malicious developer to insert that company's code into a competing product, and then sue the competitor for copyright infringement, knowing that their SCM system "can't" undo the problem. Or a lazy programmer might copy code that they have no right to copy (this is rare in open source software projects, because every line of code and who provided it is a matter of public record, but proprietary projects do have this risk). Any SCM can record a change that essentially undoes a previous change, but if the history is indestructible and viewable by all, then you can't get rid of the history. This makes your SCM archive irrevocably encumbered. This can especially be a problem if the SCM is indestructibly recording proposals by outsiders! An SCM system could be designed so that a special privilege allowed someone to completely delete the history data of illegal changes, of course. However, if there are special privileges to delete history data, it might be possible to misuse those privileges to cause other problems.
One mechanism for dealing with an encumbrance pollution attack is to allow specially-privileged accounts to "mask" history elements, i.e., to prevent normal developers from accessing certain material, so that it's no longer available and isn't included in later versions (essentially it works like an "undo" against that change). However, a "mask" would still record the event in some way so that it would be possible to prove, at a later time, that the event occurred. Perhaps the system could record a hash of the encumbered change, allowing the encumbered material to be removed from the normal repository yet proving that, at one time, the material was included. A "masking" should include a cryptographic signature of whoever did the masking. This mechanism in particular requires careful design, because the mechanism should be designed so that it doesn't permit other attacks.
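Here is a minimal sketch of what a "mask" record might contain: the content is dropped, its hash is kept, and the person doing the masking attests to the record. Everything here is hypothetical (the record fields, the function names, the entry format), and to stay within Python's standard library the attestation is an HMAC with a secret key, which is only a stand-in; a real design would use a public-key signature so that anyone can verify it.

```python
import hashlib
import hmac
import json


def mask_change(entry, masker, reason, masker_key):
    """Return a replacement for `entry` with its content removed."""
    record = {
        "id": entry["id"],
        "masked": True,
        # The hash proves what was once there without retaining it.
        "content_sha256": hashlib.sha256(entry["content"]).hexdigest(),
        "masked_by": masker,
        "reason": reason,
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # HMAC stands in for a real public-key signature in this sketch.
    record["attestation"] = hmac.new(masker_key, payload,
                                     hashlib.sha256).hexdigest()
    return record


def mask_is_authentic(record, masker_key):
    unsigned = {k: v for k, v in record.items() if k != "attestation"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(masker_key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["attestation"])


def matches_masked_content(record, candidate):
    # Lets a party who holds the disputed material show it was the masked change.
    return hashlib.sha256(candidate).hexdigest() == record["content_sha256"]
```

Note that a bare hash of a short or guessable change could let someone confirm a guess about the removed material, which is one example of why this mechanism needs careful design.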
Most SCM systems have multiple components, say, a client and server. Even GNU Arch, which can use a simple secure ftp server as a shared repository, has a possible server (the ftp server). Clients and servers should resist attack from other potentially subverted components, including loss of SCM data.
Many repositories have themselves undergone attack, including the Linux CVS mirror, Savannah, Debian, and Microsoft (attackers have acquired, at least twice, significant portions of Windows' code). Thus, a good SCM should be able to resist attack, even when the repository it's running on is subverted (through malicious administrators of a repository, attacker root control over a repository, and so on). This isn't just limited to centralized SCM systems; distributed SCM systems still have the problem that an attacker may take over the system used to distribute someone's changes. In 2011, attackers subverted Linux's kernel.org site, but it's believed there was little damage to the source code repositories due to the nature of git.
An SCM should be able to prevent read access, even if the repository is attacked. The obvious way to do this is by using encrypted archives. But there are many variations on this theme, primarily in where the key(s) are stored for decryption. If the real problem is just to make sure that backup media or transfer disks aren't easily read, the key could simply be stored on a separate (more protected) medium. The archive keys might only be stored in RAM, and required on bootup; this is more annoying for bootup, and an attacker is likely to be able to acquire the data anyway. The repository might not normally have the keys necessary to decrypt the archive contents at all; it could require the developer to provide those keys, which it uses and then destroys. This is harder to attack, but a determined adversary could subvert the repository program (or memory) and get the key. Another alternative is to arrange for the repository to not have the keys necessary to decrypt the archive contents at any time. In this case, developers must somehow be provided with the keys necessary to do the decryption, and essentially the repository doesn't really "know" the contents of the files it's managing!
Preventing write access when an attacker controls a repository is a difficult challenge, especially since you still want to permit legitimate changes by normal developers. Since the attacker can modify arbitrary files in this case, the goal is to be able to quickly detect any such changes:
There seems to be only a little related work available on the topic. Lynzi Ziegenhagen wrote a Master's thesis for the Naval Postgraduate School about revision control tools for "high assurance" (a.k.a. secure) software development projects: Evaluating Configuration Management Tools for High Assurance Software Development Projects (also available at StormingMedia). A commentary on that paper is also available. The OpenCM project has published some papers, including Jonathan S. Shapiro and John Vanderburgh's Access and Integrity Control in a Public-Access, High-Assurance Configuration Management System (Proc. 11th USENIX Security Symposium, San Francisco, CA, 2002).
Another related paper is "Configuration Management Evaluation Guidance for High Robustness Systems" by Michael E. Gross (Lieutenant, United States Navy), March 2004.
The Trusted Software Methodology included a number of configuration management requirements; in particular, its upper level requirements were specifically designed to counter malicious developers. See "Trusted Software Methodology Report" (TSM), CDRL A075, July 2, 1993, and in particular its appendix A (which defines the trust principles). The Common Criteria includes a number of configuration management requirements (see in particular part 3 in the ACM section).
"Security for Automated, Distributed Configuration Management" by Devanbu, Gertz, and Stubblebine examines a completely different problem (one which is important, but not the one in view here).
Jeronimo Pellegrini's Apso (prototype software) is a framework for adding secrecy to version control systems (currently for Monotone). In 2006 a conference paper was published that described it. In 2011 he mentioned that he hasn't had much time to work on it any more, but the ideas may be of interest to others.
There is a vast amount of literature about SCM systems, as well as papers discussing or evaluating particular systems. That includes my own Comments on OSS/FS Software Configuration Management (SCM) Systems.
All of this can't prevent all attacks. But such an SCM system can make the attacks much harder to perform, more likely to be detected, and make detection much more rapid. Here are some examples:
It's my hope that SCM systems will have more of these capabilities in the future. I'm happy to note that some SCM developers have considered these issues. Aegis has a nice side-by-side comparison of a version of this paper with Aegis' capabilities. Bazaar-NG has considered these security ideas. Hopefully others will consider these issues too.
Feel free to see my home page at http://www.dwheeler.com. Paul Stadig's "Thou Shalt Not Lie: git rebase, ammend, squash, and other lies" is related.