David A. Wheeler’s Page on Countering Trusting Trust through Diverse Double-Compiling (Trojan Horse attacks on Compilers)

Here’s information about my work to counter the “Trusting Trust” attack. The “Trusting Trust” attack is an incredibly nasty attack in computer security; up to now it’s been presumed to be the essential uncounterable attack. I’ve worried about it for a long time, essentially since Thompson made his report. If there’s a known attack that cannot be effectively countered, even in theory, should we really be using computers at all? My hope is that my work aids the field.

Main Paper

Here’s my key paper, which was formally reviewed and published:

Countering Trusting Trust through Diverse Double-Compiling (DDC), David A. Wheeler, Proceedings of the Twenty-First Annual Computer Security Applications Conference (ACSAC), December 5-9, 2005, Tucson, Arizona, pp. 28-40, Los Alamitos: IEEE Computer Society. ISBN 0-7695-2461-3, ISSN 1063-9527, IEEE Computer Society Order Number P2461. If you cannot get that paper from ACSAC, here's a local copy of Countering Trusting Trust through Diverse Double-Compiling (DDC) as posted by ACSAC. You can also get this alternative PDF of "Countering Trusting Trust through Diverse Double-Compiling (DDC)" and OpenDocument form of "Countering Trusting Trust through Diverse Double-Compiling (DDC)". (I have the rights to publish it here as well.)

Here’s the abstract of that paper:

An Air Force evaluation of Multics, and Ken Thompson’s famous Turing award lecture “Reflections on Trusting Trust,” showed that compilers can be subverted to insert malicious Trojan horses into critical software, including themselves. If this attack goes undetected, even complete analysis of a system’s source code will not find the malicious code that is running, and methods for detecting this particular attack are not widely known. This paper describes a practical technique, termed diverse double-compiling (DDC), that detects this attack and some compiler defects as well. Simply recompile the source code twice: once with a second (trusted) compiler, and again using the result of the first compilation. If the result is bit-for-bit identical with the untrusted binary, then the source code accurately represents the binary. This technique has been mentioned informally, but its issues and ramifications have not been identified or discussed in a peer-reviewed work, nor has a public demonstration been made. This paper describes the technique, justifies it, describes how to overcome practical challenges, and demonstrates it.

I’m honored to have been accepted by the ACSAC 2005 conference. They get lots of good submissions, yet this year they rejected 77% of their submitted papers. One reason that I submitted to ACSAC is that I believe publication on the web is absolutely critical for widespread use of a result; ACSAC has been publishing on the web for a long time now, and is an open access conference.

Presentation

I have a presentation based on the paper. I gave the original presentation at ACSAC; I've since updated it a little based on various feedback I've received.

You can get the presentation in:

Typos!

The ACSAC 2005 paper "Countering Trusting Trust through Diverse Double-Compiling" has a typo. In the last paragraph of section 4, just ahead of the figure, it says: "if c(sA,c(sA,T)), A, and c(sA,T) are identical, ...". The "c(sA,T)" should be "c(sA,A)"; you can confirm this because the figure clearly shows "c(sA,A)" not "c(sA,T)". My thanks to Ulf Dittmer for pointing this out to me!

Citing My Work (it's David A. Wheeler, please)

If you cite my work, at least include my middle initial “A.”, and if at all possible please use “David A. Wheeler”. Please do not cite me as “David Wheeler” or “D. Wheeler” in any written work (including electronic media like the Internet). There are too many of them, so it’s like not giving me credit at all. If you are required by forces outside your control to use initials, at least use “D. A. Wheeler”. However, I would really appreciate it if you showed me the courtesy of using my actual name (as I use it), instead changing it. In general, please cite the names that people actually use; please don't change them into someone else's name. Thanks.

Detailed data to duplicate the ACSAC experiment

In any scientific work it’s important to be able to duplicate the work. See my Tiny C Compiler (tcc) page for how to duplicate the ACSAC experiment, as well as other tcc-related work too.

Countering Misconceptions

One misconception about this paper seems to be hard to shake. I say it over and over again in the paper, but somehow it does not sink in, so let me try again.

The DDC approach does not assume that two completely different compilers (A and T) will produce the same binary output, given the same input. That is not what the paper says, and it specifically says that this is not one of the assumptions. In fact, as noted in the paper, it's an improvement if A and T generate code for different CPU architectures (say, M68000 and 80x86). Clearly, if they're generating code for different CPUs, the binary output of the two compilers cannot always be identical!

This approach does require that compiler T be able to compile the source code of compiler A. So if compiler A is written in Java, you can't use a C compiler to compile it (not directly, anyway). The tricky part is that if compiler A's source code uses language extensions, then compiler T has to handle those language extensions or you have to create a process to handle them (e.g., via a preprocessor -- if you do that, "compiler T" will really be your preprocessor plus your "original" compiler T).

It also requires that compiler A, as defined by its source code, must be deterministic. But that's different. A compiler is deterministic if, when run twice on identical input (with all the same option flags, etc.), it produces the same output. You can use a random number generator, as long as you give the user control over the random number generator seed (gcc, for example, has a command line option for setting the seed). For example, on a Unix/Linux system, you should be able to do this:

  $ mycompiler input.c     # Compile, store result in "a.out".
  $ mv a.out a.out.saved   # Save old result.
  $ mycompiler input.c     # Do it again
  $   # If a.out and a.out.saved are always identical (except internal
  $   # timestamps, if any) then the compiler is deterministic.

Compilers generally do meet that requirement, with the possible exception of embedded timestamps -- and I discuss how to handle embedded timestamps in the paper. In fact, some compilers (like gcc) include this as a self-test.

Note that this has nothing to do with non-determinism of the underlying CPU. The CPU can have all sorts of non-deterministic actions with multiple cores and so on. But if the CPU were so non-deterministic that you could not write data in a particular order, you couldn't get a compiler or any other program to run.

Compiler T doesn't need to be deterministic.

For the pedants: Yes, sometimes it's possible to write machine code that runs on multiple yet radically different CPU architectures, depending on the architectures. You may even be able to devise code that determines which architecture it's running on, and then jumps to the "right" code for that architecture. These would exploit the exact values of various machine codes, and are certainly clever hacks. But if you want to do that, fat binaries with multiple segments (each for a different architecture) are a better approach -- they're designed to do that cleanly. In any case, that's not the point -- the point is that A and T are not required to generate identical code as output.

Other Questions

Some people have sent in questions, here are some answers. Also, please read the paper -- it anticipates and answers many questions.

  1. What about interpreters/scripting languages, like Perl, Python, TCL, Ruby, etc.?

    If the interpreter always takes source code and re-interprets it at run-time, each time, there's no problem of hidden binaries at that level. Scripting language implementations are generally written using a combination of a lower-level language (say C) to implement the lowest-level interpreter / bytecode engine, combined with a higher-level language (typically themselves). DDC just tells you that source and binary match. For code in the scripting language, you can already see the code, so there's no problem of "do the source and binary match" -- just regenerate it. For the lower-level interpreter, you can see its source code. So the only area in these cases where there is a mysterious, potentially subverted executable, is the lower-level engine -- and that was generated by the lower-level language compiler. So you don't need DDC for the upper levels in those cases; DDC would be applied to the lower-level compiler. You still need to review the source code, of course.

  2. What about compilers that don't compile themselves?

    Compilers that don't compile themselves can already be handled with traditional mechanisms. If compiler A is compiled by compiler B, and if compiler B is trustworthy, just recompile compiler A and see if compiler A reappears. If compiler B is not trustworthy, but was compiled by compiler C, then repeat that... and so on.

    The problem is that sooner or later you hit a "bottom"; you have a compiler that compiles itself, or a loop of compilers that compile themselves, or a chain going back endlessly to the dawn of time that you can't possibly check (and so practically you've hit the bottom). This is the "trusting trust" problem; sooner or later you hit a bottom, and there seems to be no way to address trust at that point. This is where DDC comes in; it breaks the chain of trust that appears unending so that you can once again determine that the binary really is generated from the given source.

  3. What if the compiler isn't self-generating, but is always generated from a previous version?

    Just repeat the steps in DDC in a manner analogous to how the compiler was created. This means you'll have two different source code sets as input... which means you'll have more source code to check, but it's otherwise fine. This is still DDC, just a trivial variation on the process. I discuss this further, below.

  4. Why not just always use the trusted compiler T? Compiler T is only used for checking compiler A; there may be a lot of reasons compiler T isn't suitable for general use. For example, Compiler T may be slow, produce slow code, generate code for a different CPU architecture, lack functions you need (the trusted compiler only needs to be able to compile the other compiler), or have a software license you don't like.

Obvious Extension: Compiler isn't Self-Compiled

Some have noted that the compiler doesn't have to be self-compiled. That's true; the paper doesn't delve into the details of that case, but it's an obvious extension. You can still apply DDC, you just have to account for the additional compiler involved used to generate A... I'll call this other compiler binary P, for which you'll need the source code.

So, let's define some terms:

The DDC stages get very slightly modified, as follows:

In the original paper, A is also its parent P.

The assumptions get slightly modified, but by very little. All the original assumptions hold, but the source code that the result depends on (termed "s") is now not just sA, but sA union sP. If there are few differences between sP and sA, this may be easily performed by examining the "diff" between them, and then examining one in depth; otherwise, you'll have to examine both if you're worried that the compiler source code might contain mischief. The process depends on all libraries used in the entire DDC process; that was true before, but if P and A are significantly different that set may become much bigger than if P and A are the same thing. Note that trusted compiler T must be able to compile sP, and of course the parent compilers (P and P1) must be able to compile the source code sA.

I originally thought this extension was fairly obvious, although it wasn't mentioned directly in the paper; inside my company I'd already been doing this sort of thing. I had originally been interested in just the self-regenerated case, I thought adding this would make it even more complicated, and I was already exceeding the page limit (it's a 13-page paper, the limit was 10). Some existing compilers (such as gcc) use the self-regeneration check to test themselves, so the self-regenerated case is quite common. When you use self-regeneration, you eliminate a lot of dependencies on the parent (P), which is a good thing -- it reduces what you have to review, in particular. Thus, I chose to not delve into this more general variation. In retrospect, I should have included this information, because it shows that the DDC approach is actually more general than shown in the paper. Ah well, sometimes the decisions you make are wrong in retrospect. My thanks to Ben Laurie for convincing me that this should be discussed more directly... and so here it is!

Note that this is how you can break a loop of compilers that mutually depend on each other for self-regeneration. In this case, you use T to "break" the loop. Note that P doesn't need to be a radically different compiler; it might just be an older version of A. Libraries can be handled by considering them as part of the compiler (if they're run) or part of the source (if they're used as input data that's not run). Indeed, P and A might be based on the same compiler, but use different versions of libraries; the effect is the same.

What about Applying this to Hardware?

I mentioned applying this approach to hardware in the paper, and discussed that idea in more detail at the ACSAC conference. Obviously, if your software is okay, but the hardware is subverted, you're still subverted. I think this approach can apply to hardware too, but let's discuss what that means first.

First, what some people call "hardware" is actually software. BIOS files and microcode are still software, and you can still handle them the same way.

Generally, when people worry about hardware subversion, people are worried about the more obvious subversion: a CPU chip or support chip could be directly modified to do something bad (like allow unknown remote control or include a shutoff date). In my mind, it's more likely that either the chip will have that placed in its design by a human, or that the attack will be sneakily put into its implementation during the manufacturing process. You might manipulate a tool (say, its software or configuration) or manipulate a tool's results (like a mask or the physical chip). There are technical ways of countering this:

  1. If you're worried about subversion at the design level, then you need to find a way to review the designs (and make sure that what you review is what was used).
  2. If a software tool inserts the subversion, you need to look at the software tool's source code. If you think its binary is evil but the source is clean, just recompile it and check... we're still in the software domain, and it's not self-generating if it's a direct attack like this. (See below for the self-regenerating case, where DDC reappears.)
  3. If a tool output (e.g., the chip itself or a mask) has been subverted, if the tool can be made to be deterministic, in theory you should be able to rerun that tool and then compare the "expected" results with the "actual" results. If you're worried about all steps, you can rerun each step in sequence and see if there's a difference. You'll need an "equality" operator for masks and chips, which is much harder; see below where I discuss that.

Okay, so there's a technical method for detecting and countering direct subversion of hardware. There's a sneakier, though in my mind a far less likely threat: what if the hardware has been subverted so that the hardware subverts the hardware development process of the next generation? At the software level I think this is likely -- it's already been done at least once, and it's easy to do, because the abstraction levels typically match. At the hardware level, hardware subversion of its own development process is much harder to do because hardware is at such a different level of abstraction -- it's very difficult to create useful automated triggers and payloads in hardware that trigger on the hardware design process. Even if you manually trigger, it's tricky to create the right payloads. And in any case, there are so many other kinds of hard-to-counter attacks at the hardware level that I don't see hardware self-subversion as very likely; why bother being fancy when the direct attack would work?

But let's put on our triple-layer tinfoil hats, and say that we want to counter hardware self-subversion. Hardware self-subversion is where the hardware is designed to subvert the design of the next generation of the hardware.

Although there's no conclusive proof, I think the DDC approach could counter hardware-level self-subversion. (I have a B.S. in Electronics Engineering, so I can speak about this topic with some authority!) There's a technical challenge to overcome, but the biggest problem is that you may not be allowed to get the information you need.

First, the technical challenge. For this to work on hardware, you need an "equality" operator. I believe a scanning electron microscope, with varying angles/positions, could be coaxed into doing the job of determining if a chip was "equal to" another chip (real or virtual). That's especially true if it were supplemented with other test techniques. This would only show (with high probability) that one PARTICULAR chip is okay, of course; another chip might be malicious. That's true for the software approach too, it's just that checking for "equality" is a lot easier for software than for hardware.

But the REAL problem is that huge amounts of hardware data, including the ACTUAL layout of the chip, is usually kept proprietary from even the chip designers. You have to know what the CORRECT hardware result is before a comparison makes sense. In the software world, software developers would not find it acceptable if they couldn't at least SEE the bytes that their compilers output. In contrast, hardware chips are routinely modified in the many manufacturing steps in ways not disclosed to the chip designers.

Engineers who use Verilog or VHDL and think that they know what's actually on their chips are in for a rude shock. The libraries that the tools show are NOT what are really used on chips, for example. And because of quantum mechanical effects, at smaller scales there are "corrections" that some companies will do to your chip's layouts/wiring that you're forbidden (by contract!) to see. In most of the world the chip designers aren't anywhere near the foundaries, and this separation is misleading many chip designers into thinking that they know what's on their chips. (I find this complete separation very disturbing and dangerous.) And that's if you designed the chip; today most people buy IP Cores from random organizations worldwide. In that case they have NO idea exactly what's supposed to be on what they buy, nor any way to find out.

In fact, this lack of information is the real problem. If you can't get this kind of information, an attacker doesn't need to implement a hardware-subverting-hardware attack. Instead, they just insert the attack. If we're to trust future hardware, we need to find a way to get the information we need so that we can trust it.

But hey, one impossible problem at a time, okay :-)?

Credit Where Credit is Due

As I clearly note in the paper, I didn’t come up with the original idea. The original idea was dreamed up by the amazingly bright Henry Spencer (a few steps in his original description weren’t needed, but he had the original idea). However, he never pursued it; in fact over time he’d forgotten about it. I took his few sentences describing his idea and greatly expanded on it, including a much more detailed and analyzed description of it, as well as justifying and demonstrating it.

Who's Talking about it?

The first syllabus that included my ACSAC 2005 paper as required reading is CSC 593: Secure Software Engineering Seminar, a Spring 2006 class taught by Dr. James Walden at Northern Kentucky University. He paired my paper with Ken Thompson's classic 1984 paper Reflections on Trusting Trust. It was also a subject of a class session at George Mason University (GMU)'s "Advanced Topics in Computer Security: Cyber-Identity, Authority and Trust" (IT962) taught by Ravi Sandhu. I had the honor of visiting for the day and giving the presentation myself for their Spring 2006 session.

The paper has been noted or discussed at many locations, including Bugtraq, comp.risks (the Risks digest), Bruce Schneier's weblog (the source for Crypto-Gram), Lambda the ultimate, SC-L (the Secure Coding mailing list), LinuxSecurity.com, Chi Publishing's Information Security Bulletin, Wikipedia's "Backdoor" article, Open Web Application Security Project (OWASP) (mailing list), and others.

Bruce Schneier's page in particular includes a lengthy commentary about it, and both his site and Lamba-the-Ultimate have various blog entries. The article Open Source is Securable discusses the paper and its ramifications -- in particular, it's finally possible to make very strong claims through source code analysis.

Some Related Material

I've since learned that Wolfgang Goerigk re-illustrated the attack using ACL2. Here's the summary: "Compiler Verification Revisited (Wolfgang Goerigk) This study illustrates a fact observed by Ken Thompson in his Turing Award Lecture: the machine code of a correct compiler can be altered to contain a Trojan Horse so that the compiler passes almost every test, including the so-called ``bootstrap test'' in which it compiles its own source code with identical results, and still be capable of generating ``bad'' code. The compiler, the object code machine, and the experiments are formalized in ACL2." The sample code and ACL2 proofs are online.

You might also be interested in the results of the MITRE Vlisp project. Vlisp README says: "The Verified Programming Language Implementation project has developed a formally verified implementation of the Scheme programming language, called Vlisp... An overview of the project is presented in the Vlisp Guide. More accessible PDFs about Vlisp are available too. Another paper that you may find interesting is Jonathan A. Rees. "A Security Kernel Based on the Lambda-Calculus". PhD. Thesis. February 1995.

Coq is being used by Xavier Leroy (main developer of OCaml) to write a certified compiler, compcert, that guarantees that semantics of a C source program is kept up to PowerPC assembly. The *specification* (unfortunately not the Coq proofs) of the compiler back-end is available as GPL software.

Kegel's building and testing gcc/glibc cross toolchains has lots of good information.

The RepRap Project is developing inexpensive 3D printer designs that will hopefully (eventually) be able to create themselves. Very interesting, and in the future, possibly quite relevant.

Miscellaneous

I wrote this paper using OpenOffice.org, which did a fine job. This paper is dedicated to the memory of Dennis W. Fife, my former mentor.

I’m working to scale this up to gcc. I've worked for some time with Aaron Hatcher to do this (thanks Aaron!).

After doing endless numbers of tedious compiles, Xkcd's cartoon about compiling made me smile.

Mortality.pvs is a short demo of how to express the "All men are mortal" example using PVS.

Micro-Tainting

An aside: At ACSAC 2005, Aleks Kissinger (from the University of Tulsa) also presented work that he and I had done on micro-tainting. This was the presentation “Fine-Grained Taint Analysis using Regular Expressions,” which was part of the Works in Progress. Basically, we noted that instead of assigning “taint” to a whole value, such as a string, you could assign taint on subcomponents, such as each character. Then you could assign rules that identified the input paths and what could come in -- typically zero or more tainted characters -- and rules on output paths. We concentrated on defining regular expressions for what is legal, though any other expression for patterns such as BNFs would be fine too. We noted that you could then check statically or dynamically. For the static case, when you work backwards, if the check “fails” you can even trivially derive the input patterns that cause security failures (and from that information it should be easy to figure out how to fix it). Aleks has recently made some good progress by transforming the regular expressions into DFAs. There was another ACSAC presentation on doing taint analysis with Java, but this was the traditional “whole variable” approach that is used in many languages, but through which many vulnerabilities slip by. We hope this micro-tainting approach will lead to improved tools for detecting security vulnerabilities in software, before that software is delivered to end-users.

There is related work that we know about that has been going on in the University of Virginia (UVA), though we only found out about it halfway through our work (via Usenix). More information about the UVA work is in "Automatically Hardening Web Applications Using Precise Tainting" by Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, and David Evans. They focus on PHP, and only on the dynamic case; we were interested in both, but especially interested in the static case (where you can show that certain vulnerabilities never occur and thus don't need any run-time overhead to deal with them).

Other related work includes the BRICS Java String Analyzer (GPL; uses the BSD-licensed dk.brics.automaton).

There is a long history of work on data flow, static typing, and security (such as work by Dennis Volpano et al). That's good work, but not really focused on what we were looking at. Those works tend to view variables as a whole, while instead we're tracking much smaller units of data. We're also tracking sequences (like arrays) which contain data with different levels of security; most such works handled arrays like a single unit (a simplification that is fundamentally at odds with our approach).


You can also view my book on writing secure programs, FlawFinder, or my home page.