David A. Wheeler's Blog

Tue, 28 Mar 2006

Unsigned characters: The cheapest secure programming measure?

Practically every computer language has “gotchas” — constructs or combinations of constructs that software developers are likely to use incorrectly. Sadly, the C and C++ languages have an unusually large number of gotchas, and many of these gotchas tend to lead directly to dangerous security vulnerabilities. This forest of dangerous gotchas tends to make developing secure software in C or C++ more difficult than it needs to be. Still, C and C++ are two of the most widely-used languages in the world; there are many reasons people still choose them for new development, and there’s a lot of pre-existing code in those languages that is not going to be rewritten any time soon. So if you’re a software developer, it’s still a very good idea to learn how to develop secure software in C or C++… because you’ll probably need to do it.

Which brings me to the “-funsigned-char” compiler option of gcc, one of the cheapest secure programming measures available to developers using C or C++ (similar options are available for many other C and C++ compilers). If you’re writing secure programs in C or C++, you should use the “-funsigned-char” option of gcc (and its equivalent in other compilers) to help you write secure software. What is it, and what’s it for? Funny you should ask… here’s an answer!

Let’s start with the technical basics. The C programming language includes the “char” type, which is usually used to store an 8-bit character. Many internationalized programs encode text using UTF-8, so a single user-visible character may be stored in a sequence of “char” values, but even in internationalized programs text is often stored in a “char” type.

The C standard specifically says that char CAN be signed OR unsigned. (Don’t believe me? Go look at ISO/IEC 9899:1999, section 6.2.5, paragraph 15, second sentence. So there.) On many platforms (such as typical Linux distributions), the char type is signed. The problem is that software developers often incorrectly think that the char type is unsigned, or don’t understand the ramifications of signed characters. This misunderstanding is becoming more common over time, because many other C-like languages (like Java and C#) define their “char” type to be essentially unsigned or in a way that it wouldn’t matter. What’s worse, this misunderstanding can lead directly to security vulnerabilities.

All sorts of “weird” things can happen on systems with signed characters. For example, the character 0xFF will match as being “equal” to the integer -1, due to C/C++’s widening rules. And this can create security flaws in a hurry, because -1 is a common “sentinel” value that many developers presume “can’t happen” in a char. A well-known security flaw in Sendmail was caused by exactly this problem (see US-CERT #897604 and this posting by Michal Zalewski for more information).
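To make the 0xFF case concrete, here is a minimal sketch in C. The function names widen_plain and widen_as_byte are illustrative only (they are not taken from Sendmail or any library), and the sketch assumes 8-bit chars.

```c
/* Widen a plain char to int exactly as C does implicitly in comparisons
   and arithmetic. Where char is signed, the byte 0xFF comes out as -1. */
int widen_plain(char c)
{
    return c;
}

/* Widen via unsigned char first, so the result is always in 0..255
   (assuming 8-bit chars) and can never collide with a -1 sentinel. */
int widen_as_byte(char c)
{
    return (unsigned char)c;
}
```

On a platform with signed chars, widen_plain((char)0xFF) == -1 is true, so a byte taken from input can be mistaken for a sentinel. widen_as_byte((char)0xFF) is 255 whether char is signed or unsigned.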

Now, you could solve this by always using the unambiguous type “unsigned char” if that’s what you intended, and strictly speaking that’s what you should do. However, it’s very painful to change existing code to do this. And since many pre-existing libraries expect “pointer to char”, you can end up with tons of useless warning messages when you do that.
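As a small illustration of that pain, consider a buffer declared as unsigned char that has to be handed to a standard library function. The wrapper name byte_strlen below is hypothetical. Without the cast, gcc typically reports a pointer-signedness warning in C, and C++ compilers reject the call outright.

```c
#include <string.h>

/* Hypothetical wrapper: strlen() is declared to take const char *, so a
   buffer declared as unsigned char needs an explicit cast at the call
   to avoid a pointer-signedness diagnostic. */
size_t byte_strlen(const unsigned char *s)
{
    return strlen((const char *)s);
}
```

Multiply that cast by every call into a char-based library and the cost of converting existing code becomes clear.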

So what’s a simple solution here? A simple answer is to force the compiler to always make “char” an UNSIGNED char. A portable program should work when a char is unsigned, so this shouldn’t require any changes to that code. Since programmers often make this assumption, let’s make their assumption correct. In the widely-popular gcc compiler, this is done with the “-funsigned-char” option; many other C and C++ compilers have similar options. What’s neat is that you don’t have to modify a line of source code; you can just slip this option into your build system (e.g., add this option to your makefile). This is usually trivial to do; typically you can just modify (or set) the CFLAGS variable to add this option, and then recompile.
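If you want to confirm that the option actually took effect in a given build, one possible sketch is to ask the standard <limits.h> header. The function name plain_char_is_unsigned and the REQUIRE_UNSIGNED_CHAR macro are assumptions made for this example, not part of gcc or any standard.

```c
#include <limits.h>

/* Returns 1 when plain char is unsigned in this build, 0 when it is
   signed. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}

/* Hypothetical opt-in guard: define REQUIRE_UNSIGNED_CHAR in the build
   to turn a missing -funsigned-char into a compile-time error. */
#if defined(REQUIRE_UNSIGNED_CHAR) && CHAR_MIN < 0
#error "plain char is signed; add -funsigned-char (or equivalent) to CFLAGS"
#endif
```

With gcc, compiling the same file with and without -funsigned-char on a typical Linux x86 system should flip the function's result from 0 to 1. The guard is optional and does nothing unless the (made-up) macro is defined.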

I also have more controversial advice. Here it is: If you develop C or C++ compilers, or you’re a distributor who distributes a C/C++ compiler… make char unsigned by default on all platforms. And if you’re a customer, demand that from your vendor. This is just like similar efforts going on in operating system sales to users; today operating system vendors are changing their systems so that they are “secure by default”. At one time many vendors’ operating systems were delivered with all sorts of “convenient” options that made them easy to attack… but getting subverted all the time turned out to be rather inconvenient to users. In the same way, development tools’ defaults should try to prevent defects, or create an environment where defects are less likely. Signed characters are basically a vulnerability waiting to happen, portable programs shouldn’t depend on a particular choice, and non-portable software can turn on the “less secure” option when necessary. I doubt this advice will be taken, but I can suggest it!

Turning this option on does not save the universe; most vulnerabilities will not be caught by turning on this little option. In fact, by itself this is a very weak measure, simply because it doesn’t counter most vulnerabilities. You need to know much more to write secure software; to learn more, see my free book on writing secure programs for Linux and Unix. But stick with me; I think this is a small example of a much larger concept, which I’ll call no sharp edges. Chain saws are powerful (and dangerous), but no one puts scissor blades next to the chain saw’s handle. We try to make sure that “obvious” ways of using tools are not dangerous, even if the tool itself can do dangerous things. Yet the “obvious” ways to use many languages turn out to lead directly to security vulnerabilities, and that needs to change. You can’t prevent all misuse (a chain saw can always be misused), but you can at least make languages easy to use correctly and likely to do only what was intended (and nothing else).

We need to design languages, and select tools and tool options, to reduce the likelihood of a developer error becoming a security vulnerability. By combining compiler warning flags (like -Wall), defaults that are likely to avoid dangerous mistakes (like -funsigned-char), NoExec stacks, and many other approaches, we can greatly reduce the likelihood of a mistake turning into a security vulnerability. The most important security measure you can take in developing secure software is to be paranoid, and I still recommend paranoia. Still, it’s hard to be perfect all the time. Currently, a vast proportion of security vulnerabilities come from relatively trivial implementation errors, ones that are easy to miss. By combining a large number of approaches, each of which counters a specific common mistake, we can get rid of a vast number of today’s vulnerabilities. And getting rid of a vast number of today’s vulnerabilities is a very good idea.
