Chapter 6. Restrict Operations to Buffer Bounds (Avoid Buffer Overflow)

 

An enemy will overrun the land; he will pull down your strongholds and plunder your fortresses.

 Amos 3:11 (NIV)
Table of Contents
6.1. Dangers in C/C++
6.2. Library Solutions in C/C++
6.2.1. Standard C Library Solution
6.2.2. Static and Dynamically Allocated Buffers
6.2.3. strlcpy and strlcat
6.2.4. asprintf and vasprintf
6.2.5. libmib
6.2.6. Safestr library (Messier and Viega)
6.2.7. C++ std::string class
6.2.8. Libsafe
6.2.9. Other Libraries
6.3. Compilation Solutions in C/C++
6.4. Other Languages

Programs often use memory buffers to capture input and process data. In some cases (particularly in C or C++ programs) a program can perform an operation that reads from or writes to a memory location outside the intended boundary of the buffer. In many cases this can lead to an extremely serious security vulnerability. This is such a common problem that it has a CWE identifier, CWE-119. Exceeding buffer bounds is a problem with a program’s internal implementation, but it’s such a common and serious problem that I’ve placed this information in its own chapter.

There are many variations of a failure to restrict operations to buffer bounds. One subcategory of exceeding buffer bounds is the buffer overflow. The term buffer overflow has a number of varying definitions. For our purposes, a buffer overflow occurs if a program attempts to write more data into a buffer than it can hold, or to write into a memory area outside the boundaries of the buffer. A particularly common situation is writing character data beyond the end of a buffer (through copying or generation). A buffer overflow can occur when reading input from the user into a buffer, but it can also occur during other kinds of processing in a program. Buffer overflows are also called buffer overruns. This subcategory is such a common problem that it has its own CWE identifier, CWE-120.
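To make the definition concrete, here is a minimal sketch of a copy that refuses to write past the end of its destination. The function name copy_bounded and its return convention are my own assumptions for illustration, not part of any standard library; the point is only that the length is checked before any byte is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): copy the string src
 * into dst, whose total capacity is dst_size bytes including the
 * terminating NUL. Returns 0 on success. Returns -1 if src does not
 * fit; in that case dst is set to the empty string when possible,
 * and nothing is written beyond dst_size bytes. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);

    if (dst_size == 0)
        return -1;              /* no room even for the NUL */

    size_t len = strlen(src);
    if (len >= dst_size) {      /* len + 1 bytes would not fit */
        dst[0] = '\0';
        return -1;
    }

    memcpy(dst, src, len + 1);  /* copies the NUL as well */
    return 0;
}
```

A caller would pass the real capacity of the buffer, e.g. copy_bounded(buf, sizeof buf, input) when buf is an array, and must check the return value. This sketch rejects oversized input rather than truncating it; which of the two is appropriate depends on the program.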

Buffer overflows are an extremely common and dangerous security flaw, and in many cases a buffer overflow can lead immediately to an attacker having complete control over the vulnerable program. To give you an idea of how important this subject is, at the CERT, 9 of 13 advisories in 1998 and at least half of the 1999 advisories involved buffer overflows. An informal 1999 survey on Bugtraq found that approximately two-thirds of the respondents felt that buffer overflows were the leading cause of system security vulnerability (the remaining respondents identified “mis-configuration” as the leading cause) [Cowan 1999]. This is an old, well-known problem, yet it continues to resurface [McGraw 2000].

Attacks that exploit a buffer overflow vulnerability are often named after where the buffer is located; e.g., a “stack smashing” attack targets a buffer on the stack, while a “heap smashing” attack targets a buffer on the heap (memory that is allocated by facilities such as malloc in C and new in C++). More details can be found from Aleph1 [1996], Mudge [1995], LSD [2001], or Nathan P. Smith’s Stack Smashing Security Vulnerabilities website at http://destroy.net/machines/security/. A discussion of the problem and some ways to counter it is given by Crispin Cowan et al., 2000, at http://immunix.org/StackGuard/discex00.pdf. A discussion of the problem and some ways to counter it in Linux is given by Pierre-Alain Fayolle and Vincent Glaume at http://www.enseirb.fr/~glaume/indexen.html.

Allowing attackers to read data beyond a buffer boundary can also result in vulnerabilities, and this weakness has its own identifier (CWE-125). For example, the Heartbleed vulnerability was this kind of weakness. The Heartbleed vulnerability in OpenSSL allowed attackers to extract critically important data such as private keys, and then use them (e.g., so they could impersonate trusted sites).
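The general shape of an out-of-bounds read can be sketched as follows. This is not OpenSSL’s code; it is a simplified, hypothetical echo routine (the name echo_payload and its parameters are my own assumptions) in which the sender states how many bytes it wants echoed back. The sketch only shows the kind of check that prevents an over-read: the stated length is compared with the number of bytes actually received before anything is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical echo routine (illustration only, not OpenSSL's code).
 *   payload      - bytes actually received from the peer
 *   payload_len  - how many bytes were actually received
 *   claimed_len  - length the (untrusted) peer says it sent
 *   out/out_size - destination buffer and its capacity
 * Returns 0 and copies claimed_len bytes on success, or -1 if the
 * request would read past payload or write past out. */
int echo_payload(const unsigned char *payload, size_t payload_len,
                 size_t claimed_len,
                 unsigned char *out, size_t out_size)
{
    assert(payload != NULL && out != NULL);

    if (claimed_len > payload_len)
        return -1;   /* would read beyond the received data */
    if (claimed_len > out_size)
        return -1;   /* would write beyond the output buffer */

    memcpy(out, payload, claimed_len);
    return 0;
}
```

Without the first check, a peer that sends 4 bytes but claims 64 would receive 60 bytes of whatever happened to follow the payload in memory.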

Figure 6-1. A physical buffer overflow: The Montparnasse derailment of 1895

Most high-level programming languages are essentially immune to exceeding buffer boundaries, either because they automatically resize arrays (as most such languages do, e.g., Perl), or because they normally detect and prevent buffer overflows (e.g., Ada95). However, the C language provides no protection against such problems, and C++ can easily be used in ways that cause this problem too. Assembly language and Forth also provide no protection, and some languages that normally include such protection (e.g., C#, Ada, and Pascal) can have this protection disabled (for performance reasons). Even if most of your program is written in another language, many library routines are written in C or C++, as is the “glue” code that calls them, so other languages often don’t provide as complete a protection from buffer overflows as you’d like.