Web-based applications (such as CGI scripts) run on a trusted server and must receive their input data through the web. Since the input data generally comes from untrusted users, it must be validated. Indeed, the data may actually have come from an untrusted third party; see Section 7.16 for more information. For example, CGI scripts are passed this information through a standard set of environment variables and through standard input. The rest of this text specifically discusses CGI, because it’s the most common technique for implementing dynamic web content, but the general issues are the same for most other dynamic web content techniques.
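As a rough illustration of where CGI input arrives, the following Python sketch reads form fields from the standard CGI environment variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) and from standard input. The function name read_cgi_fields and the MAX_BODY limit are hypothetical choices made for this example, and the sketch assumes the request body is form-encoded ASCII. Everything it returns is still untrusted and must be validated before use.

```python
from urllib.parse import parse_qs

# Hypothetical upper bound on the request body; pick a limit that suits the application.
MAX_BODY = 64 * 1024


def read_cgi_fields(environ, stdin):
    """Collect raw form fields from a CGI-style environment and input stream.

    environ is a mapping such as os.environ; stdin is a binary stream such as
    sys.stdin.buffer. The result maps each field name to a list of values.
    """
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "GET":
        # GET parameters arrive in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    elif method == "POST":
        # POST data arrives on standard input; CONTENT_LENGTH gives its size.
        length = int(environ.get("CONTENT_LENGTH", "0"))  # ValueError if not a number
        if not 0 <= length <= MAX_BODY:
            raise ValueError("unacceptable CONTENT_LENGTH")
        # Assumption: the body is form-encoded ASCII; other bytes raise an error.
        raw = stdin.read(length).decode("ascii")
    else:
        raise ValueError("unsupported request method")
    # parse_qs splits the fields and URL-decodes each name and value once.
    return parse_qs(raw, keep_blank_values=True)
```

A real script would typically call this as read_cgi_fields(os.environ, sys.stdin.buffer) and then check every returned value against what the application is prepared to accept.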
One additional complication is that many CGI inputs are provided in so-called “URL-encoded” format, in which some bytes are written as %HH, where HH is the hexadecimal code for that byte. You or your CGI library must handle these inputs correctly by URL-decoding the input and then checking whether each resulting byte value is acceptable. You must correctly handle all values, including problematic ones such as %00 (NUL) and %0A (newline). Don’t decode inputs more than once, or input such as “%2500” will be mishandled (the %25 would be translated to “%”, and the resulting “%00” would then be erroneously translated to the NUL character).
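The decode-once rule can be sketched in Python as follows. This is a minimal illustration, not a complete validator: the function name decode_and_validate and the ALLOWED byte set are assumptions made for this example, and a real application would choose its own set of acceptable bytes. The sketch decodes a single value exactly once with urllib.parse.unquote_to_bytes and then checks every resulting byte.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical policy for this example: letters, digits, and a few punctuation bytes.
ALLOWED = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-."
)


def decode_and_validate(raw):
    """URL-decode one value exactly once, then reject any unacceptable byte."""
    decoded = unquote_to_bytes(raw)  # the single decoding pass
    for byte in decoded:
        if byte not in ALLOWED:
            # Covers NUL (0x00), newline (0x0A), and a leftover "%" from input
            # such as "%2500", which must never be decoded a second time.
            raise ValueError("unacceptable byte 0x%02X in input" % byte)
    return decoded.decode("ascii")
```

With this policy, "abc%2Ddef" decodes to "abc-def" and is accepted, while "%00", "a%0Ab", and "%2500" are all rejected. The last case decodes once to the three bytes "%00", and the literal "%" is not in the allowed set, so it is refused rather than decoded again.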
Attackers commonly target CGI scripts by placing special characters in their inputs; see the comments above.
A brief discussion on input validation for those using Microsoft’s Active Server Pages (ASP) is available from Jerry Connolly at http://heap.nologin.net/aspsec.html