David A. Wheeler's Blog

Tue, 24 Mar 2009

Fixing Unix/Linux/POSIX Filenames

Traditionally, Unix/Linux/POSIX filenames can be almost any sequence of bytes, and their meaning is unassigned. The only real rules are that “/” is always the directory separator, and that filenames can’t contain byte 0 (because it is the terminator). This is flexible, but it creates many unnecessary problems. In particular, the lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws), makes it impossible to consistently and accurately display filenames, and confuses users.

So for those of you who understand Unix/Linux/POSIX, I’ve just released a new technical article, Fixing Unix/Linux/POSIX Filenames.

This article will try to convince you that adding some limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations - so it’d be better to make these (reasonable) assumptions true in the first place. The article discusses, in particular, the problems of control characters in filenames, leading dashes in filenames, the lack of a standard encoding scheme (as opposed to standardizing on UTF-8), and special metacharacters in filenames. Spaces in filenames are probably hopeless in general, but resolving some of the other issues will simplify their handling too. The article then briefly discusses some methods for solving these problems long-term, though that’s not easy - if I’ve convinced you that this needs improving, I’d like your help figuring out how to do it!
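
To make those four problem categories concrete, here is a minimal Python sketch that flags them in a single filename. It is only an illustration and is not taken from the article: the function names `filename_problems` and `report` are hypothetical, and the set of characters treated as shell metacharacters is an assumption (different shells treat different characters as special). Filenames are handled as raw bytes, since that is all the kernel promises.

```python
import os

# Assumed set of shell metacharacters; the exact list is a judgment call.
# Spaces are deliberately left out of this set.
SHELL_METACHARS = set(b"*?[]\"'`$&;|<>(){}\\!#~")


def filename_problems(name: bytes) -> list[str]:
    """Return labels for the problems found in one filename component."""
    problems = []
    # Control characters: bytes 1-31 and 127 (newline, tab, escape, ...).
    if any(b < 0x20 or b == 0x7F for b in name):
        problems.append("control character")
    # A leading dash can be mistaken for a command-line option.
    if name.startswith(b"-"):
        problems.append("leading dash")
    # Without a standard encoding, the bytes may not be valid UTF-8.
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # Characters the shell may interpret instead of passing through.
    if any(b in SHELL_METACHARS for b in name):
        problems.append("shell metacharacter")
    return problems


def report(directory: bytes = b".") -> None:
    """Print every entry in a directory that has at least one problem."""
    # Passing bytes to os.listdir makes it return raw byte filenames.
    for entry in os.listdir(directory):
        found = filename_problems(entry)
        if found:
            print(repr(entry), ", ".join(found))
```

Calling `report(b".")` lists the entries in the current directory that fall into any of these categories. A filename such as `-rf` is reported only for its leading dash, while an ordinary name like `notes.txt` is not reported at all.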

So - take a peek at Fixing Unix/Linux/POSIX Filenames. If you have ideas on how to help, I’d love to know.
