Automating DESTDIR for Packaging

by David A. Wheeler

2009-02-19 (updated 2011-07-29)

It is unnecessarily hard to create native packages (like deb and RPM), and unnecessarily hard to directly install source code packages under the control of programs like GNU stow, because many software source packages fail to support the DESTDIR convention. The DESTDIR convention makes it easy to compile a program so that it will run in some directory X, but be installed in directory $DESTDIR/X instead. There are a vast number of source packages that do not support DESTDIR, and it’s often difficult to add DESTDIR support to complex makefiles. This paper discusses what could be done to automatically support DESTDIR, instead of requiring every source package in the universe be changed to support DESTDIR. This paper shows that there are practical ways to automate support for DESTDIR, and points to tools like auto-DESTDIR and user-union that implement some of these solutions.

Introduction

Today’s users of Linux and Unix systems don’t want to follow complicated instructions to install programs — they want to click on one button, and have everything installed as necessary. Ideally, they should be able to install programs using their native package format and download tools, such as deb (used by Debian and Ubuntu) and RPM (used by Fedora, Red Hat, and SuSE/Novell). But that means that someone has to create those packages. Alternatively, they should be able to use a program that can automatically download, compile, and install from source code, perhaps with some program like GNU stow that can place each package in its own separate directory while appearing to all be in one place. Whether you are creating native packages, or automatically installing source packages, it’s often vital to be able to compile the program so that it will run in some directory X, but install the program in some other directory $DESTDIR/Y. There is a standard way to do this: Have the source package support the DESTDIR convention.
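As a rough illustration of the convention (a sketch, not taken from any particular build system), the mapping from a run-time path to its staged install location is plain path concatenation. The function name staged_path and the example paths are hypothetical.

```python
import os

def staged_path(destdir, run_path):
    """Return where a file that will run at run_path is written during install."""
    if not destdir:
        # No DESTDIR set: install directly to the run-time location.
        return run_path
    if not os.path.isabs(run_path):
        raise ValueError("run-time path must be absolute: %r" % run_path)
    # os.path.join discards its first argument when the second is absolute,
    # so strip the leading slash before joining.
    return os.path.join(destdir, run_path.lstrip(os.sep))

print(staged_path("/tmp/pkgroot", "/usr/bin/myprogram"))
```

The program is still compiled to look for its files under /usr; only the location that "make install" writes to changes.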

Unfortunately, many software source packages fail to support the DESTDIR convention, and it’s sometimes a real pain to add DESTDIR support. The build programs (e.g., Makefiles) can be large, complex, and multilayered... so it can be painful for a packager to modify the build scripts to add DESTDIR support. A packager can send DESTDIR patches upstream, but they may be ignored or improperly maintained... which means that the packager may need to keep re-modifying the build system, every time the program is updated. Ugh. All too often, packaging can be completely automated except for the lack of DESTDIR support.

Why is DESTDIR important? There are many reasons, but let's look at two examples:

DESTDIR helps create native packages. The tools for creating native packages in deb and rpm formats (the two most popular Linux distribution formats) require that “installed” files be specially placed in a subdirectory by a “make install” during package creation. This is something DESTDIR enables. For example, Debian’s documentation explains that during the packaging process, you must “install the program into a temporary subdirectory from which the maintainer tools will build a working... package. Everything that is contained in this directory will be installed on a user’s system when they install your package...”. Fedora’s Creating Package HowTo has similar requirements for Fedora. The packaging software then copies files from that intermediate location into an archive for later installation in the “right” place. It’s easy to specially place files-to-be-installed if the program already supports the “DESTDIR” variable, because DESTDIR tells the installer the intermediate location to install software. Otherwise, it can be difficult to do.
DESTDIR helps install local packages from source. Many people use programs like GNU stow (or similar conventions) to help manage locally-installed packages from source code. For example, GNU stow is designed to let you store a program into some directory like /usr/local/stow/MYPROGRAM and have binaries in /usr/local/stow/MYPROGRAM/bin/myprogram, yet have the program be invoked as /usr/local/bin/myprogram. That way, plug-ins and extensions will automatically work correctly. GNU stow’s documentation specifically notes that you need to do this. GNU stow's documentation suggests using make prefix=Y install as a work-around, but as they note, many programs (including emacs!) automatically force a recompilation when the prefix is changed, making this moot. It can also cause subtle problems when installing; it makes more sense to have a separate prefix and DESTDIR value, so that each can be used where appropriate.

It’d be much better if the equivalent of DESTDIR could be automated, without requiring application programs to add DESTDIR support to their installers. After all, almost any “real” Unix/Linux program with source code available supports “make install” (or equivalent) for installation. A “make install” process presumes that it is writing the files to the “real” filesystem. Ideally, there’d be a way to reroute writes to the “real” filesystem to some other directory tree, so that they could be packaged or used with programs like GNU stow. This shouldn’t be that hard - “make install” may invoke many programs and do a lot of recursive directory descending to figure out what to install, but the commands that actually do the installing are usually simple ones like “install” and “cp”.

Ideally this re-routing process would work without requiring the program running “make install” to run as root. That way, a non-root user could do “make-redir DESTDIR=newdir install” (or equivalent) and have all the “installed” files show up inside newdir. Ideally, it would be efficient, and could track other information (such as permissions and owners). Also, these should be able to work without re-routing programs that don’t descend from “make install”; often final packaging is done on a shared machine that is packaging multiple programs simultaneously. A lot of tools don’t quite do this; they primarily just ‘track’ what’s changed, require special privileges, and so on. Some tools that you would think do this can’t do anything remotely like it; for example, “fakeroot” (widely used by Debian) can record owners, but it can’t redirect writes to files (because it doesn’t wrap the system call “open”).

This turns out to be harder than I thought, and in particular, some of the “obvious” ways to do this turn out to be more complicated than you’d like. So here are various technical approaches, and a list of some related tools that implement each approach (and that might be possible to use as a baseline for automating DESTDIR).

After looking at the alternatives, I’ve decided that the “wrappers” approach is especially promising. See the Auto-DESTDIR software, which implements this wrappers approach. The wrappers approach at first seems like an odd solution; its advantages become compelling only when you start thinking about the problems of the alternatives (as described below).
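To make the idea concrete, here is a minimal sketch of what a wrapper for a simple "cp SOURCE DEST" might do. It assumes the wrapper is found ahead of the real command and is handed a DESTDIR value. This is not Auto-DESTDIR's actual code; the names redirect and wrapped_cp are hypothetical, and a real wrapper would also have to parse options and multiple sources.

```python
import os
import shutil

def redirect(path, destdir):
    """Prefix absolute paths with destdir; leave relative paths alone."""
    if destdir and os.path.isabs(path):
        return os.path.join(destdir, path.lstrip(os.sep))
    return path

def wrapped_cp(source, dest, destdir):
    """Stand-in for a 'cp SOURCE DEST' wrapper: copy to the redirected place."""
    target = redirect(dest, destdir)
    if os.path.isdir(dest) or dest.endswith(os.sep):
        # DEST names a directory on the real system (or is written as one):
        # mirror that directory in the staging tree and copy into it.
        os.makedirs(target, exist_ok=True)
        target = os.path.join(target, os.path.basename(source))
    else:
        # The staging tree starts out empty, so create the parent directory.
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    shutil.copy2(source, target)
    return target
```

Note that the wrapper has to consult the real filesystem to decide whether DEST is a directory, because the staging tree may not contain it yet. Reads are left alone; only the destination is rewritten.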

Not covered: General issues in program-specific directories or simplified source/package installation

First, let me clarify that this paper is not about the general idea of (1) creating separate directories for each different program or program installation, nor is it about (2) simplifying source/package installation in its entirety. Instead, I am focusing on a specific step, copying files into one place that will be run from another, that turns out to be important in both of these general issues (and probably others as well). The following subsections point to other programs/papers about those issues in general, and explain how automatically supporting DESTDIR can simplify these general issues.

Program-specific directories

Creating separate directories for each different program or program installation is a widely-implemented idea. For example, using the tool GNU stow, all files that implement perl might be stored in “/usr/local/stow/perl” while all files that implement emacs might be stored in “/usr/local/stow/emacs”, and the executable of emacs might be “/usr/local/stow/emacs/bin/emacs”. Many of these tools (including GNU stow) run your installation script (or have you run them) with a special setting of “prefix” (so that each program is installed in a special program-specific location). Then, they set up symbolic links to point to the “real” files (e.g., so you don’t have to have a massive constantly-changing PATH).
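The symbolic-link step can be sketched as follows. This is a simplified illustration, not GNU stow's algorithm: real stow can link whole directories at once and uses relative links, while this hypothetical stow function links each file individually with an absolute target.

```python
import os

def stow(package_dir, target_dir):
    """Symlink every file under package_dir into the same relative spot in target_dir."""
    links = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        dest_root = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            link = os.path.join(dest_root, name)
            # Raises FileExistsError if another package already owns this name.
            os.symlink(os.path.join(root, name), link)
            links.append(link)
    return links
```

With package_dir set to /usr/local/stow/emacs and target_dir set to /usr/local, the file /usr/local/stow/emacs/bin/emacs would become reachable as /usr/local/bin/emacs.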

There are many already-existing tools that do this, including toast (GPLv3+), GNU stow (GPL), Graft (GPL), Encap (GPL license for epkg), xstow (GPL, a C++ re-implementation of GNU stow), spill (GPL, a C implementation similar to stow), GARStow (unknown license), It (GPL), opt_depot (no license found), Risacher’s localfix (no license found), and STORE (GPL). Some of these are conceptual descendents of the old CMU Depot, often via GNU stow. “Get rid of stowaway packages with GNU Stow” (David A. Harding) gives a brief intro to GNU stow.

If this is all you’re doing, and you have all necessary rights to install to the stowed directories, you might think you don’t need DESTDIR at all... just set up the prefix and store in these special directories. But it turns out that you still often want to install files to one place, yet have them run in another, which means you want to automate DESTDIR:

As noted in the GNU stow manual section 6.1 (“Compile-time and install-time”), “Software whose installation is managed with Stow needs to be installed in one place (the package directory, e.g. ‘/usr/local/stow/perl’) but needs to appear to run in another place (the target tree, e.g., ‘/usr/local’). Why is this important? What’s wrong with Perl, for instance, looking for its files in ‘/usr/local/stow/perl’ instead of in ‘/usr/local’? The answer is that there may be another package, e.g., ‘/usr/local/stow/perl-extras’, stowed under ‘/usr/local’. If Perl is configured to find its files in ‘/usr/local/stow/perl’, it will never find the extra files in the ‘perl-extras’ package, even though they’re intended to be found by Perl. On the other hand, if Perl looks for its files in ‘/usr/local’, then it will find the intermingled Perl and ‘perl-extras’ files. This means that when you compile a package, you must tell it the location of the run-time, or target tree; but when you install it, you must place it in the stow tree.”
If you are trying to set up files so that they will eventually run in a “stowed” location, but you cannot currently write to that stowed location, then you may want to use DESTDIR so that you can “install” files to an intermediate location which is not the final location for execution.
If the program you’re dealing with doesn’t properly support “--prefix” or “make prefix=value install”, you need something that can automatically redirect files to another location (so that these tools can manage them).

Simplified installation from source code

This paper does not cover the entire problem of automatically installing packages directly from source code, though it does potentially cover a piece of the problem. The idea of making it easier to install from source tarballs is nothing new; this has been raised by Francesco Montorsi and myself. There are several existing tools that try to automatically install programs from source tarballs, though most of them do not do a good job of automatically determining what is to be done, and few understand dependencies or integrate well with an existing package management system. Here are some related papers/projects:

The tool Spkgtool can act as a GUI front-end to various “symbolic link package systems” (currently supporting stow, graft, and encap/epkg), and it can automatically build and install source tarballs if they comply with the basic GNU standards (e.g., ./configure, make, and make install, with support for the make variable “prefix”).
Easinstaller is a little GUI tool for Unix beginners who need to compile a program/library from source, but don’t know how.
Dan’s autospec automatically creates RPM .spec files from tarballs. "It uses the information it can determine (from a Makefile, manual pages, an LSM file, etc.) to fill in the proper spec file fields. This allows a human packager to use the generated spec file as an almost complete template to quickly create an RPM package from a typical source or binary archive."
toast (GPLv3+) is a “simple source-and-symlinks package manager for root and non-root users”. It is a “simple, self-contained tool for downloading, building, installing, uninstalling and managing software packages. Unlike traditional package-management systems, toast is primarily intended to work directly with software distributed as source code, rather than in some precompiled or specialized binary format, such as RPM. Binary packages are also supported.” It includes some of the capabilities of GNU stow, etc., but it also includes heuristics so that it can compile straight from source code. (Which means that toast does not fit my categories well — it includes stow-like capabilities and source installation capabilities. It also has lots of heuristics to try to automatically implement DESTDIR when the underlying system fails to do so.) The toast man page has links to other interesting programs.
spm (srcpack) (on SourceForge) is a simple package manager focused on source packages.
GNU Source Installer is a “source package manager for Unix-likes. It provides configuration, compilation, installation, upgrade, tracking and removal of packages built from source code following the GNU coding standards.”
Bulldozer works with the Nautilus file manager of GNOME and supports make, Ant, NAnt, and several other formats, letting you automatically invoke build targets.
Urpkg (GPL) tries to install software in a safe way, especially from source code; it does this by creating a new user for each program that it installs, as well as using some sticky bit trickery, so that programs are protected from each other.
Luau: The Lib Update/AutoUpdate Suite enables people to download and install programs on their local systems, but it requires that software developers encode information for it (in an XML file).
Autopackage “makes software installation on Linux easy. Software distributed using Autopackage can be installed on multiple Linux distributions and integrate well into the desktop environment.” “An autopackage (a .package file) contains all the files needed for the package in a distribution neutral format with special control files inside, wrapped in a tarball with a stub script appended to the beginning. In order to install a .package file, you run it, and the scripts then check your system for the autopackage tools and offers to download them if they’re not present.” It’s essentially a special package format, designed for interoperability. The format is an API-based approach, which is different than many others.
Paco (discussed below) tries to install from source code automatically, using LD_PRELOAD (see below). But, like many other programs (like checkinstall), it simply watches what a program tries to do when it installs... it doesn’t intercept what is done, to make it right.
D.J. Bernstein’s /package package management approach is a different take on the problem.
Not-so-bad distribution is focused on verifying updates and limiting privileges when updating.
WP2 State of the Art is a wiki page that notes related programs.
stdeb (Python to Debian source package conversion utility) - stdeb ("setuptools debian") "produces Debian source packages from Python packages via a new distutils command, sdist_dsc. Automatic defaults are provided for the Debian package, but many aspects of the resulting package can be customized via a configuration file."
van.pydeb makes "egg metadata information available for Debian packaging". It is a collection of "Tools for introspecting Python package metadata and translating the resulting information into Debian metadata", including version numbers, package names, and dependencies.
pkg-config (GPLv2+) is a "helper tool used when compiling applications and libraries. It helps you insert the correct compiler options on the command line so an application can use gcc -o test test.c `pkg-config --libs --cflags glib-2.0` for instance, rather than hard-coding values on where to find glib (or other libraries)." It's run out of the freedesktop.org site. The information it uses is stored in ".pc" files. See the Pkg-config Wikipedia page, the pkg-config man page, and this pkg-config guide.
CPANPLUS::Dist::RPM is "a distribution class to create RPM packages from CPAN modules, and all its dependencies. This allows you to have the most recent copies of CPAN modules installed, using your package manager of choice, but without having to wait for central repositories to be updated."
java-package helps create Debian packages

And of course, this paper is not about package management in general, e.g., programs that support .rpm and .deb formats. However, to create .rpm and .deb files, it is important to support DESTDIR. This paper is about how to easily support DESTDIR, without twiddling makefiles.

Why not just support DESTDIR or make prefix=X install?

There’s no need for a special tool to support DESTDIR for programs that already support DESTDIR. In some programs that don’t support DESTDIR, you can have the effect by setting the “prefix” variable when running make install, that is, make prefix=MY_DESTDIR_VALUE install. It would be far better if source code releases followed the normal good practices for releasing FLOSS software source packages, including support for DESTDIR.

But this does not always work, for a variety of reasons. Many makefiles do not support DESTDIR at all. Many makefiles also don’t support “prefix”, or if they do, they forcibly re-build the program when the prefix value is changed for make install (making the workaround useless). There are so many programs that do not follow normal good practices that we must deal with the world as it is, not as we wish it would be. We could modify tiny makefiles, but large multi-directory makefiles can be hideously hard to modify correctly, and then there is the problem of getting those changes accepted upstream. Since so many programs don’t support DESTDIR, it’d be nice to be able to automatically support DESTDIR without having to constantly muck around in complicated makefiles or other build/installation systems for program after program. Then, instead of having programmers around the world constantly changing their makefiles, it will just work.

Kernel-based re-routing

The Linux kernel gets all read and write requests, so re-routing at the kernel level would be great - in theory, the re-routing would be perfect, and should have good performance. The big problem is that it requires basic changes in low-level infrastructure, where any mistakes could create a massive security hole... making it understandably difficult to get people to accept changes at this level.

Union mounts

Union mounts can merge multiple directories (e.g., one is “read only” and the other written to). Generally, these require root privileges, though that’s not a killer - a setuid program could use them, for example.

There are several kernel modules that implement union mounts, but they’re not widely available on Linux distributions (as of early 2009). The best-known union mount implementation is UnionFS, and another implementation is aufs; both implement union mounts as a new filesystem. A separate effort, itself called “union mounts”, implements them inside Linux at the VFS layer instead of as a new filesystem; at this moment this is very immature and not ready for normal use. Many Linux distributions do NOT have unionfs, aufs, or “union mounts” since they are not in the default Linux kernel.

A FUSE-based implementation of a union file system can be used today, and doesn’t require changes to the Linux kernel (as Unionfs, etc. require). FUSE is already part of the usual Linux kernel, and it allows file requests to be redirected out to user programs. In particular, funionfs implements a union filesystem using FUSE, and is included in Fedora, Debian, and Ubuntu. PlasticFS version 1.12 uses FUSE as well. By design a FUSE-based approach requires more work than unionfs (due to extra context switches), but for only a “make install” this isn’t so bad. One implementer of a unionfs-on-FUSE reports that the I/O processing completely buries this overhead anyway. Unfortunately, funionfs (at least) is also global (instead of per-process) - again, a problem for shared packaging systems if used the “obvious” way. I should note that instead of reusing existing union file systems to redirect DESTDIR, FUSE could be used to directly implement this approach.

By themselves, these kinds of union mounts described above are always global to the whole system, so if you directly did a union mount of directories like “/usr” you would have trouble using a shared packaging system. Such a global approach to redirecting could easily cause problems administering the system. And there are a lot of security problems, too, if this is just a global situation. So this should really be done for a set of processes rather than the whole system, as discussed next.

Process Group Unique Root

A union mount can be made unique to a process group through a variety of mechanisms. The “obvious” way is to recreate a new filesystem tree in a subdirectory, using mount --bind and union mounts (as above, say using funionfs and FUSE) to create a new filesystem that looks like the old one but is not visible to all. You can then use chroot (or pivot_root) to set the process group to the new filesystem. A variant of this approach would be to use mount namespaces, which again create filesystems that are specific to a process group (instead of being global to all processes). Again, the point would be to redirect writes to /usr, /bin, /lib, /etc. All of this could be implemented with a small suid program.

Ideally, it’d be rigged so that the process group isn’t root, but it can still write to the new local /usr (etc.). Bonus points if it pretends to be root and records the parameters (a la fakeroot) - which could fool even complex “make install” routines.

For security, the key problem is that the process running “make install” should never be privileged. In particular, the process should not have root privileges, nor should it be allowed to raise its privileges by running set-uid programs that actually setuid. Otherwise, it could use its root privileges to get out of the jail, or run an suid program that wouldn’t realize that the filesystem is rigged (and then get exploited). Traditionally, “make install” is given total privileges, but we want to not do that. If “make install” is started with normal user privileges, that at least gets us started, but we need to make sure that privileges can’t be added later via setuid programs. We could do this by making sure all mounts disable setuid/setgid; mount already has this ability. Alternatively, we could forbid running executables with that setting (I believe SELinux and “cuppabilities” can do this).

This approach - having a FUSE-based union mount approach that is local to a process group (e.g., chroot) - is the most robust technically, since it can redirect any non-setuid command used in the “make install”. It also has low overhead. But the effort of making sure it’s secure may make it difficult for distributors to accept it.

LD_PRELOAD

Many programs, like installwatch, use LD_PRELOAD to intercept library functions. There are various positives: LD_PRELOAD already exists, and it works per-process (so it doesn’t interfere with other programs). Unfortunately, LD_PRELOAD has many technical downsides.

LD_PRELOAD based approaches can’t redirect statically linked executables. Unfortunately, the programs most used by most install scripts are also the ones most likely to be statically linked (to increase reliability and enable recovery from serious library management problems). I know that SuSE’s “ln” is statically linked, and that FreeBSD and OpenBSD’s key routines used in installation are statically linked, and this is true for many other systems as well.

You might think that once you override open(), all calls to open() would be overridden, but this isn't true by default if the caller is inside the C library itself. It turns out that the standard GNU C library uses names prefixed by "__" whenever it calls internal functions. For example, the C library implements fopen(), but the fopen() implementation internally calls __open(), not open(). In addition - and this is the kicker - by default the GNU C library will not let you override these __functions using LD_PRELOAD. So if you override just open(), an application that calls open() directly will be overridden... but an application that uses fopen() will skip right past it. You can recompile the GNU C library so that the redirection will occur by using the poorly-documented "--disable-hidden-plt" option. But in practice, this means that you have to recompile the C library. This is generally not well-received by distributions; Debian specifically rejected doing this, and I suspect others will do the same. Few will want to change the default, because doing things this way speeds up normal use. An alternative is wrapping all the C library calls, but that's more work.

I found no program that met my needs for automating DESTDIR while using LD_PRELOAD, so I've started writing such a program: user-union. User-union creates union mounts, without requiring special privileges, using LD_PRELOAD, and it can integrate with auto-destdir.

Here are some existing related programs that already use LD_PRELOAD (though most just watch what files are changed, and do not let us change where the files go):

toast uses LD_PRELOAD as one of its tricks for changing where things install; unlike many of the other items noted below, it actually changes where files are placed instead of just watching them.
checkinstall is one partial answer (here’s another Checkinstall location).
Checkinstall, in turn, includes the installwatch utility written by Pancrazio ‘Ezio’ de Mauro. The installwatch utility appears to have some filesystem redirecting ability (there’s a note about “--fstrans=no” and issues with openat), but that’s not clear. installwatch is maintained as part of checkinstall; it does not redirect file creation.
gnashley (src2pkg developer) believes that checkinstall is no longer properly maintained, so instead, has developed a ‘trackinstall’ program (a drop-in replacement for ‘checkinstall’) as part of src2pkg (this is built on “libsentry”). These let you run “make install” and track what changed, as part of larger tools to auto-create packages from source code. But src2pkg’s approaches don’t seem quite right; it supports (1) “real root” which doesn’t redirect, overwrites, and requires root, (2) DESTDIR method, which requires that DESTDIR work (it often doesn’t, and that’s the problem we’re trying to solve), and (3) JAIL, which redirects writes but doesn’t seem to correctly redirect reads to the right place (ugh) - so it doesn’t work well on many scripts.
PlasticFS up to version 1.11 used LD_PRELOAD to create a filesystem. It tried to not redirect many calls, and instead asked users to first recompile the GNU C library with "--disable-hidden-plt". That was completely impractical; rather than covering more functions (such as fopen), in version 1.12 it switched to FUSE.
FL-COW ("Copy on write") copies files if they're being opened for writing and they are hard linked to somewhere else, using LD_PRELOAD. Not exactly what I was looking for, but some similar ideas. GPLv2. It only covers a few functions (open, openat, fopen, freopen, and their *64 versions).
paco (Package Organizer) (GPL) is a “source code package organizer for Unix/Linux systems... When installing a package from sources, paco wraps the ‘make install’ command (or whatever is needed to install the files into the system), and generates a log containing the list of all installed files. Technically, this is done by preloading a shared library before installation using the environment variable LD_PRELOAD. During installation this library catches the system calls that cause filesystem alterations, logging the created files... Gpaco is the graphic interface of paco.” The Paco home page specifically notes that “Paco does not work on systems in which the executables involved in the installation of the packages (mv, cp, install...) are statically linked against libc, like FreeBSD and OpenBSD.” Paco, like many other tools, can only log what a program tries to do... it cannot redirect files elsewhere. But in a number of cases, we want to control where the files go, not just watch them go to the wrong place.
DanF has developed autospec and notes other programs, such as “slurp” (which uses the library installwatch to notice changed files).
Brent Baccala's Preload libraries can redirect file accesses using LD_PRELOAD. Unfortunately they don't seem to be OSS (no license found). They want a glibc patch to be installed; it appears to me that this has the same effect as using the "--disable-hidden-plt" option and recompiling glibc.
EPOR (GPL) is an “extensible package organiser for Unix like systems. It’s written to trace filesystem changes (something being installed) and save those information in a simple textual db (but this as any other provided feature is customisable by embedded guile interpreter see chapter Customise epor). So, when a package is installed using epor to trace it, an entry is created in a local db. This entry contains informations supplied by command line (package name, version, ...) and traced by filesystem changes (new directories, files ...). This is achieved using the “LD_PRELOAD method”. Using informations stored, epor let you remove files installed by a particular package or view them in different ways.”
Fakeroot tracks permissions, but doesn’t redirect at all. Fakeroot is heavily used in Debian, and has a server daemon to help it, but it intentionally doesn’t wrap open() and create. That's because previous experience with the libtricks package suggested that trying to wrap these would lead to endless problems.
libtricks can redirect open-for-read (a mega-VPATH), but it’s not clear that it can redirect writes... and it doesn’t seem to be maintained anyway. The creator of the fakeroot and libtricks packages found that trying to wrap open() and create “creates other problems, as demonstrated by the libtricks package. This package wrapped many more functions, and tried to do a lot more than fakeroot. It turned out that a minor upgrade of libc (from one where the stat() function didn’t use open() to one [that sometimes did]) would cause unexplainable segfaults... once fixed, it was just a matter of time before another function started to use open()... Thus I decided to keep the number of functions wrapped by fakeroot as small as possible, to limit the likelihood [of] collisions... I choose not to wrap open(), as open() is used by many other functions in libc (also those that are already wrapped), thus creating loops (or possible future loops, when the implementation of various libc functions slightly change).” An April 3 1999 posting by Joel Klecker noted that, in order to make this work, "libtricks grovels deep within glibc internals (to the point that it has its own copies of internal glibc headers), I am not entirely sure if it can ever work with glibc 2.1, since an external lib and programs cannot access internal libc symbols. fakeroot grovels at a much higher level and still works." In short, previous experience trying to wrap open() with LD_PRELOAD suggests that this is a bad idea.
Soapbox can prevent writes, but not redirect them.

Ptrace

Ptrace() is a kernel-level call for “process tracing” (e.g., to watch/change system calls made by a process). It’s intended for debugging, but since it can watch another process, it can be used for this purpose. This has serious advantages: it can handle statically linked executables (while LD_PRELOAD-based approaches can’t), so SuSE’s “ln” is a non-issue. Since it tracks at the system call level, this approach is immune to the races and other problems of fakeroot and libtricks. Unfortunately, while ptrace() is great for watching what a program does, using it to change what a program does is far more complicated. Perhaps the premier example of a related program that uses the ptrace() approach is TrackFS:

TrackFS. “trackfs runs the child program(s) with tracing enabled and tracks the system calls they make.”

Such a program could be implemented using ptrace and semantics like this:

On open for read (not write), look at DESTDIR first - if it’s there, use it. Otherwise, try to open un-redirected. This saves disk space, as well as saving time by not copying files. In short, if a file is only used for reading/executing, and never written to, then just use it as-is.
On open for read AND write, look at DESTDIR first, and use it if there. Otherwise, copy any existing file to DESTDIR, then use it.
On open for write (not read), create any prefixed directories that exist on the original side. Then, open for write under DESTDIR. (It should fail if the original would have failed.)
“chmod” is essentially a write operation; redirect as above, but copy the file if it doesn’t exist.
“unlink”: If in DESTDIR, remove it. Bonus points: remember what you “removed”, so that later queries about it will claim it’s not there. Note that unionfs, funionfs, and so on have to handle this too (e.g., using “whiteouts”). So you can “rm /bin/sh”, and it’s not there... and there’s no harm to the “real” filesystem.

I may have missed a corner case, but it should work in principle.
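
As a rough illustration of the three open() rules above (chmod and unlink are left out), here is a minimal shell sketch of the path decisions a tracer would have to make. It is not a ptrace implementation; it only shows which real path each kind of open would end up using. The function names are made up for this example, paths are assumed to be absolute, and DESTDIR is assumed to be set to an absolute directory.

```sh
# Hypothetical sketch of the redirection rules; the names resolve_read,
# resolve_readwrite and resolve_write are invented for this example.
# Assumes DESTDIR is set and that $1 is an absolute path.

# Open for read only: prefer the copy under DESTDIR, else the original.
resolve_read() {
    if [ -e "$DESTDIR$1" ]; then
        printf '%s\n' "$DESTDIR$1"
    else
        printf '%s\n' "$1"
    fi
}

# Open for read AND write: use the copy under DESTDIR, first copying
# any existing original there if DESTDIR has no copy yet.
resolve_readwrite() {
    if [ ! -e "$DESTDIR$1" ] && [ -e "$1" ]; then
        mkdir -p "$DESTDIR$(dirname "$1")" &&
        cp -p "$1" "$DESTDIR$1" || return 1
    fi
    printf '%s\n' "$DESTDIR$1"
}

# Open for write only: mirror the parent directory under DESTDIR if it
# exists on the original side, then write under DESTDIR. If the parent
# exists on neither side, the later open fails, as the original would.
resolve_write() {
    parent=$(dirname "$1")
    if [ -d "$parent" ]; then
        mkdir -p "$DESTDIR$parent" || return 1
    fi
    printf '%s\n' "$DESTDIR$1"
}
```

The hard part of the ptrace approach is not this decision logic but getting the rewritten path into the traced process, which the sketch does not attempt.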

I’ve had a very interesting email exchange with TrackFS’s creator, Michael Riepe, about this. He pointed out that this requires that the controlling process actually change the name of the file being opened, which requires using memory space somewhere. I suggested that it’d be possible to patch the stack with the new filename, use it, then halt and restore things back... so that you don’t have to try to do memory allocations and such. Michael isn’t sure that the stack grows correctly at all times (what about stack overflow?), but it’s plausible. Another issue he noted is that (absolute) symbolic links might not work correctly. I’m not sure about that, but that would cause complications to the rules above.

Fakeroot Next Generation (fakeroot-ng) is using the ptrace() mechanism to get around the limitations of LD_PRELOAD, but as their website notes, using ptrace() creates many complexities.

Originally, I thought the ptrace approach would be best, but the rules kept getting more complex, and the stack twiddling was more than I was hoping for. In short, implementing DESTDIR this way is quite complicated, involving a lot of architecture-dependent tweaking. Is there a better, simpler way? For this particular problem, I think there is.

Modified basic commands

In most software, the “make install” command only uses a few simple commands to actually install the software. In my experience, the most common command by far is “install”, which is hardly surprising. Other common commands used in “make install” that might need redirecting from privileged directories (like /bin, /usr, and /etc) include cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir. It might also be useful to redirect “test”, though this is also a bash built-in (making its replacement more complicated) and I haven’t found any Makefiles where redirection of “test” is needed for “make install”. Programs that use libtool usually support DESTDIR directly, but even if they didn’t, the point is the same: “make install” tends to use only a very few programs.

So given that “make install” tends to use only a few commands, one “obvious” approach would be to modify just these basic commands so that they will redirect their writes (e.g., if an environment variable is set). Then the packager can just set an environment variable and run “make install”. This seems completely appropriate for “install”; its whole purpose is to perform installs, so adding functionality so it can do installs in a common case (creating packages) seems appropriate.

It’d be best if setting this up were easy. I would suggest REDIR_DESTDIR as the environment variable name. (Don’t use “INSTALL_DESTDIR” as the name; it turns out this gets used by many installation makefiles, and would cause trouble instead of helping.) If it’s set, then writes are redirected to that as the root of the filesystem. For the “install” command, I think any use should be redirected (since by definition, all invocations are installs), with one exception: if install is invoked to install to inside REDIR_DESTDIR, then don’t re-prefix it; this avoids some awkward loops, and makes it easy for packagers to “automate” installation by always setting REDIR_DESTDIR. Other commands are only sometimes used for installation, but I suspect a simple detection method would be sufficient. For example, perhaps all attempts to write to a directory which only a privileged user/group (e.g., root) can write to would be redirected so that “/” becomes REDIR_DESTDIR. This way, “install xyz /bin” immediately becomes “install xyz ${REDIR_DESTDIR}/bin”, and similarly for “cp”, but “cp xyz .” doesn’t get redirected if the directory is local and writable by an ordinary user (such as a user creating the package). With this rule, you don’t have to list which directories get redirected: temporary and local files aren’t redirected, while files getting installed are.
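
The detection rule for commands other than “install” might look something like the following shell sketch. It is only an approximation: the function name redir_target is invented here, and “a directory which only a privileged user can write to” is approximated as “a directory the current user cannot write to”, which assumes the packager is not running as root. REDIR_DESTDIR is assumed to have no trailing slash.

```sh
# Hypothetical helper: print where a write to path $1 should really go.
redir_target() {
    target=$1

    # No redirection requested: leave the path alone.
    if [ -z "${REDIR_DESTDIR:-}" ]; then
        printf '%s\n' "$target"
        return
    fi

    # Already inside REDIR_DESTDIR: do not re-prefix (avoids loops).
    case $target in
        "$REDIR_DESTDIR" | "$REDIR_DESTDIR"/*)
            printf '%s\n' "$target"
            return
            ;;
    esac

    # Find the nearest ancestor directory that actually exists.
    dir=$target
    while [ ! -d "$dir" ]; do
        dir=$(dirname "$dir")
    done

    # Writable by the current user: a local or temporary file, keep it.
    # Otherwise treat it as an install target and redirect it.
    if [ -w "$dir" ]; then
        printf '%s\n' "$target"
    else
        printf '%s\n' "$REDIR_DESTDIR$target"
    fi
}
```

Run by an ordinary user with REDIR_DESTDIR=/tmp/pkgroot, a target such as /usr/bin/xyz would typically come back prefixed with /tmp/pkgroot, while a path in the build directory would come back unchanged.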

Obviously, if an attacker can control the environment variable of a root user, then the root user’s commands will get redirected. But if an attacker controls root’s environment, the system is already compromised (environment variables can already control the system in lethal ways in such cases). Never transition to root (from non-root) without removing all environment variables, and then adding in only the ones that you are certain are okay. This would be no different.

This approach is easy to apply (once the commands are changed), executes quickly, clearly works in a shared environment, and has no security issues. So there’s a lot going for it.

This approach only redirects those particular commands, and that is its fundamental weakness. However, although a lot of “make install” routines recurse deeply and do complicated things in their source directories, I’ve found in a quick scan that most only use a few limited commands (like cp, install, and mkdir) to actually do the installing where this would matter, so I think this approach would be remarkably successful. Unlike the LD_PRELOAD approach, this even works if programs like /bin/ln are installed statically.

Unfortunately, this requires changing some really low-level key programs like “cp” and “mkdir”. This is probably easily justified for install, since the whole purpose of “install” is to install programs, and packages are very common in today’s world. But changing “cp” and “mkdir” is no small matter; even if all agreed to it (and such agreement is rare), it’d take a long time to widely deploy (think of not only the many Linux distros, but also the *BSDs, Cygwin, etc.). So while this could be a long-term strategy, it’s not so great for the short term. Is there any way we can make things simpler? I believe there is, as discussed next.

Wrappers for basic install commands and a special PATH

As noted above, typically only a few commands in “make install” actually need to be redirected. We could simply modify the PATH environment variable so that its first directory is a “wrapper” directory. The wrapper directory would contain specialized “wrapped” versions of common commands that are used to install software when running in “make install”. These wrapped versions would then redirect the file-writing. As noted earlier, such commands might include: install, cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir. (As noted above, programs that use libtool usually support DESTDIR directly and thus don’t need help.)

This approach has most of the same pluses and minuses as the previous approach; in particular, it only redirects those few commands. However, since those are the commands actually used by “make install”, in many cases this approach should be fine. One big additional positive, however, is that this can be done right away; no changing of fundamental programs is required. It requires no special privileges (and thus has no security impact), and running the wrappers can be quick (so there is practically no performance impact). The wrappers can be written in portable shell, which means that the wrappers can be really small and have no extra dependencies (so there would be no reason to avoid using them for installation).
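
To make the idea concrete, here is a minimal sketch of one such wrapper, for “mkdir” only. It is not Auto-DESTDIR’s actual code; the names WRAPDIR and REAL_MKDIR, the REDIR_DESTDIR variable name (borrowed from the previous section), and the assumption that the real program lives at /bin/mkdir are all choices made for this example.

```sh
# Hypothetical sketch: build a wrapper directory containing a
# redirecting "mkdir". WRAPDIR and REAL_MKDIR are invented names.
WRAPDIR=$(mktemp -d)

cat > "$WRAPDIR/mkdir" <<'EOF'
#!/bin/sh
# Prefix absolute, non-option arguments with REDIR_DESTDIR,
# then hand everything to the real mkdir (assumed at /bin/mkdir).
REAL_MKDIR=${REAL_MKDIR:-/bin/mkdir}
n=$#
while [ "$n" -gt 0 ]; do
    arg=$1
    shift
    n=$((n - 1))
    case $arg in
        -*) ;;                          # option: leave alone
        "${REDIR_DESTDIR:-}"/*) ;;      # already redirected, or no REDIR_DESTDIR set
        /*) arg=$REDIR_DESTDIR$arg ;;   # absolute path: redirect
    esac
    set -- "$@" "$arg"                  # re-append in original order
done
exec "$REAL_MKDIR" "$@"
EOF
chmod +x "$WRAPDIR/mkdir"
```

A packager would then run something like “PATH="$WRAPDIR:$PATH" REDIR_DESTDIR=/tmp/pkgroot make install”. A real wrapper has more to do than this sketch: it needs to cope with relative paths after a “cd” into a system directory, and to create missing parent directories under the redirected root.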

One weakness of this approach is that a “make install” that invokes one of these commands using its full pathname (e.g., /bin/cp or /bin/install instead of “cp” or “install”) will not use the new redirecting command. I’ve found that some makefiles set INSTALL as /bin/install, and it’s possible a few other programs are done that way too. However, in many cases these are trivially overridden by invoking “make install” as “make INSTALL=install CP=cp MV=mv install”, so this problem is typically easy to overcome.

This is a limited approach: It only redirects a few commands. But as long as “make install” routines use only a few commands to install programs (which seems to be the norm), this approach is remarkably simple and effective. You could do worse than something that’s simple and effective. If you really need lots of programs redirected, you might be able to combine this with LD_PRELOAD-based approaches; LD_PRELOAD works with many programs but tends to fail on the few that most matter (e.g., cp, mkdir, ln), so you can wrap the programs that LD_PRELOAD fails on, and let it pick up the rest.

Instead of fiddling with the PATH, it’d be possible to use chroot to use these special commands instead. However, that re-raises some of the complexity and security problems of chroot.

This is another trick that toast uses (as well as LD_PRELOAD).

I’ve implemented this approach (with a small dose of the “Make using special SHELL” approach described next). I implemented it using the bash shell. Most installations already have bash installed; if I’d used perl or Python, a user would have to install a much bigger program just to run it. C is terrible for string processing, so C would not be a great way to implement it either. If you’re interested, please go take a look at Auto-DESTDIR.

Make using special SHELL

Many make programs, such as GNU make, include the ability to set what shell to run. E.g., “make SHELL=...” lets you override which shell to run. A special shell could then be used to override where the files go. By itself, this could be easily fooled; a number of install scripts call sub-scripts which then do the work. However, this might integrate very nicely with other approaches; it would make it possible to “catch” file redirections, for example, and override calls to “/usr/bin/install” and such. One problem is that this can easily lead to completely re-implementing the shell, which is a terrible idea.
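
As a deliberately crude illustration (and not how Auto-DESTDIR does it), a “special shell” can be a small script that edits the command text make passes after “-c” and then hands it to the real shell. The name REDIR_SH and the single substitution shown are assumptions made for this example; plain text substitution like this is easily fooled, which is exactly the slippery slope toward re-implementing the shell.

```sh
# Hypothetical sketch: create a tiny "special shell" for make to use.
REDIR_SH=$(mktemp)

cat > "$REDIR_SH" <<'EOF'
#!/bin/sh
# make runs each recipe line as: $(SHELL) -c 'command text'
if [ "$1" = "-c" ] && [ "$#" -ge 2 ]; then
    # Turn a hard-coded /usr/bin/install into a plain "install", so a
    # wrapper found earlier in PATH gets a chance to run instead.
    cmd=$(printf '%s\n' "$2" | sed 's|/usr/bin/install|install|g')
    shift 2
    exec /bin/sh -c "$cmd" "$@"
fi
# Anything else: behave like the ordinary shell.
exec /bin/sh "$@"
EOF
chmod +x "$REDIR_SH"
```

It would be used as “make SHELL="$REDIR_SH" install”, normally together with a wrapper directory placed first in PATH.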

Chroot

The “chroot” call is available everywhere. A traditional chroot jail could be created so that “written” files aren’t written to the “real” system. This was one of the first approaches I thought of. Unfortunately, on most Linux-based systems it can be rather complicated to set up proper chroot environments. Calling chroot() is easy, but setting up the right environment to use it may involve either a large number of shared mounts that can’t be written to (a security concern) or a vast amount of file copying. Calling chroot() requires root privileges, which distributions are loath to give, and root privileges must be later dropped after the environment is set up (since root privileges can escape chroot() jails); this can make it difficult to integrate with other tools (such as many package recompilation tools).

FreeBSD automatically implements DESTDIR using chroot, but it isn’t clear that FreeBSD’s way of handling this is entirely desirable. FreeBSD’s approach stores package information in $DESTDIR/var/db/pkg, which in many cases is not where you wanted that information. I suspect that FreeBSD’s approach depends on features that other kernels (including Linux) do not have, e.g., it depends on mount_nullfs(1), which appears to make their approach (implemented in file bsd.destdir.mk) hard to move to more-popular systems. It’s interesting to note that implementing this was not easy; it took two tries to get this functionality working. (Here’s a web-accessible copy of bsd.destdir.mk implementing DESTDIR.) In any case, it is not at all clear that various Linux distributors are willing to use chroot to automate DESTDIR. See the kernel material (above) for some of the negatives of this.

RUST is “a toolkit for creating RPM packages to distribute software... [it] is both a drag & drop RPM creation GUI and a ‘sandboxing’ toolkit that allows you to do software installations within a chrooted environment and automatically generate RPMs from arbitrary source code, without ever seeing a spec file.” But since it doesn’t actually create .spec files, its results cannot be submitted to typical Linux repositories (like Fedora’s).

Other Sandboxes

There are other sandboxing approaches beyond chroot, such as User Mode Linux, that could be used. But these appear really heavyweight for the purpose. Examples include:

Plash is an approach for sandboxing. As explained on the Plash website, it performs sandboxing by using a chroot to prevent all file access, and then modifying “library calls (such as open()) so that they make remote procedure calls (RPCs) to another process instead of making the usual Linux system calls.” I think this is too heavyweight for this task.

Conclusions and related info

I’ve implemented the “Wrappers for basic install commands and a special PATH” approach above to automate DESTDIR. To get it, get the auto-DESTDIR package. As long as “make install” only uses a limited set of commands (which seems to be true in practice), this approach seems to solve the problem without creating security issues or requiring complicated reconfiguration of low-level infrastructure. It’s also very, very portable.

You might also find the user-union program useful, if you are trying to automate DESTDIR for existing programs. User-union creates union mounts, without requiring special privileges, and it can work with auto-DESTDIR.

Supporting DESTDIR is one of several good practices when releasing FLOSS software - if you release FLOSS software, you should follow those general community guidelines. If you don’t, at least make sure that Auto-DESTDIR can automate its support.