Automating DESTDIR for Packaging

by David A. Wheeler

2009-02-19 (updated 2011-07-29)

It is unnecessarily hard to create native packages (like deb and RPM), and unnecessarily hard to directly install source code packages under the control of programs like GNU stow, because many software source packages fail to support the DESTDIR convention.  The DESTDIR convention makes it easy to compile a program so that it will run in some directory X, but be installed in directory $DESTDIR/X instead. There are a vast number of source packages that do not support DESTDIR, and it’s often difficult to add DESTDIR support to complex makefiles. This paper discusses what could be done to automatically support DESTDIR, instead of requiring every source package in the universe to be changed to support DESTDIR. This paper shows that there are practical ways to automate support for DESTDIR, and points to tools like auto-DESTDIR and user-union that implement some of these solutions.

Introduction

Today’s users of Linux and Unix systems don’t want to follow complicated instructions to install programs — they want to click on one button, and have everything installed as necessary.  Ideally, they should be able to install programs using their native package format and download tools, such as deb (used by Debian and Ubuntu) and RPM (used by Fedora, Red Hat, and SuSE/Novell).  But that means that someone has to create those packages.  Alternatively, they should be able to use a program that can automatically download, compile, and install from source code, perhaps with some program like GNU stow that can place each package in its own separate directory while making them all appear to be in one place. Whether you are creating native packages, or automatically installing source packages, it’s often vital to be able to compile the program so that it will run in some directory X, but install the program in some other directory $DESTDIR/X.  There is a standard way to do this: Have the source package support the DESTDIR convention.

Unfortunately, many software source packages fail to support the DESTDIR convention, and it’s sometimes a real pain to add DESTDIR support. The build programs (e.g., Makefiles) can be large, complex, and multilayered... so it can be painful for a packager to modify the build scripts to add DESTDIR support. A packager can send DESTDIR patches upstream, but they may be ignored or improperly maintained... which means that the packager may need to keep re-modifying the build system, every time the program is updated. Ugh. All too often, packaging can be completely automated except for the lack of DESTDIR support.

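Why is DESTDIR important? There are many reasons, but the two uses mentioned above are typical: creating native packages (such as deb and RPM), and installing source packages under the control of programs like GNU stow.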

It’d be much better if the equivalent of DESTDIR could be automated, without requiring application programs to add DESTDIR support to their installers.  After all, almost any “real” Unix/Linux program with source code available supports “make install” (or equivalent) for installation.  A “make install” process presumes that it is writing the files to the “real” filesystem.  Ideally, there’d be a way to reroute writes to the “real” filesystem to some other directory tree, so that they could be packaged or used with programs like GNU stow. This shouldn’t be that hard - “make install” may invoke many programs and do a lot of recursive directory descending to figure out what to install, but the commands that actually do the installing are usually simple ones like “install” and “cp”.

Ideally this re-routing process would work without requiring the program running “make install” to run as root. That way, a non-root user could do “make-redir DESTDIR=newdir install” (or equivalent) and have all the “installed” files show up inside newdir. Ideally, it would be efficient, and could track other information (such as permissions and owners).  Also, it should not re-route programs that don’t descend from “make install”, because final packaging is often done on a shared machine that is packaging multiple programs simultaneously.  Many tools don’t quite do this; they primarily just ‘track’ what’s changed, require special privileges, and so on. Some tools that you would think do this can’t do anything remotely like it; for example, “fakeroot” (widely used by Debian) can record owners, but it can’t redirect writes to files (because it doesn’t wrap the system call “open”).

This turns out to be harder than I thought, and in particular, some of the “obvious” ways to do this turn out to be more complicated than you’d like. So here are various technical approaches, and a list of some related tools that implement each approach (and might be usable as a baseline to implement automating DESTDIR).

After looking at the alternatives, I’ve decided that the “wrappers” approach is especially promising. See the Auto-DESTDIR software, which implements this wrappers approach. The wrappers approach at first seems like an odd solution, but its advantages only become compelling once you start thinking about the problems of the alternatives (as described below).

Not covered: General issues in program-specific directories or simplified source/package installation

First, let me clarify that this paper is not about the general idea of (1) creating separate directories for each different program or program installation, nor is it about (2) simplifying source/package installation in its entirety. Instead, I am focusing on a specific step, copying files into one place that will be run from another, that turns out to be important in both of these general issues (and probably others as well). The following subsections point to other programs/papers about those issues in general, and explain how automatically supporting DESTDIR can simplify these general issues.

Program-specific directories

Creating separate directories for each different program or program installation is a widely-implemented idea. For example, using the tool GNU stow, all files that implement perl might be stored in “/usr/local/stow/perl” while all files that implement emacs might be stored in “/usr/local/stow/emacs”, and the executable of emacs might be “/usr/local/stow/emacs/bin/emacs”. Many of these tools (including GNU stow) run your installation script (or have you run them) with a special setting of “prefix” (so that each program is installed in a special program-specific location). Then, they set up symbolic links to point to the “real” files (e.g., so you don’t have to have a massive constantly-changing PATH).
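The symbolic-link step can be illustrated with a short Python sketch. It is a simplified, hypothetical helper, not GNU stow’s actual algorithm: the function name stow and the directory layout are assumptions, and unlike GNU stow it never folds a whole directory into a single link and simply fails when a link already exists.

```python
import os

def stow(package_dir, target_dir):
    """Hypothetical stow-like helper: mirror package_dir into target_dir
    using relative symbolic links, one link per file (no tree folding)."""
    created = []
    for root, _dirs, files in os.walk(package_dir):
        rel = os.path.relpath(root, package_dir)
        dest_root = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_root, exist_ok=True)  # real directories in the target
        for name in files:
            source = os.path.join(root, name)
            link = os.path.join(dest_root, name)
            # A relative link keeps working if the whole tree is moved;
            # os.symlink raises FileExistsError on a conflict.
            os.symlink(os.path.relpath(source, dest_root), link)
            created.append(link)
    return created
```

For example, stow("/usr/local/stow/emacs", "/usr/local") would create /usr/local/bin/emacs as a link to ../stow/emacs/bin/emacs, so /usr/local/bin can stay on the PATH while each program keeps its own directory.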

There are many already-existing tools that do this, including toast (GPLv3+), GNU stow (GPL), Graft (GPL), Encap (GPL license for epkg), xstow (GPL, a C++ re-implementation of GNU stow), spill (GPL, a C implementation similar to stow), GARStow (unknown license), It (GPL), opt_depot (no license found), Risacher’s localfix (no license found), and STORE (GPL). Some of these are conceptual descendants of the old CMU Depot, often via GNU stow. “Get rid of stowaway packages with GNU Stow” (David A. Harding) gives a brief intro to GNU stow.

If this is all you’re doing, and you have all necessary rights to install to the stowed directories, you might think you don’t need DESTDIR at all... just set up the prefix and store in these special directories. But it turns out that you still often want to install files to one place, yet have them run in another, which means you want to automate DESTDIR:

Simplified installation from source code

This paper does not cover the entire problem of automatically installing packages directly from source code, though it does potentially cover a piece of the problem. The idea of making it easier to install from source tarballs is nothing new; this has been raised by Francesco Montorsi and myself. There are several existing tools that try to automatically install programs from source tarballs, though most of them do not do a good job of automatically determining what is to be done, and few understand dependencies or integrate well with an existing package management system. Here are some related papers/projects:

And of course, this paper is not about package management in general, e.g., programs that support .rpm and .deb formats. However, to create .rpm and .deb files, it is important to support DESTDIR. This paper is about how to easily support DESTDIR, without twiddling makefiles.

Why not just support DESTDIR or make prefix=X install?

There’s no need for a special tool to support DESTDIR for programs that already support DESTDIR. In some programs that don’t support DESTDIR, you can get the same effect by setting the “prefix” variable when running make install, that is, make prefix=MY_DESTDIR_VALUE install. It would be far better if source code releases followed the normal good practices for releasing FLOSS software source packages, including support for DESTDIR.

But this does not always work, for a variety of reasons. Many makefiles do not support DESTDIR at all. Many makefiles also don’t support “prefix”, or if they do, they forcibly re-build the program when the prefix value is changed for make install (making the workaround useless). There are so many programs that do not follow normal good practices that we must deal with the world as it is, not as we wish it would be. We could modify tiny makefiles, but large multi-directory makefiles can be hideously hard to modify correctly, and then there is the problem of getting those changes accepted upstream. Since so many programs don’t support DESTDIR, it’d be nice to be able to automatically support DESTDIR without having to constantly muck around in complicated makefiles or other build/installation systems for program after program. Then, instead of having programmers around the world constantly changing their makefiles, it will just work.

Kernel-based re-routing

The Linux kernel gets all read and write requests, so re-routing at the kernel level would be great - in theory, the re-routing would be perfect, and should have good performance.  The big problem is that it requires basic changes in low-level infrastructure, where any mistakes could create a massive security hole... making it understandably difficult to get people to accept changes at this level.

Union mounts

Union mounts can merge multiple directories (e.g., one is “read only” and the other written to). Generally, these require root privileges, though that’s not a killer - a setuid program could use them, for example.

There are several kernel modules that implement union mounts, but they’re not widely available on Linux distributions (as of early 2009). The best-known union mount implementation is UnionFS, and another implementation is aufs; both implement union mounts as a new filesystem. The “union mounts” project implements union mounts inside Linux at the VFS layer instead of as a new filesystem; at this moment it is very immature and not ready for normal use. Many Linux distributions do NOT have unionfs, aufs, or “union mounts”, since they are not in the default Linux kernel.

A FUSE-based implementation of a union file system can be used today, and doesn’t require changes to the Linux kernel (as Unionfs, etc. require). FUSE is already part of the usual Linux kernel, and it allows file requests to be redirected out to user programs. In particular, funionfs implements a union filesystem using FUSE, and is included in Fedora, Debian, and Ubuntu. PlasticFS version 1.12 uses FUSE as well. By design, a FUSE-based approach has more overhead than unionfs (due to extra context switches), but for just a “make install” this isn’t so bad. One implementer of a unionfs-on-FUSE reports that the I/O processing completely buries this overhead anyway.  Unfortunately, funionfs (at least) is also global (instead of per-process), which is again a problem for a shared packaging system if used the “obvious” way. I should note that instead of reusing existing union file systems to redirect DESTDIR, FUSE could be used to implement this approach directly.

By themselves, the kinds of union mounts described above are always global to the whole system, so if you directly did a union mount of directories like “/usr” you would have trouble using a shared packaging system. Such a global approach to redirecting could easily cause problems administering the system, and it raises many security problems as well. So this should really be done for a set of processes rather than the whole system, as discussed next.

Process Group Unique Root

A union mount can be made unique to a process group through a variety of mechanisms. The “obvious” way is to recreate a new filesystem tree in a subdirectory, using mount --bind and union mounts (as above, say using funionfs and FUSE) to create a new filesystem that looks like the old one but is not visible to all. You can then use chroot (or pivot_root) to set the process group to the new filesystem. A variant of this approach would be to use mount namespaces, which again create filesystems that are specific to a process group (instead of being global to all processes). Again, the point would be to redirect writes to /usr, /bin, /lib, /etc. All of this could be implemented with a small suid program.

Ideally, it’d be rigged so that the process group isn’t root, but it can still write to the new local /usr (etc.).  Bonus points if it pretends to be root and records the parameters (a la fakeroot) - which could fool even complex “make install” routines.

For security, the key problem is that the process running “make install” should never be privileged. In particular, the process should not have root privileges, nor should it be allowed to raise its privileges by running setuid programs that actually change their user ID. Otherwise, it could use its root privileges to get out of the jail, or run a setuid program that wouldn’t realize that the filesystem is rigged (and then get exploited). Traditionally, “make install” is given total privileges, but we want to avoid that. If “make install” is started with normal user privileges, that at least gets us started, but we need to make sure that privileges can’t be added later via setuid programs. We could do this by making sure all mounts disable setuid/setgid; mount already has this ability (the nosuid option). Alternatively, we could forbid running executables with that setting (I believe SELinux and capabilities can do this).

This approach - having a FUSE-based union mount approach that is local to a process group (e.g., chroot) - is the most robust technically, since it can redirect any non-setuid command used in the “make install”.  It also has low overhead.  But the effort of making sure it’s secure may make it difficult for distributors to accept it.

LD_PRELOAD

Many programs, like installwatch, use LD_PRELOAD to intercept library functions.  There are various positives: LD_PRELOAD already exists, and it works per-process (so it doesn’t interfere with other programs). Unfortunately, LD_PRELOAD has many technical downsides. 

LD_PRELOAD based approaches can’t redirect statically linked executables.  Unfortunately, the programs most used by most install scripts are also the ones most likely to be statically linked (to increase reliability and enable recovery from serious library management problems). I know that SuSE’s “ln” is statically linked, and that FreeBSD and OpenBSD’s key routines used in installation are statically linked, and this is true for many other systems as well.
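One way to tell in advance whether LD_PRELOAD can affect a given command is to check whether its executable asks for a dynamic loader at all. The sketch below assumes ELF executables (as on Linux and the BSDs) and uses hypothetical helper names; it only looks for a PT_INTERP program header, which names the dynamic loader and which statically linked executables lack.

```python
import struct

PT_INTERP = 3  # ELF program header type that names the dynamic loader

def uses_dynamic_loader(elf: bytes) -> bool:
    """True if the ELF image has a PT_INTERP segment (dynamically linked).

    Without PT_INTERP the dynamic loader never runs, so LD_PRELOAD is ignored.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = elf[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is64:
        (phoff,) = struct.unpack_from(endian + "Q", elf, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x36)
    else:
        (phoff,) = struct.unpack_from(endian + "I", elf, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 0x2A)
    for i in range(phnum):
        # p_type is the first 32-bit field of every program header entry.
        (p_type,) = struct.unpack_from(endian + "I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False

def is_dynamically_linked(path):
    with open(path, "rb") as f:
        return uses_dynamic_loader(f.read())
```

For example, is_dynamically_linked("/bin/ln") should return False on a system whose ln is statically linked, which signals that an LD_PRELOAD-based redirector would never see that command’s writes.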

You might think that once you override open(), all calls to open() would be overridden, but this isn't true by default if the caller is inside the C library itself. It turns out that the standard GNU C library uses names prefixed by "__" whenever it calls internal functions. For example, the C library implements fopen(), but the fopen() implementation internally calls __open(), not open(). In addition - and this is the kicker - by default the GNU C library will not let you override these __functions using LD_PRELOAD. So if you override just open(), an application that calls open() directly will be overridden... but an application that uses fopen() will skip right past it. You can recompile the GNU C library so that the redirection will occur by using the poorly-documented "--disable-hidden-plt" option. But in practice, this means that you have to recompile the C library. This is generally not well-received by distributions; Debian specifically rejected doing this, and I suspect others will do the same. Few will want to change the default, because doing things this way speeds up normal use. An alternative is wrapping all the C library calls, but that's more work.

I found no program that met my needs for automating DESTDIR using LD_PRELOAD, so I've started writing such a program: user-union. User-union creates union mounts, without requiring special privileges, using LD_PRELOAD, and it can integrate with auto-destdir.

Here are some existing related programs that already use LD_PRELOAD (though most just watch what files are changed, and do not let us change where the files go):

Ptrace

Ptrace() is a kernel-level call for “process tracing” (e.g., to watch/change system calls made by a process).  It’s intended for debugging, but since it can watch another process, it can be used for this purpose.  This has serious advantages: ptrace-based tools can handle statically linked executables (while LD_PRELOAD-based approaches can’t), so SuSE’s “ln” is a non-issue. Since they track at the system call level, they are immune to the races and other problems of fakeroot and libtricks. Unfortunately, while ptrace() is great for watching what a program does, using it to change what a program does is far more complicated. Perhaps the premier example of a related program that uses the ptrace() approach is TrackFS.

Such a program could be implemented using ptrace and semantics like this:

  1. On open for read (not write), look at DESTDIR first - if it’s there, use it. Otherwise, try to open un-redirected.  This saves disk space, as well as saving time by not copying files.  In short, if a file is only used for reading/executing, and never written to, then just use it as-is.
  2. On open for read AND write, look at DESTDIR first, and use it if there. Otherwise, copy any existing file to DESTDIR, then use it.
  3. On open for write (not read), create under DESTDIR any parent directories that exist on the original side. Then, open for write under DESTDIR. (It should fail if the original would have failed.)
  4. “chmod” is essentially a write operation; redirect as above, but first copy the file into DESTDIR if it isn’t already there.
  5. “unlink”: If in DESTDIR, remove it. Bonus points: remember what you “removed”, so that later queries about it will claim it’s not there.  Note that unionfs, funionfs, and so on have to handle this too (e.g., using “whiteouts”).  So you can “rm /bin/sh”, and it’s not there... and there’s no harm to the “real” filesystem.
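The five rules can be modeled in a few lines of Python. This is only a model of the decision logic under stated assumptions: a real ptrace-based tool would have to rewrite the path arguments of the traced system calls, whereas the hypothetical class RedirectingFS below just computes which real path each operation should use, with a scratch directory standing in for “/”. Permission checks are not modeled.

```python
import os
import shutil

class RedirectingFS:
    """Hypothetical model of the open/chmod/unlink redirection rules.

    real_root stands in for "/" and destdir for $DESTDIR. Methods take the
    absolute path a traced process asked for and return (or act on) the
    path that should really be used.
    """

    def __init__(self, real_root, destdir):
        self.real_root = real_root
        self.destdir = destdir
        self.removed = set()  # paths "unlinked" through the redirection

    @staticmethod
    def _key(path):
        return os.path.normpath(path)

    def _under(self, base, path):
        return os.path.join(base, self._key(path).lstrip("/"))

    def resolve_read(self, path):
        """Rule 1: use the DESTDIR copy if present, else the original."""
        if self._key(path) in self.removed:
            raise FileNotFoundError(path)
        redirected = self._under(self.destdir, path)
        if os.path.exists(redirected):
            return redirected
        return self._under(self.real_root, path)

    def resolve_write(self, path, also_read=False):
        """Rules 2 and 3: every write lands under DESTDIR."""
        original = self._under(self.real_root, path)
        redirected = self._under(self.destdir, path)
        parent = os.path.dirname(redirected)
        if not (os.path.isdir(os.path.dirname(original)) or os.path.isdir(parent)):
            raise FileNotFoundError(path)  # the original write would fail too
        os.makedirs(parent, exist_ok=True)  # create parent directories under DESTDIR
        if (also_read and not os.path.exists(redirected)
                and os.path.exists(original) and self._key(path) not in self.removed):
            shutil.copy2(original, redirected)  # copy up before modifying
        self.removed.discard(self._key(path))
        return redirected

    def chmod(self, path, mode):
        """Rule 4: chmod is a write, so copy the file into DESTDIR first."""
        os.chmod(self.resolve_write(path, also_read=True), mode)

    def unlink(self, path):
        """Rule 5: delete the DESTDIR copy and remember the removal."""
        redirected = self._under(self.destdir, path)
        if os.path.exists(redirected):
            os.remove(redirected)
        elif (not os.path.exists(self._under(self.real_root, path))
              or self._key(path) in self.removed):
            raise FileNotFoundError(path)
        self.removed.add(self._key(path))
```

The removed set plays the role of the union filesystems’ whiteouts: after unlink("/bin/sh") later reads report the file as missing, while the original under real_root is never touched.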

I may have missed a corner case, but it should work in principle.

I’ve had a very interesting email exchange with TrackFS’s creator, Michael Riepe, about this.  He pointed out that this requires that the controlling process actually change the name of the file being opened, which requires using memory space somewhere.  I suggested that it’d be possible to patch the stack with the new filename, use it, then halt and restore things back... so that you don’t have to try to do memory allocations and such.  Michael isn’t sure that the stack grows correctly at all times (what about stack overflow?), but it’s plausible.  Another issue he noted is that (absolute) symbolic links might not work correctly.  I’m not sure about that, but that would cause complications to the rules above.

Fakeroot Next Generation (fakeroot-ng) is using the ptrace() mechanism to get around the limitations of LD_PRELOAD, but as their website notes, using ptrace() creates many complexities.

Originally, I thought the ptrace approach would be best, but the rules kept getting more complex, and the stack twiddling was more than I was hoping for. In short, implementing DESTDIR this way is quite complicated, involving a lot of architecture-dependent tweaking. Is there a better, simpler way? For this particular problem, I think there is.

Modified basic commands

In most software, the “make install” command only uses a few simple commands to actually install the software. In my experience, the most common command by far is “install”, which is hardly surprising. Other common commands used in “make install” that might need redirecting from privileged directories (like /bin, /usr, and /etc) include cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir. It might also be useful to redirect “test”, though this is also a bash built-in (making its replacement more complicated) and I haven’t found any Makefiles where redirection of “test” is needed for “make install”. Programs that use libtool usually support DESTDIR directly, but even if they didn’t, the point is the same: “make install” tends to use only a very few programs.
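One way to check this for a particular package is to look at what “make install” would run. GNU make’s -n option prints commands instead of executing them (though lines that invoke $(MAKE) are still run, so recursive makefiles are followed). The sketch below is a rough, hypothetical parser of that output: it counts the first word of each simple command and deliberately does not understand full shell syntax.

```python
import os
import re
from collections import Counter

# Words that start shell syntax rather than name a command.
SHELL_KEYWORDS = {"if", "then", "else", "elif", "fi", "while", "until",
                  "do", "done", "esac", "{", "}", "!"}
# Clauses such as "for f in a b" or "case $x in" contain no command to count.
SKIP_CLAUSES = {"for", "case"}

def install_commands(dry_run_output):
    """Count the commands in a `make -n install` transcript (rough heuristic)."""
    counts = Counter()
    for line in dry_run_output.splitlines():
        # Split on common separators; quoting and here-documents are ignored.
        for part in re.split(r"&&|\|\||[;|]", line):
            words = part.split()
            if words and words[0] in SKIP_CLAUSES:
                continue
            while words and (words[0] in SHELL_KEYWORDS or "=" in words[0]):
                words.pop(0)  # drop keywords and VAR=value prefixes
            if not words or words[0].endswith(":"):
                continue  # nothing left, or a make message like "make[1]:"
            counts[os.path.basename(words[0])] += 1
    return counts
```

Feeding it the stdout of subprocess.run(["make", "-n", "install"], capture_output=True, text=True) and calling most_common() on the result gives a quick picture of which commands a wrapper set would need to cover.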

So given that “make install” tends to use only a few commands, one “obvious” approach would be to modify just these basic commands so that they will redirect their writes (e.g., if an environment variable is set). Then the packager can just set an environment variable and run “make install”. This seems completely appropriate for “install”; its whole purpose is to perform installs, so adding functionality so that it can do installs in a common case (creating packages) makes sense.

It’d be best if setting this up were easy. I would suggest REDIR_DESTDIR as the environment variable name. If it’s set, then writes are redirected to that directory as the root of the filesystem. For the “install” command, I think any use should be redirected (since by definition, all invocations are installs), with one exception: if install is invoked to install to a location inside REDIR_DESTDIR, then don’t re-prefix it; this avoids some awkward loops, and makes it easy for packagers to “automate” installation by always setting REDIR_DESTDIR. Other commands are only sometimes used for installation, but I suspect a simple detection method would be sufficient. For example, perhaps all attempts to write to a directory that only a privileged user/group (e.g., root) can write to would be redirected so that “/” becomes REDIR_DESTDIR. This way, “install xyz /bin” immediately becomes “install xyz ${REDIR_DESTDIR}/bin”, and similarly for “cp”, but “cp xyz .” doesn’t get redirected if the directory is local and writable by an ordinary user (such as a user creating the package). This way, you don’t have to list which directories get redirected: temporary and local files aren’t redirected, while files getting installed will get redirected. Don’t use “INSTALL_DESTDIR” as the environment variable name; it turns out this gets used by many installation makefiles, and would cause trouble instead of helping.
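The proposed REDIR_DESTDIR policy fits in one small function. This is a hypothetical sketch, not code from any existing tool: the name redirect_target and its writable parameter are assumptions, and os.access(..., os.W_OK) is used as an approximation of “a directory that only a privileged user/group can write to”.

```python
import os

def redirect_target(command, target, env=None, writable=None):
    """Return the path a wrapped command should really write to.

    Assumed policy, following the text: nothing changes unless REDIR_DESTDIR
    is set; "install" is always redirected; other commands are redirected
    only when the destination directory is not writable by the current user;
    paths already inside REDIR_DESTDIR are never re-prefixed.
    """
    env = os.environ if env is None else env
    if writable is None:
        writable = lambda directory: os.access(directory, os.W_OK)
    destdir = env.get("REDIR_DESTDIR")
    if not destdir:
        return target
    destdir = os.path.abspath(destdir)
    path = os.path.abspath(target)
    if path == destdir or path.startswith(destdir + os.sep):
        return target  # already inside REDIR_DESTDIR: avoid loops
    if command != "install":
        directory = path if os.path.isdir(path) else os.path.dirname(path)
        while directory != os.path.dirname(directory) and not os.path.exists(directory):
            directory = os.path.dirname(directory)  # nearest existing ancestor
        if writable(directory):
            return target  # e.g. "cp xyz ." in the user's own build tree
    return os.path.join(destdir, path.lstrip(os.sep))
```

With REDIR_DESTDIR=/tmp/pkgroot, “install xyz /bin” maps /bin to /tmp/pkgroot/bin, while a cp into the packager’s own writable build directory is left alone. The writable check is a parameter only so the policy can be tested without depending on the real filesystem’s permissions.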

Obviously, if an attacker can control the environment variable of a root user, then the root user’s commands will get redirected. But if an attacker controls root’s environment, the system is already compromised (environment variables can already control the system in lethal ways in such cases). Never transition to root (from non-root) without removing all environment variables, and then adding in only the ones that you are certain are okay. This would be no different.
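The “remove everything, then add back only what you trust” rule can be sketched as follows. The allowlist and the fixed PATH are illustrative assumptions rather than a recommendation for any particular system; the point is that REDIR_DESTDIR (like LD_PRELOAD) is dropped unless someone deliberately allows it.

```python
# Hypothetical allowlist; a real system would choose its own.
SAFE_VARIABLES = ("HOME", "LANG", "TERM", "USER")
SAFE_PATH = "/usr/sbin:/usr/bin:/sbin:/bin"

def environment_for_root(env):
    """Build a fresh environment for a privileged process: start empty,
    copy only allowlisted variables, and force a known PATH."""
    clean = {name: env[name] for name in SAFE_VARIABLES if name in env}
    clean["PATH"] = SAFE_PATH
    return clean
```

Passing the result as the env argument when starting a privileged process (rather than inheriting os.environ) means an attacker-supplied REDIR_DESTDIR never reaches root’s commands.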

This approach is easy to apply (once the commands are changed), executes quickly, clearly works in a shared environment, and has no security issues. So there’s a lot going for it.

This approach only redirects those particular commands, and that is its fundamental weakness.  However, although a lot of “make install” routines recurse deeply and do complicated things in their source directories, I’ve found in a quick scan that most only use a few limited commands (like cp, install, and mkdir) to actually do the installing where this would matter, so I think this approach would be remarkably successful.  Unlike the LD_PRELOAD approach, this even works if programs like /bin/ln are statically linked.

Unfortunately, this requires changing some really low-level key programs like “cp” and “mkdir”. This is probably easily justified for install, since the whole purpose of “install” is to install programs, and packages are very common in today’s world. But changing “cp” and “mkdir” is no small matter; even if all agreed to it (and such agreement is rare), it’d take a long time to widely deploy (think of not only the many Linux distros, but also the *BSDs, Cygwin, etc.). So while this could be a long-term strategy, it’s not so great for the short term. Is there any way we can make things simpler? I believe there is, as discussed next.

Wrappers for basic install commands and a special PATH

As noted above, typically only a few commands in “make install” actually need to be redirected. We could simply modify the PATH environment variable so that its first directory is a “wrapper” directory. The wrapper directory would contain specialized “wrapped” versions of common commands that are used to install software when running in “make install”. These wrapped versions would then redirect the file-writing. As noted earlier, such commands might include: install, cp, mkdir, ln, mv, touch, chmod, chown, ls, rm, and rmdir. (As noted above, programs that use libtool usually support DESTDIR directly and thus don’t need help.)
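The PATH manipulation itself is small. The following sketch uses hypothetical helper names and a hypothetical wrapper directory; it shows how “make install” could be run with that directory first on PATH and REDIR_DESTDIR set, and how a wrapper can locate the “real” command (the next one on PATH after the wrapper directory) so that it does not end up calling itself.

```python
import os
import shutil
import subprocess

def wrapped_environment(wrapper_dir, destdir, base_env=None):
    """Copy of the environment with wrapper_dir searched first."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = os.pathsep.join(p for p in (wrapper_dir, env.get("PATH", "")) if p)
    env["REDIR_DESTDIR"] = destdir
    return env

def real_command(name, wrapper_dir, path):
    """Locate `name` on `path`, skipping the wrapper directory itself."""
    wrapper = os.path.realpath(wrapper_dir)
    dirs = [d for d in path.split(os.pathsep) if d and os.path.realpath(d) != wrapper]
    return shutil.which(name, path=os.pathsep.join(dirs))

def make_install(source_dir, wrapper_dir, destdir):
    """Run `make install` with the wrappers in front (requires make)."""
    env = wrapped_environment(wrapper_dir, destdir)
    return subprocess.run(["make", "install"], cwd=source_dir, env=env, check=True)
```

Skipping the wrapper directory by resolved path, rather than by string comparison, keeps a wrapper from finding itself again when the same directory appears on PATH under a different spelling.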

This approach has most of the same pluses and minuses as the previous approach; in particular, it only redirects those few commands. However, since those are the commands actually used by “make install”, in many cases this approach should be fine. One big additional positive, however, is that this can be done right away; no changing of fundamental programs is required. It requires no special privileges (and thus has no security impact), and running the wrappers can be quick (so there is practically no performance impact). The wrappers can be written in portable shell, which means that the wrappers can be really small and have no extra dependencies (so there would be no reason to avoid using them for installation).

One weakness of this approach is that a “make install” that invokes one of these commands using its full pathname (e.g., /bin/cp or /bin/install instead of “cp” or “install”) will not use the new redirecting command.  I’ve found that some makefiles set INSTALL to /bin/install, and a few may refer to other commands that way too. However, in many cases these are trivially overridden by invoking make as “make INSTALL=install CP=cp MV=mv install”, so this problem is typically easy to overcome.
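When a makefile hard-codes full pathnames, the overriding variables can even be found mechanically. This rough, hypothetical sketch only recognizes simple “NAME = /absolute/path args” assignments (not +=, conditionals, or included makefiles) for the commands listed above, and produces arguments such as INSTALL=install to put on the make command line.

```python
import os
import re

# Commands that the wrappers handle (from the list in the text).
WRAPPED = ("install", "cp", "mkdir", "ln", "mv", "touch",
           "chmod", "chown", "ls", "rm", "rmdir")

# Simple assignments such as "INSTALL = /usr/bin/install -c" or "CP := /bin/cp".
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*[:?]?=\s*(/\S+)(.*)$")

def override_arguments(makefile_text, commands=WRAPPED):
    """Return NAME=command arguments that replace hard-coded full pathnames."""
    overrides = []
    for line in makefile_text.splitlines():
        match = ASSIGNMENT.match(line)
        if not match:
            continue
        name, full_path, rest = match.groups()
        command = os.path.basename(full_path)
        if command in commands:
            overrides.append(f"{name}={command}{rest.rstrip()}")
    return overrides
```

Running subprocess.run(["make", *override_arguments(text), "install"]) passes each override as a single argument, so a value containing spaces, such as INSTALL=install -c, needs no extra quoting.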

This is a limited approach: It only redirects a few commands. But as long as “make install” routines use only a few commands to install programs — which seems to be the norm — then this approach is remarkably simple and effective. You could do worse than something that’s simple and effective. If you really need lots of programs redirected, you might be able to combine this with LD_PRELOAD based approaches; LD_PRELOAD works with many programs but tends to fail on the few that most matter (e.g., cp, mkdir, ln), so you can wrap the programs that LD_PRELOAD fails on, and let it pick up the rest.

Instead of fiddling with the PATH, it’d be possible to use chroot to use these special commands instead. However, that re-raises some of the complexity and security problems of chroot.

Toast uses this trick (as well as LD_PRELOAD).

I’ve implemented this approach (with a small dosage of the “make using special SHELL approach” described next). I implemented it using the bash shell. Most installations already have bash installed; if I’d used perl or Python, a user would have to install a much bigger program just to run it. C is terrible for string processing, so C would not be a great way to implement it either. If you’re interested, please go take a look at Auto-DESTDIR.

Make using special SHELL

Many make programs, such as GNU make, include the ability to set which shell to run; e.g., “make SHELL=...” lets you override the shell. A special shell could then be used to override where the files go. By itself, this could be easily fooled; a number of install scripts call sub-scripts which then do the work. However, this might integrate very nicely with other approaches; it would make it possible to “catch” file redirections, for example, and override calls to “/usr/bin/install” and such. One problem is that this can easily lead to completely re-implementing the shell, which is a terrible idea.
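A very small version of the special-SHELL idea could rewrite hard-coded command paths before handing each recipe line to the real shell. GNU make normally runs each recipe line as “$(SHELL) -c 'line'” (“-ec” in POSIX mode), so a replacement SHELL receives the line as an argument. The sketch below is deliberately naive, using regular expressions rather than a shell parser, which is exactly the slippery slope warned about above; the script and its names are hypothetical.

```python
import os
import re
import sys

WRAPPED = ("install", "cp", "mkdir", "ln", "mv", "touch",
           "chmod", "chown", "ls", "rm", "rmdir")

# A full path such as /usr/bin/install that is not part of a longer word.
FULL_PATH = re.compile(
    r"(?<![\w./-])/(?:usr/)?(?:local/)?s?bin/(" + "|".join(WRAPPED) + r")(?![\w.-])")

def rewrite_command(line):
    """Replace full pathnames of wrapped commands with bare names,
    so that the PATH-based wrappers are found instead."""
    return FULL_PATH.sub(r"\1", line)

def main(argv):
    # make invokes: SHELL -c 'recipe line'  (or -ec in POSIX mode)
    if len(argv) >= 3 and argv[1] in ("-c", "-ec"):
        argv = [argv[0], argv[1], rewrite_command(argv[2])] + argv[3:]
    os.execv("/bin/sh", ["/bin/sh"] + argv[1:])

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

Saved as an executable script (say, a hypothetical redir-shell) and used as “make SHELL=/path/to/redir-shell install” together with the PATH wrappers, it would turn “/usr/bin/install -m 644 foo.1 /usr/share/man/man1” into “install -m 644 foo.1 /usr/share/man/man1” while leaving commands such as /bin/cpio untouched.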

Chroot

The “chroot” call is available everywhere.  A traditional chroot jail could be created so that “written” files aren’t written to the “real” system. This was one of the first approaches I thought of. Unfortunately, on most Linux-based systems it can be rather complicated to set up proper chroot environments. Calling chroot() is easy, but setting up the right environment to use it may involve either a large number of shared mounts that can’t be written to (a security concern) or a vast amount of file copying. Calling chroot() requires root privileges, which distributions are loath to give, and root privileges must later be dropped after the environment is set up (since root privileges can escape chroot() jails); this can make it difficult to integrate with other tools (such as many package recompilation tools).

FreeBSD automatically implements DESTDIR using chroot, but it isn’t clear that FreeBSD’s approach is entirely desirable. FreeBSD’s approach stores package information in $DESTDIR/var/db/pkg, which in many cases is not where you wanted that information. I suspect that FreeBSD’s approach depends on features that other kernels (including Linux) do not have, e.g., it depends on mount_nullfs(1), which appears to make their approach (implemented in file bsd.destdir.mk) hard to move to more-popular systems. It’s interesting to note that implementing this was not easy; it took two tries to get this functionality working. (Here’s a web-accessible copy of bsd.destdir.mk implementing DESTDIR.) In any case, it is not at all clear that various Linux distributors are willing to use chroot to automate DESTDIR. See the kernel material (above) for some of the negatives of this.

RUST is “a toolkit for creating RPM packages to distribute software... [it] is both a drag & drop RPM creation GUI and a ‘sandboxing’ toolkit that allows you to do software installations within a chrooted environment and automatically generate RPMs from arbitrary source code, without ever seeing a spec file.” But since it doesn’t actually create .spec files, its results cannot be submitted to typical Linux repositories (like Fedora’s).

Other Sandboxes

There are other sandboxing approaches beyond chroot, such as User Mode Linux, that could be used.  But these appear really heavyweight for the purpose. Examples include:

Conclusions and related info

I’ve implemented the “Wrappers for basic install commands and a special PATH” approach above to automate DESTDIR. To get it, get the auto-DESTDIR package. As long as “make install” only uses a limited set of commands — which seems to be true in practice — this approach seems to solve the problem without creating security issues or requiring complicated reconfiguration of low-level infrastructure. It’s also very, very portable.

You might also find the user-union program useful, if you are trying to automate DESTDIR for existing programs. User-union creates union mounts, without requiring special privileges, and it can work with auto-DESTDIR.

Supporting DESTDIR is one of several good practices when releasing FLOSS software - if you release FLOSS software, you should follow those general community guidelines. If you don’t, at least make sure that Auto-DESTDIR can automate its support.