David A. Wheeler's Blog

Thu, 08 Jan 2009

Updating cheap websites with rsync+ssh

I’ve figured out how to run and update cheap, simple websites using rsync, ssh, and Linux. I thought I’d share that info here, in case you want to copy my approach.

My site (www.dwheeler.com) is an intentionally simple website. It’s simply a bunch of directories with static files; those files may contain JavaScript and animated GIFs, but site visitors aren’t supposed to cause them to change. Programs to manage my site (other than the web server itself) are run before the files are sent to the server. Most of today’s sites can’t be run this way… but when you can do this, the site is much easier to secure and manage. It’s also really efficient (and thus fast). Even if you can’t run a whole site this way, running a big part of it this way can save you a lot of security, management, and performance problems.

This means that I can make arbitrary changes to a local copy of the website, and then use rsync+ssh to upload just those changes. rsync is a wonderful program, originally created by Andrew Tridgell, that can copy a directory tree to or from a remote directory tree, sending only the changes. The result is that rsync is a great bandwidth-saver.

This approach is easy to secure, too. Rsync uses ssh to create the connection, so people can’t normally snoop on the transfer, and a redirected DNS entry will be noticed immediately (ssh will complain that the remote host key changed). If the website is compromised, just reset it and re-send a copy; as long as you retain a local copy, no data can be permanently lost. I’ve been doing this for years, and been happy with this approach.

On a full-capability hosting service, using rsync is easy. Just install rsync on the remote system (typically using yum or apt-get), and run:

 rsync -a LOCALDIR REMOTENAME@REMOTESITE:REMOTEDIR

Unfortunately, at least some of the cheap hosting services available today don’t make this quite so easy. The cheapest hosting services are “shared” sites that share resources between many users without using full operating system or hardware virtualization. I’ve been looking at a lot of cheap Linux web hosting services like these, such as WebhostGIANT, Hostmonster, Hostgator, and Bluehost. It appears that at least some of these hosting companies improve their security by greatly limiting the access granted via their ssh/shell interface. I know that WebhostGIANT is an example, but I believe there are many others. So, even if you have ssh access on a Linux system, you may only get a few commands you can run, like “mv” and “cp” (and not “tar” or “rsync”). You could always ask the hosting company to install programs, but they’re often reluctant to add new ones. But… it turns out that you can use rsync and other such programs without asking them to install anything, at least in some cases. I’m looking for new hosting providers, and realized (1) I can still use this approach without asking them to install anything, but (2) it requires some technical “magic” that others might not know. So, here’s how to do this, in case this information/example helps others.
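
If you’re not sure how restricted a given host’s shell is, you can simply poke at it. Here’s a minimal sketch, assuming the host at least lets you run “ls” over ssh (MY_USERID and MY_REMOTE_SITE are placeholders):

 # Show the remote search path and which commands are actually there:
 ssh MY_USERID@MY_REMOTE_SITE 'echo $PATH; ls /bin /usr/bin'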

Warning: Complicated technical info ahead.

I needed to install some executables, and rather than compiling my own, I grabbed pre-compiled executables. To do this, I found out which Linux distribution the hosting service uses (in the case of WebhostGIANT, it’s CentOS 5, so all my examples will be RPM-based). On my local Fedora Linux machine I downloaded the DVD “.iso” image of that distro, and did a “loopback mount” as root so that I could directly view its contents:

 cd /var/www     # Or wherever you want to put the ISO.
 wget ...mirror location.../CentOS-5.2-i386-bin-DVD.iso
 mkdir /mnt/centos-5.2
 mount CentOS-5.2-i386-bin-DVD.iso /mnt/centos-5.2 -o loop
 # Get ready to extract some stuff from the ISO.
 cd
 mkdir mytemp
 cd mytemp
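
You can confirm the ISO mounted properly by listing the package files it contains (on the CentOS 5 DVD, the RPMs live in the CentOS/ subdirectory):

 # A successful loopback mount exposes the DVD's RPM packages:
 ls /mnt/centos-5.2/CentOS | head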

Now let’s say I want the program “nice”. On a CentOS or Fedora machine you can determine the package that “nice” is in using this command:

 rpm -qif `which nice`

This will show that “nice” is in the “coreutils” package. You can extract “nice” from its package by doing this:

 rpm2cpio /mnt/centos-5.2/CentOS/coreutils-5.97-14.el5.i386.rpm | \
   cpio --extract --make-directories

Now you can copy it to your remote site. Presuming that you want the program to go into the remote directory “/private/”, you can do this:

 scp -p ./usr/bin/nice MY_USERID@MY_REMOTE_SITE:/private/
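
You can quickly check that the upload worked, assuming the restricted shell will run uploaded programs over ssh (a reasonable but not guaranteed assumption):

 # The uploaded GNU "nice" should print its version banner:
 ssh MY_USERID@MY_REMOTE_SITE '/private/nice --version'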

Now you can run /private/nice, and it works as you’d expect. But what about rsync? Well, if you upload rsync the same way and try to run it, it will complain with an error message. The error message says that rsync can’t find another library (libpopt in this case). The issue is that cheap web hosting services often don’t provide many libraries, and they won’t let you install new libraries in the “normal” places. Are we out of luck? Not at all! We could recompile the program statically, so that the library is embedded in the file, but we don’t even have to do that. We just need to upload the needed library to a different place, and tell the remote site where to find the library. It turns out that the program “/lib/ld-linux.so” has an option called “--library-path” that is specially designed for this purpose. ld-linux.so is the loader (the “program for running programs”), which you don’t normally invoke directly, but if you need to add library paths, it’s a reasonable way to do it. (Another way is to use LD_LIBRARY_PATH, but that requires that the string be interpreted by a shell, which doesn’t always happen.) So, here’s what I did (more or less).
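
By the way, you can discover a program’s library dependencies ahead of time, rather than waiting for a remote error message. A minimal sketch, run on the local machine against the rsync binary extracted in the next step:

 # List the shared libraries the extracted rsync needs; the output
 # should include libpopt.so.0, the library we must also upload.
 ldd ./usr/bin/rsync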

First, I extracted the rsync program and necessary library (popt) on the local system, and copied them to the remote system (to “/private”, again):

 rpm2cpio /mnt/centos-5.2/CentOS/rsync-2.6.8-3.1.i386.rpm | \
   cpio --extract --make-directories
 # rsync requires popt:
 rpm2cpio /mnt/centos-5.2/CentOS/popt-1.10.2-48.el5.i386.rpm | \
   cpio --extract --make-directories
 scp -p ./usr/bin/rsync ./usr/lib/libpopt.so.0.0.0 \
        MY_USERID@MY_REMOTE_SITE:/private/

Then, I logged into the remote system using ssh and added the symbolic links required by the normal Unix/Linux library conventions:

 ssh MY_USERID@MY_REMOTE_SITE
 cd /private
 ln -s libpopt.so.0.0.0 libpopt.so
 ln -s libpopt.so.0.0.0 libpopt.so.0
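
While still logged in, it’s worth confirming that the loader trick works before attempting a full transfer:

 # Run the uploaded rsync via the loader, telling it to look in
 # /private for libraries; it should print rsync's version banner.
 /lib/ld-linux.so.2 --library-path /private /private/rsync --version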

Now we’re ready to use rsync! The trick is to tell the local rsync where the remote rsync is, using “--rsync-path”. That option’s contents must invoke ld-linux.so to tell the remote system where the additional library path (for libpopt) is. So here’s an example, which copies files from the directory LOCAL_HTTPDIR to the directory REMOTE_HTTPDIR:

 rsync -a \
   --rsync-path="/lib/ld-linux.so.2 --library-path /private /private/rsync" \
   LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR

There are a few ways we can make this nicer for everyday production use. If the remote server is a cheap shared system, we want to be very sparing with its CPU and bandwidth (or we’ll get thrown off it!). The “nice” command (installed by the steps above) will reduce CPU use on the remote web server when running rsync. There are several rsync options that can help, too. The “--bwlimit=KBPS” option limits the bandwidth used. The “--fuzzy” option reduces bandwidth use if a similar file is already on the remote side. The “--delete” option is probably a good idea; it means that files deleted locally are also deleted remotely. I also suggest “--update” (this avoids updating remote files if they have a newer timestamp) and “--progress” (so you can see what’s happening). Rsync can copy hard links (using “-H”), but that takes more CPU power; I suggest using symbolic links and not invoking that option. You can enable compression too, but that’s a trade-off: compression decreases bandwidth but increases CPU use. So our final command looks like this:

 rsync -a --bwlimit=100 --fuzzy --delete --update --progress \
   --rsync-path="/private/nice /lib/ld-linux.so.2 --library-path /private /private/rsync" \
   LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR
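
For day-to-day use, you can wrap that command in a tiny shell script. Here’s a minimal sketch; LOCAL_HTTPDIR, REMOTENAME, REMOTESITE, and REMOTE_HTTPDIR are placeholders to fill in with your own values:

 #!/bin/sh
 # publish.sh - push the local site copy to the remote web host,
 # politely, using the uploaded rsync and the loader trick above.
 rsync -a --bwlimit=100 --fuzzy --delete --update --progress \
   --rsync-path="/private/nice /lib/ld-linux.so.2 --library-path /private /private/rsync" \
   LOCAL_HTTPDIR REMOTENAME@REMOTESITE:REMOTE_HTTPDIR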

Voila! Store that script in some easily-run place. Now you can easily update your website locally and push it to the actual webserver, even on a cheap hosting service, with very little bandwidth and CPU use. That’s a win-win for everyone.


Moving hosting service at end of January 2009

I will be moving to a new hosting service at the end of January 2009. (I haven’t determined which hosting service yet.) In theory, there should be very little downtime, but it’s possible the site will be down for a little while. If that happens, it will be very temporary; I’ll get the site back up as soon as I can.
