Online Backups for the Truly Paranoid

November 28, 2009

I like paranoia in design. Well, I take that back. I don't like it when it inhibits programming experimentation and creativity, but I do like it when it comes to services, and most especially when it comes to backup.

I wanted to write about my experiences with consumer offsite backup services (e.g. Mozy, Carbonite, Jungle Disk) as well as the plain practice of having a redundant storage device onsite. But all that was side-tracked when I recently needed to quickly backup my servers, and discovered tarsnap.

Tarsnap was created by Dr. Colin Percival, the FreeBSD Security Officer. He also worked on the utilities portsnap and freebsd-update. All of these tools are run in the command-line, and greatly simplify the maintenance (and now backup) of unix systems.

The things I like about tarsnap are encompassed in its listed design features:

  • It encrypts information before sending it to the Amazon Cloud (AWS), so if a person somehow gets access to the cloud servers, the information is unreadable. Even metadata is unreadable (filenames, sizes, names of the backups, etc.).
  • It's easy to learn and use, and quite scriptable.
  • It breaks your backup into variable-length blocks, and keeps track of these, so if another archive contains the same data, that same block does not get re-uploaded. As long as any backup references that piece of info, it'll remain stored and not be deleted. It's like storing incremental changes, but so much cooler.
  • It's quite cheap. Especially if used for server backups, which typically won't take terabytes of space. 300 picodollars per byte transferred ($0.30/GB), and 300 picodollars per byte per month stored (again, $0.30/GB-month).

Also, other than "security, flexibility, efficiency, utility", I personally liked that:

  • The client code is open to peer review.
  • It uses AWS! Geo-replication concerns are no longer a problem.
  • It runs on almost any OS, yes, even Cygwin.
  • You can secure the backup keys so that, if a person breaks into your system and starts deleting everything, they cannot also read or delete your backed up data. Even the backup key security is fascinating!
  • Not to mention the author was surprisingly responsive via email to some questions I had about the web-based reporting and command-line options.

The tool has basically addressed almost every concern I've had about backups.

Most early "backup service" providers would simply give you space at a cost, but had very little to say about their data loss/breach liability or who had access to their systems. Others would claim their service is "secure using 128-bit encryption" but that only meant they installed an SSL cert for transfer; backups were still unencrypted on disk. Then there are those who tout The Cloud, and how much safer it is, without a hint about data geo-redundancy (or if they have more than one data center).

But with tarsnap, I just install it, create a key, split the keys to read, write, and delete keys (encrypting the read and delete keys), and with a command I'm securely backing up entire directories. Online. In the Amazon™ cloud.

tarsnap --keyfile /usr/tarsnap.key -c -f backup-2009-11-27 /usr/home /usr/local/etc /etc

How much easier could that be? And if your backups aren't gigabytes large, the small pre-payment of the online service could last a very long time.

My only concern is that the tarsnap server is, as of this time, a single point of failure. We have no option of having our own tarsnap interface to our own personal AWS accounts. So 1) we have to trust that it is indeed being sent to AWS by Dr. Percival (is that too paranoid?), and 2) we have to hope that the tarsnap server is fault tolerant and can be restored quickly. Granted, this problem exists for any online backup service unless you write your own. We depend on third-party uptime for any service, so it boils down to who's thought it through, and has addressed our backup concerns.

For now, I am glad "production" isn't the only place some important data lives. I am glad to not have to manually tar.gz files and move them to my workstation to be picked up by my desktop backup scheduler. With tarsnap, I was able to upgrade from FreeBSD 7.2 to 8.0-RELEASE and not worry (too much) about having to rebuild the server in case all failed. (I didn't need to.)

Online backup for the truly paranoid. Who backs stuff up who isn't paranoid?