Using rsync to Back Up Files - The Basics

When you have a large file store, backing it up regularly can become a real burden. Take a website with a podcast on it. A site like that can easily grow to a couple of gigabytes. If you use a standard FTP client, downloading everything can take a couple of hours, which is not practical on a daily basis. Or maybe you have a shared file server. A shared file server can grow quickly, and regular full backups of it take a lot of time and resources. This is where rsync shines and can make the job much easier.

Rsync is a Unix-based file synchronization program (though there are Windows ports). This means it can create backups that mirror your directory structure exactly. It can even preserve permissions and timestamps on the files.

Rsync works by comparing the source file system to the destination and transferring only what has changed. This makes regular backups a much smaller task. On top of this, rsync can compress data in transit to speed up transfers, and over a network connection it can send only the parts of a file that changed instead of the whole file.
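Here is a minimal local sketch of the "only what has changed" behavior, assuming rsync is installed. The scratch directory and file names are made up for illustration. Note that when both sides are on the same machine rsync copies changed files whole, but it still skips files that have not changed.

```shell
# Scratch directories for the demonstration (all names here are made up).
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dest"
echo "episode one" > "$demo/src/one.txt"
echo "episode two" > "$demo/src/two.txt"

# First run: the destination is empty, so both files are copied.
rsync -a "$demo/src/" "$demo/dest/"

# Change one file. Its size changes, so rsync will notice the difference.
echo "episode two, re-edited" > "$demo/src/two.txt"

# Second run: -i (--itemize-changes) lists only what gets transferred.
second_run=$(rsync -ai "$demo/src/" "$demo/dest/")
echo "$second_run"
```

The second run should mention two.txt and leave one.txt alone, because its size and timestamp still match the copy in the destination.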

Rsync is included in most Linux distributions, FreeBSD, and OS X, so you might already have it on your system. If you don't, head over to the rsync site to download and install it.
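One quick way to find out whether it is already there is to ask the shell. This is just a sketch; the rsync_status variable is a name I made up so the result can be reused later in a script.

```shell
# Check whether rsync is on the PATH and report accordingly.
if command -v rsync >/dev/null 2>&1; then
    rsync_status="installed"
    # The first line of the version output names the installed release.
    rsync --version | head -n 1
else
    rsync_status="missing"
    echo "rsync not found; install it with your package manager or from the rsync site"
fi
```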

Let's look at the basic command structure. First, fire up a terminal window. Here is a basic command:

$ rsync -avz -e ssh remoteuser@remotehost:/remote/dir /this/dir/

Let's look at the different parts of this command. The -avz options tell rsync to do some very specific things when it runs. The a tells it to run in archive mode, which copies directories recursively and preserves symbolic links, permissions, timestamps, ownership, and group. That is what you want for a backup. The v is for verbose. It displays each file as it is transferred. The z tells it to compress data during the transfer.

Next we have -e ssh. The -e option specifies the remote shell rsync uses to reach the other machine. Here that is ssh, used in place of the older, unencrypted rsh, so the transfer is encrypted for security.

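remoteuser@remotehost:/remote/dir is the source location. In this case we are connecting to another computer via ssh. The remoteuser@remotehost part is the account and host to connect to. The /remote/dir part is the directory you want to back up. All subdirectories will be backed up too. Note that because /remote/dir has no trailing slash, rsync copies the directory itself, so the files end up in /this/dir/dir. Write /remote/dir/ instead to copy only its contents.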

Finally, /this/dir/ is the directory on the current system you want to back the files up to.
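Whether the source path ends in a slash changes where the files land, which is easy to get wrong. Here is a small local sketch of the difference, assuming rsync is installed. The directory and file names (site, copy1, copy2) are invented for the demo.

```shell
# Build a tiny fake site in a scratch directory (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/site/audio" "$demo/copy1" "$demo/copy2"
echo "hello" > "$demo/site/index.html"
echo "fake audio" > "$demo/site/audio/show.mp3"

# No trailing slash on the source: the directory itself is copied,
# so the files land in copy1/site/.
rsync -a "$demo/site" "$demo/copy1/"

# Trailing slash on the source: only the contents are copied,
# so the files land directly in copy2/.
rsync -a "$demo/site/" "$demo/copy2/"

# List what ended up where.
find "$demo/copy1" "$demo/copy2" -type f | sort
```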

Rsync is flexible and has many options. For example, I recently created a backup of a site minus its 2 gigs of mp3 files. To do this I ran the rsync command:

$ rsync -avz --exclude '*.mp3' -e ssh remoteuser@remotehost:/remote/dir /this/dir/

This told rsync to exclude all files with the extension mp3.
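If you want to check an exclude pattern before trusting it with a real backup, rsync's -n (--dry-run) option reports what would be transferred without copying anything. Here is a local sketch, assuming rsync is installed. The scratch directories and file names are made up.

```shell
# Build a tiny fake site in a scratch directory (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/site/audio" "$demo/backup"
echo "<html></html>" > "$demo/site/index.html"
echo "fake audio" > "$demo/site/audio/show.mp3"

# -n (--dry-run) lists what would be transferred without copying anything.
preview=$(rsync -avn --exclude '*.mp3' "$demo/site/" "$demo/backup/")
echo "$preview"

# Real run with the same filter: everything except the mp3 files is copied.
rsync -a --exclude '*.mp3' "$demo/site/" "$demo/backup/"
```

The audio directory is still created in the backup. Only the files matching the pattern are skipped.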

If you're thinking this would be a pain to run by hand on a regular basis, you don't need to. A simple shell script run via a cron job can execute something like this every day and take care of your file backups. Too many strange words in the last sentence? A cron job is just a regularly scheduled task, and a shell script is like a little program that runs from the command line. Cron jobs can run command-line scripts.
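As a rough sketch of what such a script could look like: the file name backup.sh, the backup function, the BACKUP_LOG variable, and the log location are all my own inventions for illustration, so adjust them to your setup. It also assumes ssh can log in without a password prompt (for example with a key), since cron runs unattended.

```shell
#!/bin/sh
# backup.sh (hypothetical name): wrap the rsync command so cron can run it.
# Usage: backup.sh SOURCE DEST

backup() {
    src=$1
    dest=$2
    # BACKUP_LOG is an invented variable; it defaults to a file in $HOME.
    log="${BACKUP_LOG:-$HOME/rsync-backup.log}"
    # -a archive mode, -z compression. No -v, since nobody watches a cron job;
    # output and errors are appended to the log file instead.
    {
        echo "backup started: $(date)"
        rsync -az -e ssh "$src" "$dest"
        echo "backup finished with exit status $?: $(date)"
    } >>"$log" 2>&1
}

# Run the backup only when both a source and a destination are given.
if [ "$#" -eq 2 ]; then
    backup "$1" "$2"
else
    echo "usage: backup.sh SOURCE DEST" >&2
fi

# Example crontab entry (add it with `crontab -e`) to run daily at 02:30:
# 30 2 * * * /home/you/bin/backup.sh remoteuser@remotehost:/remote/dir /this/dir/
```

The five fields at the start of the crontab line are minute, hour, day of month, month, and day of week. The path /home/you/bin/ is a placeholder for wherever you save the script.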

Oh, I almost forgot: combining rsync with Samba (for connections to Windows computers) can make a great shared drive backup utility.

If you're looking for a good backup utility, check out rsync. It can definitely provide a powerful file backup tool. Over the next couple of months I'll blog more about using rsync as part of a full package backup solution.

Great post!

Just wanted to expand on your comment about rsync being available for Windows. I'm using the cwRsync package, which is REALLY easy to install and set up on Windows. I highly recommend it.
Link

Thanks for the link

Travis - Thanks for the link.

I'm in the process of setting up a solid backup system for a number of important (to me) sites. Rsync has turned out to be a great tool for that.

rsync still?

Hey I was curious if you're still using rsync regularly? And if so, how frequently?

Also, G & G show idea based on the past few weeks, "Switching Webhosts," "living green," and especially the Web 2.0 services show, it's time for an episode called something like:

FOG FREE LIFE SYNCING: the various tools out there for syncing calendars, contacts, files, etc. make it a bit overwhelming for full-time pastors.

Finally, in this episode, I'd love your opinions on the "new" MobileMe that Apple unveiled today. From what I understand, most of these third-party tools for syncing back and forth between Gmail, Google Calendars, iCal, Outlook, CRMs, etc. won't be needed, especially since Apple's using push technology. So, though I have, let's say, SpanningSync or BusySync on my Mac, it still relies on me to set it to run automatically or manually, which means I have to fire up the app. Doesn't PUSH eliminate the need for that?

Thanks dude!

Bart

I like the idea

I like the syncing idea. We've been mulling around technologies that make our lives easier and that would fit in great there.

As for rsync, I use it at least once a week for a regular backup. I should set it up to run more often. Mostly, it runs in the background as a regularly scheduled task, so it's a no-brainer. I also run it as a last-second backup before I make any major changes.