Synchronize Your Data with rsync

Rsync is a synchronization tool for keeping your data in sync. The program preserves file properties and can use SSH to encrypt transfers, and it is particularly well suited to transferring large volumes of data when the target computer already has a copy of a previous version. Rsync checks for differences between the source and target versions. The tool, which was developed by the Samba team, uses an efficient checksum-search algorithm to compare data; it transfers only the differences between the two sides and therefore saves time and bandwidth.

In Sync

The generic syntax for rsync is rsync [options] source target, where target can be a local target on the same machine or a remote target on another machine. The choice of source and target is critical; decide carefully in which direction you will be synchronizing to avoid loss of data. If you're not sure that you're using the correct options or the correct source/target, you can run rsync with the -n flag to tell the program to perform a trial run. Additionally, you can increase the amount of information rsync prints by adding -v to switch to verbose output.
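As a rough illustration of the trial run, the following sketch works in a throwaway directory. The names src, dst, and notes.txt are made up for this example. It previews a copy with -n and -v, records what the target looks like afterward, and only then performs the real transfer:

```shell
# Scratch area; every name below is hypothetical and exists only for this demo.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"
echo "hello" > "$demo/src/notes.txt"

# Trial run: -n only reports what would be transferred, -v names the files.
rsync -n -v "$demo/src/notes.txt" "$demo/dst/"
after_trial=$(ls "$demo/dst")    # remains empty: nothing was copied

# Real run: the same command without -n.
rsync -v "$demo/src/notes.txt" "$demo/dst/"
ls "$demo/dst"
```

Running the trial first costs little and shows the file list before anything on the target changes.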

To mirror a directory dir1 on a local machine, for example, type:

$ rsync dir1/* dir2/
skipping directory foo
skipping directory bar
skipping non‑regular file "text.txt"

As the output shows, rsync would transfer normal files but leave out subdirectories and symbolic links (non‑regular file). To transfer directories recursively down to the lowest level, you should specify the ‑r option. Using the ‑l flag additionally picks up your symlinks. Of course, a combination of the options is also possible:

rsync -lr dir1/* dir2/

Rsync has an alternative approach to handling symlinks. If you replace ‑l with ‑L, the program will resolve the link, and your former symlinks will end up as “normal” files at the target.
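The difference between the two symlink options can be checked with a small experiment. This is only a sketch in a scratch directory; the names keep_links, follow_links, real.txt, and alias.txt are invented for the demonstration:

```shell
# Scratch area with one regular file and one symlink pointing at it.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/keep_links" "$demo/follow_links"
echo "data" > "$demo/dir1/real.txt"
ln -s real.txt "$demo/dir1/alias.txt"

# -l transfers the symlink as a symlink.
rsync -lr "$demo/dir1"/* "$demo/keep_links/"

# -L resolves the symlink and transfers the file it points to.
rsync -Lr "$demo/dir1"/* "$demo/follow_links/"

# Compare the two results: alias.txt is a link in one, a regular file in the other.
ls -l "$demo/keep_links" "$demo/follow_links"
```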

Be careful with the slash – appending a slash to a directory name influences the way rsync handles an operation (see the “Common Rsync Traps” box).

Common Rsync Traps

Some rsync options could cause trouble if you don’t use them with caution. Being aware of these common mistakes can help.

  • Most users find the final slash for directories confusing at first. For example, if you call rsync -a source/folder target/, rsync will transfer the directory called folder and its contents to the target directory. If the directory folder doesn't exist in the target, rsync will create it. If you append a slash to source/folder/, rsync will only transfer the contents of folder. That means a file source/folder/foo.txt is transferred to target/foo.txt instead of target/folder/foo.txt.
  • An absolute classic troublemaker is the option --delete. If you get source and target mixed up, --delete will happily delete your original files. To be on the safe side, remember to use -n in a test run.
  • If a transfer is interrupted and you're using the --partial flag, rsync keeps the partially transferred file under the same name as the original, which is not always helpful. Imagine that you're using rsync to update a large existing ISO image of your favorite distribution (a release candidate, for example). The transfer of the new version is interrupted after just a few bytes. Rsync will overwrite your original file with the small fragment of the ISO image from the server, and you'll have lost your current file and have to start from scratch.

To avoid loss of data in this scenario, you can create a hard link to the existing file before calling rsync. If the transfer fails now, you won't lose the ISO image: the partial file takes over only the original name, and the complete image remains available under the name of the hard link.
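The trailing-slash behavior described in the box is easy to verify on a scratch copy before trusting it with real data. The sketch below assumes nothing beyond rsync itself; target1 and target2 are hypothetical directory names chosen to keep the two results apart:

```shell
# Scratch area mirroring the source/folder/foo.txt layout from the box.
demo=$(mktemp -d)
mkdir -p "$demo/source/folder" "$demo/target1" "$demo/target2"
echo "x" > "$demo/source/folder/foo.txt"

# No trailing slash: the directory "folder" itself is created in the target.
rsync -a "$demo/source/folder" "$demo/target1/"

# Trailing slash: only the contents of "folder" land in the target.
rsync -a "$demo/source/folder/" "$demo/target2/"

# Lists target1/folder/foo.txt and target2/foo.txt.
find "$demo/target1" "$demo/target2" -type f
```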

 

As You Were

If you will be using rsync to create backups, it makes sense to keep the attributes of the original files. By attributes I mean permissions (read, write, execute, see the “Access Permissions” article) and timestamps. A file carries three timestamps: the last access time (atime), the last status change (ctime), and the last modification (mtime). Of these, the modification time is the one that matters most for backups, and it is the one rsync's -t option preserves.

Additionally, administrators can benefit from parameters that preserve owner and group data and support device files. To retain the permissions, just specify the ‑p option; ‑t handles the timestamps, and ‑g keeps the group membership.

Whereas any normal user can specify these parameters, the ‑o (keep the owner data) and ‑D (device attributes) flags are available only to root. The complete command line with all these options could look like this:

rsync -rlptgoD /home/huhn backup/

Don’t worry – you don’t have to remember all these options. Rsync offers a practical shortcut and a special option that combines these parameters for this case. Instead of ‑rlptgoD, just type ‑a.
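To see what the archive option buys you, the following sketch copies the same file twice, once with plain -r and once with -a, and compares the results. The directory names plain and archive and the file report.txt are assumptions made for this example, and the test runs as a normal user, so owner and device handling are not exercised:

```shell
# Scratch area with one file that has distinctive permissions and an old mtime.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/plain" "$demo/archive"
echo "report" > "$demo/src/report.txt"
chmod 640 "$demo/src/report.txt"
touch -t 202001011200 "$demo/src/report.txt"   # backdate the modification time

# Recursive copy without attribute options: the copy gets a fresh timestamp.
rsync -r "$demo/src/" "$demo/plain/"

# Archive mode (-rlptgoD): permissions and modification time are carried over.
rsync -a "$demo/src/" "$demo/archive/"

ls -l "$demo/src" "$demo/plain" "$demo/archive"
```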

Exclusive

Rsync has another practical option that allows you to exclude certain files from the synchronization process. To use this feature, specify the --exclude= option followed by a search pattern that describes the files to exclude. The pattern can contain wildcards; quote it so that the shell passes it to rsync unchanged:

rsync -a --exclude='*.wav' ~/music backup/

This example excludes large WAV files (names ending in .wav) from the backup of a music collection. If you need to exclude MP3s as well, just append another exclude statement and a pattern:

rsync -a --exclude='*.wav' --exclude='*.mp3' ...

To save time, you can store your exclusions in a text file, with a separate line for each search pattern. Specify the --exclude-from=file_with_exclusions parameter to make rsync read the patterns from that file.
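A minimal sketch of the exclusion file in action follows. The file name exclusions.txt and the sample tracks are hypothetical; only the one-pattern-per-line layout comes from the description above:

```shell
# Scratch music collection with three file types.
demo=$(mktemp -d)
mkdir "$demo/music" "$demo/backup"
touch "$demo/music/a.wav" "$demo/music/b.mp3" "$demo/music/c.ogg"

# One search pattern per line.
printf '%s\n' '*.wav' '*.mp3' > "$demo/exclusions.txt"

# Back up the collection, skipping everything the patterns match.
rsync -a --exclude-from="$demo/exclusions.txt" "$demo/music" "$demo/backup/"

# Only c.ogg should have been transferred.
ls "$demo/backup/music"
```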

Tidying Up

Rsync offers various parameters for deleting data that is no longer needed or wanted. To get rid of files in your backup that no longer exist in the source, specify --delete. When the deletion happens depends on the version: rsync 3.0.0 and later delete incrementally while the transfer is running, whereas older versions delete before the transfer starts. Alternatively, you can define --delete-after to delete files on the target after all the syncing is done.

Additionally, you can tell rsync to delete files that you have excluded (see the previous section). For example, imagine you've decided that you no longer want the MP3s in the backup and you've started to exclude them with --exclude='*.mp3'. Now you can define --delete-excluded, and rsync will recognize that those files are no longer wanted.

All --delete options have basically the same goal: to keep an exact copy of the original. If you don't use one of them, files that you've decided are useless will remain in the backup until you clean up manually. Use these options with care (see the “Common Rsync Traps” box).
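The deletion options can be tried safely on scratch data first. The sketch below uses invented file names (keep.txt, old.txt, song.mp3), previews the deletion with -n as the traps box recommends, and passes --delete together with --delete-excluded so that the intent stays explicit:

```shell
# Scratch source and an initial backup of it.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/backup"
touch "$demo/src/keep.txt" "$demo/src/old.txt" "$demo/src/song.mp3"
rsync -a "$demo/src/" "$demo/backup/"

# A file disappears from the source.
rm "$demo/src/old.txt"

# Trial run first: -n with -v lists what --delete would remove.
rsync -a -n -v --delete "$demo/src/" "$demo/backup/"

# Real run: old.txt is removed from the backup.
rsync -a --delete "$demo/src/" "$demo/backup/"

# Exclude MP3s from now on and also purge them from the backup.
rsync -a --exclude='*.mp3' --delete --delete-excluded "$demo/src/" "$demo/backup/"

ls "$demo/backup"
```

Note the order of arguments in every call: the source comes first, and mixing the two up is exactly the mistake the box warns about.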

Tuning Rsync

Several options increase rsync's performance. Often, I use the -z switch to compress data when I sync data over a network connection. Figure 1 shows this using Grsync, the graphical front end to rsync. If the connection is very slow, you can also define a bandwidth limit. To transfer data at no more than 20KBps, for example, use:

rsync ... --bwlimit=20

Figure 1: The rsync -z option to compress data is shown in the Grsync graphical front end.

 

Rsync is perfect for transferring large volumes of data. If you specify the --partial parameter and the transfer is interrupted for some reason, you can pick up the transfer from the point at which you left off. Specifying the --progress option gives you a progress indicator to let you keep track of the transfer operation:

rsync -avz --progress --partial remote.server:/home/huhn/music/folk ~/music/
receiving file list ...
42 files to consider
...
12_Moladh_Uibhist.mp3
   1143849   4%  339.84kB/s    0:01:10

On the receiving side, the partial file is hidden in the target directory at first. Typing ls -a reveals a file called .12_Moladh_Uibhist.mp3.7rUSSq. The dot at the start of the file name keeps the file hidden, and the arbitrary extension removes the danger of overwriting existing files.

When the transfer completes, the file gets its original name back. If the transfer is interrupted, you can restart by specifying the --partial option again. Alternatively, you have a shortcut: If you want to use a combination of --partial and --progress, simply use -P. For the downside of using the --partial flag, again see the “Common Rsync Traps” box.

Rsync keeps your data up to date and helps you stay on top of confusing version changes. Its options help you manage file properties, and it works well with SSH. When you need to transfer large volumes of data, rsync comes to your rescue.

This article originally appeared in the Linux Shell Handbook and is reprinted here with permission.
