Getting started with rsync - Comprehensive Guide

CaffeineFueled

2024/01/14

rsync is a CLI tool that covers various use cases: transferring data, creating backups or archives, mirroring data sets, integrity checks, and many more.

Reference for this article: rsync version 3.2.7 and two Ubuntu 22.04 LTS machines.

If you want to transfer files to a remote host, rsync must be installed on both sides, and a connection via SSH must be possible.

Side note: rsync can also be used via rsh or as a daemon/server over TCP port 873, but I won’t cover those in this article and will concentrate on transfers over SSH.

Basic File transfer #

You can transfer files locally, from local to a remote host or from a remote host to your local machine. Unfortunately, you can’t transfer files from one remote host to another remote host.

The general syntax for rsync is the following:

rsync [options] source destination

The syntax for the remote-shell connection is:

[user@]host:/path/to/data

Example of transferring a directory to a remote host (-r is needed, otherwise rsync skips directories):

rsync -r ./data user@192.0.2.55:/home/user/dir/

Side note: a trailing slash on the source matters. ./data/ copies only the contents of the directory, while ./data copies the directory itself as well.
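
The trailing-slash behaviour is easy to try locally. The following is a minimal sketch that assumes a scratch directory created with mktemp; the directory and file names are made up for illustration.

```shell
# Scratch directories so nothing outside of them is touched.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/with-slash" "$demo/without-slash"
echo "hello" > "$demo/data/file.txt"

# Trailing slash on the source: only the contents of data/ are copied.
rsync -r "$demo/data/" "$demo/with-slash/"

# No trailing slash: the directory data itself is copied as well.
rsync -r "$demo/data" "$demo/without-slash/"

# Compare the resulting layouts.
find "$demo/with-slash" "$demo/without-slash" -type f
```

The first destination should contain file.txt directly, the second one data/file.txt.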

Without further options, rsync does not preserve metadata like ownership or timestamps of an item. More information about this is in the metadata section below.

In practice, you will almost always combine rsync with a few options to get the results you want.

Common options:
--recursive/-r # copy directories recursively
--ipv4/-4 # prefer IPv4
--ipv6/-6 # prefer IPv6
--human-readable/-h # output numbers in a human-readable format
--quiet/-q # decrease the output, recommended for automation like cron
--verbose/-v # increase the output of information
--archive/-a # recursive + preserves most metadata. Further information in the ‘metadata’ section

Specify a different SSH port #

The default TCP port for SSH is 22 but some servers listen on another port. That is not a problem, and you can tell rsync to connect to another port:

-e "ssh -p 2222" # connect to TCP port 2222 instead of 22

Example:
rsync -ah -e "ssh -p 2222" ./data user@192.0.2.55:/home/user/

Mirroring data #

You can simply mirror a directory to or from a remote host with the --delete option. Rsync compares the source and destination directories, and if it finds files in the destination directory that are missing in the source, it will delete those to keep both sides the same. Please use it with caution and start with a dry run.

kuser@pleasejustwork:~/9_temp/rsync$ rsync -ah --delete --itemize-changes ./data user@192.0.2.55:/home/user/
sending incremental file list
*deleting   data/small-files-7
[...]
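
To get a feeling for --delete without touching real data, here is a small local sketch. It assumes two scratch directories (src and dst, both hypothetical names) and runs the dry run first, as recommended above.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "keep"  > "$demo/src/keep.txt"
echo "keep"  > "$demo/dst/keep.txt"
echo "stale" > "$demo/dst/stale.txt"   # exists only on the destination

# Dry run: reports the planned deletion but changes nothing.
preview=$(rsync -a --delete --itemize-changes --dry-run "$demo/src/" "$demo/dst/")
echo "$preview"
after_dry_run=$(ls "$demo/dst")        # stale.txt is still listed here
echo "$after_dry_run"

# Real run: the destination now mirrors the source.
rsync -a --delete "$demo/src/" "$demo/dst/"
ls "$demo/dst"
```

Only after the second command should stale.txt be gone from the destination.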

Deleting source files after transfer #

The option --remove-source-files, as the name implies, removes each source file once it has been transferred successfully. Directories on the source are left in place. Please use it with caution and start with a dry run.
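
A minimal local sketch of this "move" behaviour, assuming scratch directories with made-up names. The last line is an optional extra step with find (GNU/BSD find assumed) and not part of rsync itself.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/sub" "$demo/dst"
echo "a" > "$demo/src/a.txt"
echo "b" > "$demo/src/sub/b.txt"

# Files are removed from the source after they arrived on the destination.
rsync -a --remove-source-files "$demo/src/" "$demo/dst/"

find "$demo/src"             # only the (now empty) directories remain
find "$demo/dst" -type f     # both files are here

# Optional: clean up the empty directories left behind on the source.
find "$demo/src" -mindepth 1 -type d -empty -delete
```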

Update-behaviour #

There are some options to make sure that rsync does not overwrite data on the destination.

Examples:
--update/-u # don’t update files that are newer on the destination
--existing # don’t create new files on the destination
--ignore-existing # don’t update files that exist on the destination
--size-only # skip files that match in size, even if the timestamps differ

This can be helpful when the files are used or modified by another application and you don’t want to overwrite anything.
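
The difference between --ignore-existing and --existing is easier to see side by side. This is a local sketch with hypothetical file names; dst stands for a destination where a file was edited, dst2 for one that should not receive new files.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst" "$demo/dst2"
echo "from source" > "$demo/src/shared.txt"
echo "from source" > "$demo/src/new.txt"
echo "edited on destination" > "$demo/dst/shared.txt"
echo "old" > "$demo/dst2/shared.txt"

# --ignore-existing: shared.txt is left alone, new.txt is still created.
rsync -a --ignore-existing "$demo/src/" "$demo/dst/"
cat "$demo/dst/shared.txt"
ls "$demo/dst"

# --existing: shared.txt is updated, but new.txt is not created.
rsync -a --existing "$demo/src/" "$demo/dst2/"
cat "$demo/dst2/shared.txt"
ls "$demo/dst2"
```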

Item Metadata #

As mentioned before, rsync does not preserve the metadata of a file or directory by default. You can set various options to decide what you keep.

Your options:
--perms / -p # permissions
--owner / -o # owner
--group / -g # group
--times / -t # modification time
--atimes / -U # access time
--crtimes / -N # create time
-A # ACLs
-X # extended attributes

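One of the most common options is --archive/-a, which adds recursion and preserves most metadata. It is in fact a shortcut for -rlptgoD: besides permissions, modification times, group and owner, it also preserves symbolic links, special and device files. It does not include ACLs (-A), extended attributes (-X), access times (-U) or create times (-N); add those explicitly if you need them.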

You can use the --no-* syntax to remove single attributes like --no-perms.
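
As an illustration, the following local sketch compares the modification time of a copied file with and without metadata options. The directory names are made up, and --no-times is used here as one instance of the --no-* syntax.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/plain" "$demo/archive" "$demo/notimes"
echo "data" > "$demo/src/file.txt"
touch -t 202001010000 "$demo/src/file.txt"    # pretend the file is from 2020

rsync -r "$demo/src/" "$demo/plain/"               # no metadata options
rsync -a "$demo/src/" "$demo/archive/"             # -a includes --times
rsync -a --no-times "$demo/src/" "$demo/notimes/"  # -a without the timestamps

# Only the copy in archive/ should show the 2020 timestamp.
ls -l "$demo/src/file.txt" "$demo/plain/file.txt" \
      "$demo/archive/file.txt" "$demo/notimes/file.txt"
```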

Exclude directories and files #

Rsync makes it easy to exclude files and directories. I’ll show you some examples in the following list.

Examples:
--exclude "*.iso" --exclude "*.img"
--exclude={"/tmp/*","/etc/*"} # shell brace expansion; no space after the comma

You can use --exclude-from= to reference a file with a list of exclusions to make it more manageable.

Every line is one exclusion; blank lines and lines starting with ; or # are treated as comments and ignored:

--exclude-from='/exclude.txt'

$ cat exclude.txt

.git
*.iso

# Temp
/tmp
/cache
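
Here is a local sketch of an exclude file in action. The paths are hypothetical, and it relies on one detail worth knowing: a pattern with a leading slash is anchored to the top of the transfer, not to the root of the file system.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src/.git" "$demo/src/tmp" "$demo/src/docs/tmp" "$demo/dst"
echo x > "$demo/src/.git/config"
echo x > "$demo/src/image.iso"
echo x > "$demo/src/tmp/scratch"
echo x > "$demo/src/docs/tmp/notes"
echo x > "$demo/src/docs/readme.txt"

cat > "$demo/exclude.txt" <<'EOF'
.git
*.iso

# Temp (leading slash = anchored to the top of the transfer)
/tmp
EOF

rsync -a --exclude-from="$demo/exclude.txt" "$demo/src/" "$demo/dst/"

# Expected to remain: docs/readme.txt and docs/tmp/notes
find "$demo/dst" -type f
```

Because /tmp is anchored, docs/tmp is still transferred; a plain tmp without the slash would exclude both.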

Exclusion by file size

You can exclude files from being transferred if they’re too small or too large with --max-size= and --min-size=:

Examples:
--max-size=500m # max file size of 500 mebibytes
--min-size=5kb # min file size of 5 kilobytes

Common scheme:
b byte
k kilo/kibi
m mega/mebi
g giga/gibi
t tera/tebi
p peta/pebi

A single letter, or three letters ending in ib (like kib), tells rsync to use the binary prefix (multiples of 1024, kibibytes). Two letters ending in b (like kb) tell rsync to use the decimal prefix (multiples of 1000, kilobytes).
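
The difference between the two prefixes adds up quickly. A short worked example in shell arithmetic (the variable names are only for illustration):

```shell
# Binary prefix: single letter or ...ib suffix, multiples of 1024.
size_500m=$((500 * 1024 * 1024))
size_5k=$((5 * 1024))

# Decimal prefix: two-letter suffix ending in b, multiples of 1000.
size_500mb=$((500 * 1000 * 1000))
size_5kb=$((5 * 1000))

echo "--max-size=500m  -> $size_500m bytes"
echo "--max-size=500mb -> $size_500mb bytes"
echo "--min-size=5k    -> $size_5k bytes"
echo "--min-size=5kb   -> $size_5kb bytes"
```

So 500m allows files of up to 524288000 bytes, while 500mb stops at 500000000 bytes.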

Limit transfer bandwidth #

Sometimes, it is necessary to limit the transfer speed of rsync. You can do it with --bwlimit=, which accepts the same suffixes as --max-size and uses units of 1024 bytes per second (KiB/s) if no suffix is given.

Some examples:
--bwlimit=100 # Limits bandwidth to 100 KiB/s
--bwlimit=250k # Limits bandwidth to 250 KiB/s
--bwlimit=1m # Limits bandwidth to 1 MiB/s
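
Link speeds are usually quoted in Mbit/s, so a little arithmetic is needed to get a --bwlimit value. The sketch below assumes a hypothetical 100 Mbit/s link of which rsync may use half; both numbers are placeholders.

```shell
link_mbit=100       # assumed link speed in Mbit/s (decimal megabits)
share_percent=50    # assumed share of the link rsync may use

# bits/s -> bytes/s -> share of it -> KiB/s (integer division, rounds down)
limit_kib=$(( link_mbit * 1000 * 1000 / 8 * share_percent / 100 / 1024 ))

echo "--bwlimit=$limit_kib"
```

With these assumptions the result is --bwlimit=6103, roughly 6 MiB/s.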

Data Compression #

You can compress the data during the transfer, which is great for slow connections. Activate compression with --compress/-z; if you do not specify a method, rsync picks one that both sides support.

You can check the available algorithms with rsync --version:

$ rsync --version

rsync  version 3.2.7  protocol version 31
[...]
Compress list:
    zstd lz4 zlibx zlib none
[...]

You can choose the compression algorithm with --compress-choice=/--zc=.

Besides the algorithm, you can choose the compression level with --compress-level=/--zl=. Every algorithm has its own list of levels, and it is recommended to look them up.

Side note: you can choose --zl=999999999 to get the maximum compression no matter what algorithm you choose as rsync limits this value silently to the max limit.
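
If you want to force a specific algorithm in a script, it can help to check the compress list first. The following sketch only inspects the local rsync and falls back to zlib; whether the remote side supports the chosen algorithm is not checked here, so treat it as a starting point.

```shell
# Pick zstd when the local rsync lists it, otherwise fall back to zlib.
if rsync --version 2>/dev/null | grep -A1 '^Compress list' | grep -qw zstd; then
  algo=zstd
else
  algo=zlib
fi

# Print the resulting command instead of running it against a real host.
echo "rsync -ah --compress --compress-choice=$algo ./data user@192.0.2.55:/home/user/"
```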

Showing Transfer Progress #

By default, rsync does not show any progress at all:

$ rsync -ah ./data user@192.0.2.55:/home/user/
(no output)

With -v you get more verbose output that shows at least the file rsync is transferring at the moment:

$ rsync -avh --delete   ./data user@192.0.2.55:/home/user/
sending incremental file list
data/
data/big-file
[...]

With --progress you get the progress and transfer speed per file:

$ rsync -ah --progress ./data user@192.0.2.55:/home/user/
sending incremental file list
data/
data/big-file
         17,92M   1%    2,59MB/s    0:06:27

To see only the total progress, use --info=progress2:

$ rsync -ah --info=progress2 ./data user@192.0.2.55:/home/user/
          4,42M   0%    4,07MB/s    0:04:10

The number behind progress is the verbosity level: 0=no output; 1=per file; 2=total.

This progress is better than nothing, but it can be vague as rsync is still checking the rest of the files for changes. With --no-inc-recursive/--no-i-r you can tell rsync to create the file list first and then start the transfer to make it more precise. That said, it delays the initial transfer.


You can use --stats to get the transfer results at the end of the transfer.
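
For scripts, the --stats summary can be captured and filtered. A local sketch with made-up file names; the exact wording of the summary lines may differ slightly between rsync versions, so the filter below is an assumption based on the 3.2.x output.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
for i in 1 2 3; do echo "file $i" > "$demo/src/file-$i"; done

# Build the complete file list first, then print a summary at the end.
summary=$(rsync -a --no-inc-recursive --stats "$demo/src/" "$demo/dst/")

# Show only two of the summary lines.
echo "$summary" | grep -E '^(Number of files|Total transferred file size)'
```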

Start a dry run #

Side note: the following method can also be used to perform an integrity check. For example, you used another tool to transfer a large data set, and you want to check if everything was transferred correctly. You can double-check it with rsync and even correct things.
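
A minimal local sketch of such an integrity check. It assumes the --checksum/-c option, which is not covered elsewhere in this article, so that rsync compares file contents instead of only size and timestamp; --dry-run and --itemize-changes are explained further down in this section. The file names are made up.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/copy"
echo "good data" > "$demo/src/a.txt"
echo "good data" > "$demo/src/b.txt"
rsync -a "$demo/src/" "$demo/copy/"

# Simulate silent corruption: same size, same timestamp, different content.
echo "bad  data" > "$demo/copy/b.txt"
touch -r "$demo/src/b.txt" "$demo/copy/b.txt"

# Read-only comparison by checksum; files that differ are listed.
report=$(rsync -a --checksum --dry-run --itemize-changes "$demo/src/" "$demo/copy/")
echo "$report"
```

Only b.txt should show up in the report. Running the same command without --dry-run would replace the damaged copy.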

Depending on your use case, a mistake in the command can delete data. To avoid that, we can use two features to safely check the steps rsync would perform.

I am talking about --dry-run/-n and --itemize-changes/-i. The former performs a read-only run, and the latter shows you all the changes rsync will perform.

Let me show you an example, and don’t worry about the other options for now:

kuser@pleasejustwork:~/9_temp/rsync$ rsync -ah --delete --itemize-changes --dry-run ./data user@192.0.2.55:/home/user/
sending incremental file list
*deleting   data/small-files-7
.d..t...... data/
              0   0%    0,00kB/s    0:00:00 (xfr#0, to-chk=0/32)
<f.st...... data/small-files-1
<f+++++++++ data/small-files-14
cd+++++++++ data/new-data/
              5   0%    4,88kB/s    0:00:00 (xfr#5, to-chk=0/32)

sent 583 bytes  received 61 bytes  429,33 bytes/sec
total size is 1,05G  speedup is 1.628.223,61 (DRY RUN)

Explanation for this example of --itemize-changes:
*deleting data/small-files-7 # deletes file on destination
.d..t...... data/ # timestamp of directory data changed
<f.st...... data/small-files-1 # file is sent; size and timestamp are updated on the destination
<f+++++++++ data/small-files-14 # file will be created on the destination
cd+++++++++ data/new-data/ # new directory in source detected; will be created on destination

The syntax of this string is YXcstpoguax and is explained as follows:
Y # type of update performed
X # file type
cstpoguax # attributes that could be modified

Explanation of update types Y:
< # file is being SENT
> # file is being RECEIVED
c # local change or creation of an item (directory, sym-link, etc)
h # item is a hard link
. # item is not getting updated
* # the rest of the output contains a message (e.g. deleting)

Explanation of file types X:
f # stands for file
d # stands for directory
L # stands for sym-link
D # stands for device
S # stands for ‘special’, e.g. named sockets

Explanation for the attributes cstpoguax of an item:
c # checksum
s # size
t # timestamp
p # permissions
o # owner
g # group
u | n | b # u = access time ; n = create time ; b = both, access and create times
a # ACL information
x # extended attributes

Explanation of the status of the attribute:
letter # attribute is being updated
. # attribute unchanged
+ # item newly created
? # change is unknown, e.g. when talking to an older rsync version
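
To make the legend above a bit more tangible, here is a small helper that decodes such a string. explain_itemize is a hypothetical function written for this article, not part of rsync, and it only covers the cases listed above.

```shell
# Decode update type (Y), file type (X) and the changed attributes
# of an --itemize-changes string such as "<f.st......".
explain_itemize() {
  s=$1
  case $s in
    '*'*) echo "message: ${s#?}"; return ;;
    '<'*) update="sent" ;;
    '>'*) update="received" ;;
    c*)   update="local change or creation" ;;
    h*)   update="hard link" ;;
    .*)   update="not updated" ;;
    *)    update="unknown" ;;
  esac
  rest=${s#?}                 # strip Y
  case $rest in
    f*) kind="file" ;;
    d*) kind="directory" ;;
    L*) kind="sym-link" ;;
    D*) kind="device" ;;
    S*) kind="special" ;;
    *)  kind="unknown" ;;
  esac
  # Strip X, then drop the dots of unchanged attributes.
  flags=$(printf '%s' "${rest#?}" | tr -d '.')
  case $flags in
    +*) flags="new item" ;;
    '') flags="nothing" ;;
  esac
  echo "$update, $kind, changed: $flags"
}

explain_itemize '<f.st......'
explain_itemize 'cd+++++++++'
explain_itemize '*deleting'
```

The first call reports a sent file with changed attributes st (size and timestamp), the second a newly created directory, and the third passes the message through.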

Transfer Logging #

Rsync does not log anything by default. There are multiple ways to do so.

You can create a log file with --log-file=:

$ rsync -ah --info=progress2 --log-file=./rsync.log ./data user@192.0.2.55:/home/user/
         29,43M   2%    3,13MB/s    0:05:18
         [...]

and the logs would look like this:

$ cat rsync.log 
2024/01/14 18:24:26 [647220] building file list
2024/01/14 18:24:26 [647220] cd+++++++++ data/
2024/01/14 18:24:34 [647220] sent 29630071 bytes  received 585 bytes  total size 1048576005
[...]

You can modify the name of the logs, for example, by adding a timestamp. That is great for automation like daily cron jobs.

$ rsync -ah --info=progress2 --log-file=./rsync-`date +"%F-%I%p"`.log ./data user@192.0.2.55:/home/user/
         32,28M   3%    4,84MB/s    0:03:24  ^C
[...]

$ ll
[...]
-rw-r--r--  1 user user  577 Jan 14 18:31 log-2024-01-14-06.log
[...]
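
One thing to keep in mind with the format above: %I is a 12-hour clock and %p (AM/PM) may be empty in some locales, so two runs on the same day can end up with the same file name. A sketch with a 24-hour timestamp instead, using local scratch directories with made-up names:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "x" > "$demo/src/a.txt"

# %F = YYYY-MM-DD, %H%M%S = 24-hour time: unambiguous and sortable by name.
log_file="$demo/rsync-$(date +%F-%H%M%S).log"

rsync -a --log-file="$log_file" "$demo/src/" "$demo/dst/"
echo "log written to $log_file"
```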

Another option is to save your console output to a log file like this:

rsync [options] source destination >> ./rsync.log 2>&1

This is a quick and dirty version.

Rsync provides a large set of logging options and lets us decide what to show and hide. As it is out of the scope of this article, I won’t go into detail, but I wanted to share the --info=help output to give you an idea of the options.

$ rsync --info=help
Use OPT or OPT1 for level 1 output, OPT2 for level 2, etc.; OPT0 silences.

BACKUP     Mention files backed up
COPY       Mention files copied locally on the receiving side
DEL        Mention deletions on the receiving side
FLIST      Mention file-list receiving/sending (levels 1-2)
MISC       Mention miscellaneous information (levels 1-2)
MOUNT      Mention mounts that were found or skipped
NAME       Mention 1) updated file/dir names, 2) unchanged names
NONREG     Mention skipped non-regular files (default 1, 0 disables)
PROGRESS   Mention 1) per-file progress or 2) total transfer progress
REMOVE     Mention files removed on the sending side
SKIP       Mention files skipped due to transfer overrides (levels 1-2)
STATS      Mention statistics at end of run (levels 1-3)
SYMSAFE    Mention symlinks that are unsafe

ALL        Set all --info options (e.g. all4)
NONE       Silence all --info options (same as all0)
HELP       Output this help message

Options added at each level of verbosity:
0) NONREG
1) COPY,DEL,FLIST,MISC,NAME,STATS,SYMSAFE
2) BACKUP,MISC2,MOUNT,NAME2,REMOVE,SKIP


