Backup using rsync and hard links
What is this?
backup.sh is a backup script I once put together, and it does the job quite well for my needs. It uses rsync to create incremental backups with hard links.
Pros:
- Incremental backups: never lose any work
- Storage space: only changed files are stored
- Immediate access to every backup date via DEST/DATE/path/to/file
- Easy deletion of backups, without affecting later or earlier backup points
- KISS: easy to understand, and no uncommon dependencies
- Probably works across networks
Cons:
- Heavy use of hard links might confuse some user-space tools
- No deduplication beyond unchanged files at the same path (identical files at different paths, or renamed files, are stored again)
- Small changes in big files cause the whole file to be stored again
- Migrating to a new backup medium is CPU-intensive (keeping the hard links together has quadratic runtime)
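The "easy deletion" point rests on how hard links behave: a hard link is just another name for the same inode, and the data is only freed once its last name is removed. The following sketch (made-up directory and file names; it assumes GNU `stat` for the `-c %h` link-count format) shows two snapshot directories sharing one unchanged file, and what happens when the older one is deleted.

```shell
#!/usr/bin/env bash
# Two hypothetical snapshot directories sharing one unchanged file via a hard link.
set -euo pipefail

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir "$demo/20140215" "$demo/20141210"

echo "unchanged content" > "$demo/20140215/notes.txt"
# An unchanged file in the newer snapshot is just a second name for the same inode.
ln "$demo/20140215/notes.txt" "$demo/20141210/notes.txt"
links_before=$(stat -c %h "$demo/20141210/notes.txt")   # GNU stat: link count

# Delete the older backup point entirely.
rm -rf "$demo/20140215"

# The data survives because the newer snapshot still holds a name for it;
# only the link count drops.
links_after=$(stat -c %h "$demo/20141210/notes.txt")
content=$(cat "$demo/20141210/notes.txt")
echo "links: $links_before -> $links_after, content: $content"
```

The link count goes from 2 to 1, and the newer snapshot can still read the file.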
Setting it up is as easy as adapting the SRC and DEST variables in the script header to your needs. Then drop a file named .backup.exclude (note the leading dot) into the SRC directory, listing the files to exclude, one pattern per line; the file may be empty.
To exclude all files named 'foo', add a 'foo' line. To exclude only 'SRC/foo' (but not 'SRC/subdir/foo'), add a '/foo' line instead. For more details, see the filter rules section of the rsync man page.
Back it up!
Just fire up the script, lean back and enjoy the names of files scrolling down your screen.
To free some space, just delete the backup points you no longer need:
rm -rf DEST/20140215 DEST/20141210 ...
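The command above removes snapshots by name. If you would rather keep only the N most recent backup points, a small helper along these lines could do it. `prune_snapshots` is a hypothetical function, not part of backup.sh. It relies on the snapshot names sorting chronologically, which the date-based names do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete all but the newest KEEP snapshot directories in DEST.
set -euo pipefail
shopt -s nullglob

prune_snapshots() {
    local dest=$1 keep=$2
    local snapshots=() path i
    # Glob results come back sorted by name, i.e. oldest snapshot first.
    for path in "$dest"/*; do
        # Only real directories count as snapshots (skip files and symlinks).
        if [[ -d $path && ! -L $path ]]; then
            snapshots+=( "$path" )
        fi
    done
    # Remove from the oldest end until only KEEP snapshots remain.
    for (( i = 0; i < ${#snapshots[@]} - keep; i++ )); do
        echo "removing ${snapshots[i]}"
        rm -rf -- "${snapshots[i]}"
    done
}
```

For example, `prune_snapshots "$DEST" 30` would keep the 30 most recent snapshots. Because unchanged files are hard links, the remaining snapshots stay complete.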
How it works
The script rsyncs your directory contents to backups/DATE/. The --link-dest rsync option, pointed at backups/LAST_DATE/, tells rsync to compare each file with its counterpart there: if the file differs, rsync copies it to backups/DATE/file; if it is the same, rsync hard-links backups/DATE/file to backups/LAST_DATE/file instead.
The .backup.exclude file in the source directory root is passed to rsync for excluding unwanted files from being backed up.
You can comment in or out the lines that calculate the size of the current backup, or of all backups together. I recommend watching these sizes for your first five or so backups to make sure your .backup.exclude is correct, but after a few backups I commented them out because they just slowed things down.
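How such size lines might look is sketched below. This is an assumption, not the script's actual code, and `report_sizes` is a hypothetical name. Running `du -s` on a single snapshot reports its full size, because every file in it is counted. Within a single invocation, GNU du counts each hard-linked inode only once. So one du run over all snapshots shows the oldest at full size and each later one roughly as the data it added.

```shell
#!/usr/bin/env bash
# Hypothetical size report for a DEST full of snapshot directories (GNU du).
set -euo pipefail

report_sizes() {
    local dest=$1
    # One du run over all snapshots, oldest first by name: inodes already
    # counted in an earlier snapshot are not counted again.
    du -sk "$dest"/*
    # Total space taken by all backups together.
    du -sk "$dest"
}
```

Swap `-sk` for `-sh` if you prefer human-readable sizes.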
If you want multiple backups per day, you can easily extend the naming scheme to include hours and minutes. Just make sure that lexicographic sorting still matches chronological order (e.g., use YYYY-MM-DD, not DD.MM.YYYY).
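The following sketch shows one possible name format with hours and minutes, plus why DD.MM.YYYY breaks "newest = last in sorted order". The `%Y-%m-%d_%H%M` format is just an example choice: fixed-width, with the most significant part first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Example snapshot name including hours and minutes, e.g. 2014-12-10_0930.
name=$(date +%Y-%m-%d_%H%M)
echo "$name"

# Sorted as text, ISO-style dates stay chronological...
good_latest=$(printf '%s\n' 2014-12-10 2014-02-15 2015-01-01 | LC_ALL=C sort | tail -n 1)
# ...while DD.MM.YYYY does not: the "latest" picked here is not the newest date.
bad_latest=$(printf '%s\n' 10.12.2014 15.02.2014 01.01.2015 | LC_ALL=C sort | tail -n 1)
echo "ISO latest: $good_latest, DD.MM.YYYY latest: $bad_latest"
```

The ISO list correctly yields 2015-01-01. The DD.MM.YYYY list yields 15.02.2014, which would make the script link against the wrong previous backup.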