
LINUX Audio BACKUP


Rev 1.00-ALPHA: 5/15/2009 soundcheck: Initial description


WORK IN PROGRESS!!!!!!!!!!!!!!! Handle with care!!!!!!!!!!

I. INTRODUCTION

The poor backup handling of very valuable data is probably one of the most critical issues in the fast-growing computer audio scene.
Many people have spent hundreds of hours getting their collection onto disk, cleaning up tags and filenames, and collecting or producing
the album art. But how do they make sure that they don't lose this valuable data?
My guess: only a few folks really know what they are doing.
As a matter of fact, many of these people do not have even a single reliable backup available.
Some people believe that their basic RAID setup is sufficient. It's not. RAID helps when a disk fails, but it won't protect you from deleted or overwritten files, a damaged filesystem or a failed upgrade.

I think the main reason for this is that not many people take half a day to sit down and develop their own personalized
backup strategy.
This is understandable. It took the IT industry decades to develop satisfying backup and restore strategies. How can one expect a hobbyist to
cope with this?

However, once you have looked into it, you'll realize that the whole backup procedure is not a big deal, not at all.
Once you have settled on a procedure and automated it, it won't bother you anymore.

Backing up and restoring a system with the methods described below will take just a couple of minutes.

Rebuilding everything from scratch will take from half a day up to several weeks (in case you lost your collection).

I would like to show you how to back up the system quite easily. There are several ways of doing it.

I'll describe "my way" of doing it. I am not the only one in the Linux community who likes to know what the tools in use are actually doing.
I'll show you how to do it manually, from the command line, very easily.
Of course there are several more options besides the ones I am describing, e.g. doing backups over the network, compressing data etc.
To keep things simple for now, I'll stick with the ones that are IMO most important.


NOTE: I don't take any responsibility in case you mess up your system or lose any data by following the recommendations below.
Make sure that you understand what you're doing before running the commands below or similar ones.
It's always good to do some dry runs or fake backups before you start saving your real data.

Prerequisites:

1. Backup disk hardware: Buy yourself a quality 1TB USB disk - they run at 70-80$ nowadays.

2. Backup/recovery system: a bootable DVD or USB stick with all system tools already integrated.
The Knoppix Live-CD has proven over the years to be a system of choice for this task. It has a lot of admin tools on board by default.
However, writing a Linux Mint ISO to a USB drive with e.g. "unetbootin" will also be sufficient and is easy to accomplish.


II. BACKUP METHODS

You can run your backups in two main ways: file based and/or block based. File based backups can be done while the system is up and running;
block based backups should be done on an unmounted drive or partition. The block based backup described below is the (almost) 100% safe backup method.


Let's get started.


NOTE: One of the key questions will be "What are the device IDs of the drives and partitions (source vs. target)?" Mixing up one little character can mess up everything.
The device IDs, such as /dev/sda for the first disk, /dev/sdb for the second disk and so on, might change after a reboot. In Linux these device IDs are not permanently assigned to a particular drive.
It is rather a first come, first served assignment of IDs by the OS.
Therefore you need to be very careful here. Before you start you should be absolutely sure which device ID belongs to which drive.
I am working with UUIDs and labels to identify every single hard disk partition. (sudo blkid /dev/sdXI) UUIDs are unique by design; labels are unique as long as you assign them that way.
The labels you have to assign manually (e.g. sudo tune2fs -L YOURLABEL /dev/sdXI on ext2/3/4 filesystems). You can even mount your HDD by its label, e.g. sudo mount LABEL=YOURLABEL /media/MUSIC
Also be very careful when running automated backups with cron using the /dev/sdX terminology. You might come back and find your PC wiped out.
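
One way to avoid typing the wrong /dev/sdX is to resolve the label to its current device node right before each backup. The following is a minimal sketch, assuming a udev-based system that maintains /dev/disk/by-label; the function name resolve_label and the label MUSIC are only illustrative.

```shell
#!/bin/sh
# resolve_label LABEL [DIR]
# Print the device node that currently carries the filesystem label LABEL.
# DIR defaults to /dev/disk/by-label (assumption: udev maintains it).
resolve_label() {
    label=$1
    dir=${2:-/dev/disk/by-label}
    if [ ! -e "$dir/$label" ]; then
        echo "label '$label' not found in $dir" >&2
        return 1
    fi
    # The entries are symlinks such as MUSIC -> ../../sdb1
    readlink -f "$dir/$label"
}

# Example: stop right away if the backup disk is not plugged in.
# BACKUP_DEV=$(resolve_label MUSIC) || exit 1
```

A cron job built this way fails with an error when the labelled disk is missing, instead of writing to whatever drive happens to be /dev/sdb that day.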


II.1 CLONING AN ENTIRE DRIVE WITH DD - The (almost) 100% waterproof method.

Terminology used below: X=active-medium Y=backup-medium

To make 100% sure that you can restore a disk exactly as it was before a crash or any other mess-up, the program "dd" is your choice.
To restore, you just swap the disks and you're done.
You'll find dd on any Unix system. It is a base system command. With dd you copy every single block of a drive to another drive.
You do this with the drives unmounted.
Your target disk must be at least as big as the source disk. With this method you can of course also clone disks easily.

NOTE: In some cases, if there are hardware defects on the disk, dd can have problems reading the affected blocks when trying to back them up. It might hang.
In this case dd_rescue can be used (on Debian/Ubuntu it is usually packaged as "ddrescue": sudo apt-get install ddrescue). dd_rescue will continue with the remaining blocks to store at least what is still readable.

Backup:

Step 1: Boot Knoppix with your drives connected, unmount automatically mounted drives and open a terminal
Step 2: Identify your source and target drive with "sudo fdisk -l"
Step 3: Run the backup: "sudo dd if=/dev/sdX of=/dev/sdY bs=1k conv=sync,noerror" (everything on /dev/sdY will be overwritten, so double-check both names)
Step 4: Reboot

That's it. Just a single command.
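
Before touching real disks, the same dd call can be tried as a fake backup on plain files. This is only a sketch: the two image files stand in for /dev/sdX and /dev/sdY, and their names are made up.

```shell
#!/bin/sh
# Fake backup on plain files instead of real disks (safe to run anywhere).
set -e
work=$(mktemp -d)
src="$work/source.img"    # stands in for /dev/sdX
dst="$work/backup.img"    # stands in for /dev/sdY

# Create a 1 MiB "disk" filled with random data.
dd if=/dev/urandom of="$src" bs=1k count=1024 2>/dev/null

# Same options as in Step 3 above.
dd if="$src" of="$dst" bs=1k conv=sync,noerror 2>/dev/null

# cmp exits with status 0 only if both files are byte-identical.
if cmp -s "$src" "$dst"; then
    echo "clone verified"
else
    echo "clone differs" >&2
fi
```

The cmp check is the interesting part: it shows a way to convince yourself that a clone really is identical before you rely on it.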


Restore:

Method 1:

Swap your disks physically

Method 2:

Step 1: Boot Knoppix with your drives connected, unmount automatically mounted drives and open a terminal
Step 2: Identify your source and target drive with "sudo fdisk -l"
Step 3: Run the restore: "sudo dd if=/dev/sdY of=/dev/sdX bs=512"


Disadvantages:
1. You have to bring your system down to be able to boot Knoppix.
2. Incremental backups won't be possible

Advantages:
You can be 100% sure that you get your system back in the shortest time without configuration work such as formatting, partitioning or
setting up the Master Boot Record.


II.2. BACKING UP YOUR FILES WITH RSYNC

Terminology used below: X=active-medium I=partition number on active-medium Y=backup-medium J=partition number on backup-medium

The rsync utility is a great, reliable piece of software. IMO it is the best you can find for running on-the-fly backups.
As used here, rsync needs a Linux filesystem (e.g. ext3) on the backup drive. Unix ownership and permissions are generally not preserved on NTFS drives. (The dd method above covers everything.)
You can easily run incremental backups, which will cut down backup space and time requirements substantially.
I think it is the perfect utility to handle your data disks.

NOTE: The procedure described below is not enough to recover from a crashed hard disk. It will work if data got lost or overwritten, or if any kind of upgrade failed.
To be able to restore a crashed drive with rsync, some more steps are needed. You additionally need to save the master boot record
and the partition table beforehand. All this is much more complex than using the dd backup method. Perhaps I'll describe this at a later stage.
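
For orientation only, saving the boot sector can again be done with dd. This sketch assumes a classic MBR-partitioned disk, where the first 512-byte sector holds the boot code and the primary partition table. It runs on an image file so it is safe to try, and the file names are made up.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
disk="$work/disk.img"     # stands in for /dev/sdX
mbr="$work/mbr-sdX.bin"   # hypothetical name for the saved sector

# Build a small fake disk (64 KiB of random data).
dd if=/dev/urandom of="$disk" bs=512 count=128 2>/dev/null

# Save the first sector only: bs=512 count=1.
dd if="$disk" of="$mbr" bs=512 count=1 2>/dev/null

# Write the sector back. conv=notrunc keeps the rest of the image file
# intact (a real block device cannot be truncated anyway).
dd if="$mbr" of="$disk" bs=512 count=1 conv=notrunc 2>/dev/null

ls -l "$mbr"
```

On a real disk the source would be /dev/sdX, and "sudo sfdisk -d /dev/sdX" can additionally dump the partition table as text. Disks using GPT instead of MBR are laid out differently and are not covered by this sketch.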

Prerequisites:

Set up a logical directory structure on your backup disk to keep some kind of control when handling multiple disks and partitions.
It could look like e.g. /hostname/source-disk/partition-label/backup-DATE-X
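
Such a structure can be created with a small helper. This is a sketch only: the function name make_backup_dir and the example paths are assumptions, and the dated backup directories themselves are created later by rsync.

```shell
#!/bin/sh
# make_backup_dir ROOT DISK LABEL
# Create ROOT/<hostname>/DISK/LABEL and print the resulting path.
make_backup_dir() {
    root=$1
    disk=$2
    label=$3
    # uname -n prints the host name of this machine.
    dir="$root/$(uname -n)/$disk/$label"
    mkdir -p "$dir"
    echo "$dir"
}

# Example (paths are placeholders):
# make_backup_dir /media/BACKUP sda MUSIC
```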

NOTE: Make sure you understand all options that are shown in below rsync commands by looking up "man rsync"


Backup (INITIAL):

Step 1: Connect your backup drive
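Step 2: Identify your source and target partitions with "sudo fdisk -l" and make sure both are mounted
Step 3: Run the backup: "rsync -Cavuhx --delete --numeric-ids $EXCLUDES <mountpoint-of-sdXI>/ <mountpoint-of-sdYJ>/<LABEL>/<partition>/master"
(rsync copies directories, not device nodes, so source and target are given as the mount points of /dev/sdXI and /dev/sdYJ. The trailing slash on the source copies the contents of the directory rather than the directory itself.)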

Note1: I have chosen "master" as the directory name for the first master backup. The incremental backups described later need to refer to this reference backup.
Note2: The variable $EXCLUDES needs to list the directories and patterns that are not supposed to be backed up (no quotes inside the value, and a space between the options), e.g.
EXCLUDES="--exclude=*gvfs --exclude=*.log --exclude=/tmp --exclude=/lost+found --exclude=/proc \
--exclude=/dev --exclude=/sys --exclude=/var/tmp"

Note3: The less activity you have going on on the system while running rsync, the more reliable the backup gets.

That'll be it. Just a single command.
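
The command and its exclude list can be wrapped in a small function that only prints the rsync call by default, so it can be checked before anything is copied. This is a sketch under assumptions: backup_master and DRY_RUN are made-up names, and both arguments are mounted directories.

```shell
#!/bin/sh
# backup_master SRC DEST
# SRC and DEST are mounted directories, not device nodes.
# With DRY_RUN=1 (the default here) the command is only printed.
backup_master() {
    src=$1
    dest=$2
    # One exclude per word, so no nested quotes are needed.
    set -- --exclude='*gvfs' --exclude='*.log' \
           --exclude=/tmp --exclude=/lost+found --exclude=/proc \
           --exclude=/dev --exclude=/sys --exclude=/var/tmp
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo rsync -Cavuhx --delete --numeric-ids "$@" "$src/" "$dest/master"
    else
        rsync -Cavuhx --delete --numeric-ids "$@" "$src/" "$dest/master"
    fi
}

# Example (placeholder paths): print first, then run for real.
# backup_master /media/MUSIC /media/BACKUP/myhost/sda/MUSIC
# DRY_RUN=0 backup_master /media/MUSIC /media/BACKUP/myhost/sda/MUSIC
```

rsync also has its own -n (--dry-run) option, which goes one step further and lists what would be transferred.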


Restore:

Pretty easy. You switch target and source and leave out the --delete and $EXCLUDES options. You can do this on-the-fly. Close as many applications as possible.

Step 1: Connect your backup drive
Step 2: Identify your source and target partitions with "sudo fdisk -l" and make sure both are mounted
Step 3: Run the restore: "rsync -Cavuhx --numeric-ids <mountpoint-of-sdYJ>/<LABEL>/<partition>/master/ <mountpoint-of-sdXI>"
Step 4: Reboot

That'll be it. Just a single command.


Incremental backups:

It will be almost exactly the same method as the initial rsync based backup. We just introduce the option "--link-dest", which will be the reference path to the backup master.


Step 1: Connect your backup drive
Step 2: Identify your source and target partitions with "sudo fdisk -l" and make sure both are mounted
Step 3: Run the backup: "rsync -Cavuhx --delete --numeric-ids --link-dest=<mountpoint-of-sdYJ>/<LABEL>/<partition>/master $EXCLUDES <mountpoint-of-sdXI>/ <mountpoint-of-sdYJ>/<LABEL>/<partition>/increment-$(date '+%m%d%y-%H.%M')"

Note1: The source data will be compared with the "master" directory on the target drive. The new directory called "increment-DATE" looks like a complete copy, but only the changed files are actually stored; unchanged files are hardlinks to "master" and take no extra space.
Note2: The variable $EXCLUDES needs to list the directories and patterns that are not supposed to be backed up (no quotes inside the value, and a space between the options), e.g.
EXCLUDES="--exclude=*gvfs --exclude=*.log --exclude=/tmp --exclude=/lost+found --exclude=/proc \
--exclude=/dev --exclude=/sys --exclude=/var/tmp"

That'll be it. Just a single command.


Incremental restore:

Pretty easy. You switch target and source and leave out the --delete and $EXCLUDES options. You can do this on-the-fly. Close as many applications as possible.

Step 1: Connect your backup drive
Step 2: Identify your source and target partitions with "sudo fdisk -l" and make sure both are mounted
Step 3: Run the restore: "rsync -Cavuhx --numeric-ids <mountpoint-of-sdYJ>/<LABEL>/<partition>/increment-DATE/ <mountpoint-of-sdXI>" (replace DATE with the timestamp of the increment you want to restore, not with the current date)
Step 4: Reboot

That'll be it. Just a single command.
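
Directory names in the %m%d%y format do not sort chronologically, so picking the newest increment by name alone goes wrong across a change of year. A small sketch of one way around this, assuming the directories are named exactly as produced by the date format used in the incremental backup command (increment-MMDDYY-HH.MM); the function name latest_increment is made up.

```shell
#!/bin/sh
# latest_increment DIR
# Print the newest increment-MMDDYY-HH.MM directory below DIR.
latest_increment() {
    for d in "$1"/increment-*; do
        [ -d "$d" ] || continue
        stamp=${d##*/increment-}             # e.g. 123108-10.00
        # Move the year to the front: MMDDYY-HH.MM -> YYMMDD-HH.MM
        key=$(printf '%s\n' "$stamp" | sed 's/^\(....\)\(..\)-\(.*\)$/\2\1-\3/')
        printf '%s\t%s\n' "$key" "$d"
    done | sort | tail -n 1 | cut -f 2
}

# Example (placeholder path):
# latest_increment /media/BACKUP/myhost/sda/MUSIC
```

The printed path can then be used as the source of the restore command.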


Managing your rsync backups:

Once in a while you will want to manage your backups. You might want to free up space or get rid of old stuff. This is quite easy if you understand the logic behind
the way rsync handles the incremental backups via hardlinks.

Method I:

You just remove everything (master and incremental backups) for that partition on your backup disk. Then you run an initial master backup again.

sudo rm -rf <mountpoint-of-sdYJ>/<LABEL>/<partition>/*

Note: If you have done a major upgrade it IMO makes no sense to keep the old master and increments if your new system is running stable
for a while. Just drop them.


Method II:

If you want to keep one particular backup, you can remove all directories except the one you want to keep. This one you rename to master.

Let's assume you have one master and two incremental backups. What you do is:

sudo rm -rf <mountpoint-of-sdYJ>/<LABEL>/<partition>/master
sudo rm -rf <mountpoint-of-sdYJ>/<LABEL>/<partition>/increment-DATE1

sudo mv <mountpoint-of-sdYJ>/<LABEL>/<partition>/increment-DATE2 <mountpoint-of-sdYJ>/<LABEL>/<partition>/master

Now you can run your next incremental backup.

The rsync logic: With --link-dest, every incremental backup contains hardlinks to the unchanged files (inodes) of the "master" backup, so they are not saved more than once.
If you delete "master" or "increment-DATEX" you won't delete all the files underneath. A file's data is only freed once no other backup links to it anymore, so in effect you just delete the files that are unique to "master" or "increment-DATEX".
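
The hardlink behaviour can be tried out without rsync. The sketch below uses plain "ln" in a temporary directory; the file and directory names are only examples.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mkdir "$work/master" "$work/increment-DATE1"

echo "track data" > "$work/master/song.flac"
# A hard link: a second name for the same inode, no extra data stored.
ln "$work/master/song.flac" "$work/increment-DATE1/song.flac"

# Deleting "master" removes one name only ...
rm -rf "$work/master"

# ... the data is still reachable through the increment.
cat "$work/increment-DATE1/song.flac"    # prints: track data
```

This is why Method II above is safe: the increment you keep still holds a name for every file it shows, no matter which other backups are removed.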