Sunday, March 17, 2013

backing up my systems: it ain't my day (or month)

OK.  I've been in three weeks of hardware hell, mainly because I wanted to get all my machines (a MacBook Pro, my main Linux video editing workstation and an older Windows Vista digital audio workstation (DAW)) properly backed up.  I detailed my strategy for this in my last post.  This post is more of a rant than anything else, so please excuse the lack of any real mentorship on problem solving, except maybe "Google is Your Friend."

Issue #1: Drobo runs out of space
The Drobo has been a fine unit for me.  But as time goes on, you acquire more media and your available space runs out.  You'd think it would be a simple matter of buying a new disk, putting it in the Drobo and letting BeyondRAID rebuild its array.  Well, the first drive I bought, a Western Digital Green 1TB, died after the first rebuild.  I had never had a drive fail out of the box before, so I didn't truly believe it was dead.

With my non-belief firmly in place, I tried to use the drive in different capacities.  So as a test, I formatted the disk using my Thermaltake BlacX connected to my Mac.  I was able to copy files over to it (though I didn't copy gigs and gigs worth as a true test).  But when I put the drive back in the Drobo, the Drobo gave an immediate "red" light for that drive bay, indicating the drive was bad.  I switched drives in the Drobo unit around, because I thought it could have been a faulty drive bay.

Then, I had the bright idea to move the data off the 2TB system drive of my main Linux machine to the new Western Digital, put the 2TB in the Drobo and use the new 1TB (which I really thought was a good, error-free drive) as my Linux system drive.  Still thinking that the 1TB drive was good, I had to do some fancy footwork to make this possible, as the system drive was a logical volume.  This entailed a week of work to figure out how to shrink a logical volume in order to fit the used space of the 2TB drive (which was less than a terabyte) onto the 1TB.

I learned a lot from that experience, to be detailed in a later post.  Suffice it to say that in the end, the 1TB was truly dead and I ended up getting a new 1TB (a Western Digital Black) from BestBuy and that solved my Drobo storage issue.  Kudos to BestBuy, as they were able to give me the Black at the same price as the Green for my trouble.

Issue #2: Mac Time Machine "the identity of this backup disk has changed" (Sparsebundle Problem)
This was an odd one.  After installing the new disk in the Drobo, Time Machine started showing the error "the identity of this backup disk has changed".  From the below post:

I executed the "chflags" command listed.  This ran for about four hours.  Afterward, I tried to execute the "hdiutil" command listed, but the Mac said it had already run the command.  So to test the result of the chflags command, I shut down and restarted the Drobo.  When Time Machine started backing up, it no longer gave me the error.  Hooray.  Another one down.

Issue #3: Windows Vista DAW crashes
So after a week spent on #1 and #2, I was ready to start work on a new musical project with some friends.  Firing up my old Dell 400SC running Windows Vista (OK, OK..I know I need to upgrade to Win7, but I've got a recording session coming up soon and didn't want to change OSes yet), I was presented with this error:
c\windows\system32\config\system corrupt

Oh, wonderful.  So I popped in the Vista Ultimate DVD and selected "Repair".  After it ran, the system rebooted and I was pleasantly surprised to find that this fixed the problem and that I was able to get back into the system.

Getting back into the system, I reasoned that if the drive was going bad, I'd better make a backup.  So I ponied up $40 for Drobo's PC Backup product, the ugly step-brother of the seamless Drobo integration with Mac Time Machine.  Assuming the PC product worked the same way the Mac product did, I selected the defaults.  Well, the defaults do NOT back up the entire drive.  Only your user data.  My bad for not reading the fine print, but I believe that a Drobo product should be consistent between systems and the default should be to back up your entire drive with all system data included, as long as you have the space on your Drobo.  But that's just me.

The missing data would be crucial for what happened next.

Issue #4: Windows Vista DAW crashes again
After taking a two day hiatus from my backup shenanigans, I fired up the DAW again.  And guess what..a new error appears:
\Windows\system32\winload.exe is missing or corrupt (status 0xc000000f)

Oh great.  Going back to my ritual, I loaded in the Vista Ultimate DVD and selected "Repair".  However, after the reboot, no go..still the same "missing or corrupt" error.  I tried the repair a number of times, as the Vista repair process would show slightly different screens every time it booted and recognized the system.  This gave me false hope that the DVD was actually repairing something correctly.  Also, the frustrating part of this process was that, for whatever reason, the DVD would take 10 minutes to load on my Dell.  I'm not sure what the problem was there.  So I chewed up a few hours doing this multiple times.

Finally, after reading some Google posts by people with the same issue, I decided to run "chkdsk /r" from the command line, rather than relying on the non-informative Windows Vista screen to run some unknown fix command.  I had to specifically boot into the System Recovery Options screen as shown in the below post:

Once I was there, I selected "Command Prompt" and typed in good ol' "chkdsk /r", the "repair" option to chkdsk.  This time, I was rewarded with an actual status screen that told me "bad clusters found": Windows was marking the clusters as bad and moving the files located on those clusters to good sectors on the disk.  (Sector and cluster primer here: http://t.co/DLFjrXAp5C).  This process took about three hours, unlike the half-hearted effort that Windows Vista attempted.  I wonder why Vista did not default to doing a real "chkdsk /r".  That doesn't help anyone who has a failing disk.  Bad default!

After the bad cluster identification and repair, I was really glad to see Vista boot up properly!  But since there were so many bad clusters, I had to make a full backup or clone of that drive, and quick!  For this, I popped in an unused 500GB SATA I had lying around.  I repartitioned and formatted this drive.  It had been a second Vista system disk at one point, so I knew the drive's main partition was marked as bootable.  So I was good to go there.  I then dragged all the files from my C: onto the new E: (my DVD being the D:).  However, on bootup, Vista showed an error:
"System volume on disk is corrupt"

I suspected this was a problem with the NTFS boot files on the 500GB drive as they had links from the partition map from the old 256GB drive that was failing.  Luckily, when I ran Vista repair, Vista was able to fix this issue and the system started properly.

Issue #5: Windows Vista continually keeps "preparing your desktop"
After the system came up, I checked whether all my applications (Reaper, Drobo PC Backup, etc.) were working properly.  Unfortunately, they were not, as Vista kept giving me the message "Preparing Your Desktop" when I logged into my profile.  I tried a number of things from Google, but those suggestions did not work.  I didn't have any critical data in the old profile, so I figured I'd bite the bullet and create a new profile.  After doing this, the message disappeared and I was able to save my desktop settings and application preferences properly.

In Sum
Wow.  So this has been three weeks of hell.  I "think" I am back to steady state with my systems.  I was able to reset Drobo PC Backup to a full system backup of my Vista DAW to the Drobo.  The Drobo is backing up the Mac just fine and CrashPlan is encrypting my main Linux box backup to the Cloud.

Maybe now I can go outside and get some sun?

Saturday, March 09, 2013

protecting your data, locally and globally

I've spent a number of years trying various backup methods for my Linux box.  I think I finally have a pretty good one down.  The main idea is to set up my system in order to make backup and restore easier. This setup involves two components:
- a data source: my documents, videos, audio files and pictures are stored locally on a redundant hardware RAID5 set
- the separation of system and data partitions on different physical devices (hard drives)

Backup Strategy
My actual backup strategy comes in three parts:
- an archive solution: fsarchiver to back up the system and data partitions
- network backup: a network backup device such as a NAS or in my case, a Drobo
- global backup storage solution: unlimited CrashPlan account

This backup strategy has been working well, though not without hiccups along the way.  It protects my important data by providing redundancy at multiple levels.  At a high level, here is how this is implemented:
- disk redundancy via RAID5 set
- local redundancy via network addressable storage
- global redundancy via CrashPlan if my house is destroyed

More specifically:
- my Linux system drive, Fedora 17, is one physical SATA drive
- my data drive is a hardware RAID5 unit using a 3Ware 9650SE with four physical SATA drives
- when I install a new OS, I use symbolic links in my user's home directory to point to my data, explained below

The bottom line is that no single backup method should be your entire backup strategy.  If you only have two of these methods implemented, you're better off than most people.

System Setup
Symbolic links are the key to segmenting the system partitions from your data partitions.  Segmenting is important because it separates your system from your data.  With this separation, it is much easier (from a Linux perspective) to upgrade and try different versions of Linux on your system drive, while your data drive stays essentially untouched and less prone to upgrade or experimentation tragedies.  Welcome to Linux!

Technically, here is how segmentation is implemented.  On my system drive, I mount my data partition.  In this example, I'm using /mnt as the mount point for my ext4 data partition:
[sodo@computer ~]$ cat /etc/fstab
/dev/mapper/vg_computer-lv_root /                    ext4    defaults        1 1
/dev/mapper/vg_computer-lv_home /home                ext4    defaults        1 2
/dev/mapper/vg_computer-lv_swap swap                 swap    defaults        0 0
/dev/mapper/vg_ogre-lv_root /mnt                     ext4    defaults        0 0

I have my content folders on the data partition that I will symbolically link from my system drive:
[sodo@computer mnt]$ ls -ltr /mnt
total 36
drwxr-xr-x. 16 sodo sodo 4096 Dec 21 19:04 MusicLibrary
drwxrwxr-x.  9 sodo sodo 4096 Jan 12 12:07 videos
drwxr-xr-x.  8 sodo sodo 4096 Feb  1 10:57 doc
drwxrwxr-x.  8 sodo sodo 4096 Mar  2 09:26 pictures

I then create symbolic links from my user's home directory to the equivalent directories that I've setup on my data partition:
[sodo@computer ~]$ ls -l | grep '^l'
lrwxrwxrwx.  1 sodo sodo    8 Oct 23  2011 Documents -> /mnt/doc
lrwxrwxrwx.  1 sodo sodo   18 May 30  2011 Music -> /mnt/MusicLibrary/
lrwxrwxrwx.  1 sodo sodo   14 Sep  2  2011 Pictures -> /mnt/pictures/
lrwxrwxrwx.  1 sodo sodo   12 May 28  2011 videos -> /mnt/videos/

With the symbolic links in place, I've made the link from my system to my data.
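
After a fresh OS install, the links above could be recreated in one pass.  This is only a sketch: link_data_dirs is a hypothetical helper of my own naming, and the demo runs in a scratch directory rather than against a real /mnt or home directory.

```shell
#!/bin/sh
# link_data_dirs DATA_ROOT HOME_DIR NAME=SUBDIR...
# Hypothetical helper: for each NAME=SUBDIR pair, create the symbolic link
# HOME_DIR/NAME pointing at DATA_ROOT/SUBDIR.
link_data_dirs() {
    data_root=$1
    home_dir=$2
    shift 2
    for pair in "$@"; do
        link_name=${pair%%=*}    # text before "=", e.g. Documents
        target_dir=${pair#*=}    # text after "=", e.g. doc
        # -s: symbolic link, -f: replace an existing link,
        # -n: treat an existing link to a directory as a file
        ln -sfn "$data_root/$target_dir" "$home_dir/$link_name"
    done
}

# Demo in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
mkdir -p "$scratch/mnt/doc" "$scratch/mnt/MusicLibrary" \
         "$scratch/mnt/pictures" "$scratch/mnt/videos" "$scratch/home"
link_data_dirs "$scratch/mnt" "$scratch/home" \
    Documents=doc Music=MusicLibrary Pictures=pictures videos=videos
ls -l "$scratch/home"
```

One caveat: if the installer has already created a real ~/Documents directory, ln will place the link inside it instead of replacing it, so move or remove the empty default directories first.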

The Archive-Backup Process
I am going to use the terms "archive" and "backup" synonymously.  In overview, I'll show how I back up my data partition using fsarchiver and then copy those backups to both my local network and global storage solutions.  FSarchiver is a bit different from regular backup systems in that it archives entire filesystems only.  It is not a file-based backup method.  So when you go to save or restore a filesystem, you specify a filesystem to back up and have limited ability to exclude (but not include) directories or files, using the "exclude" switch only (as of 5/2016).

Below, I show the archive process for the data partition.  Feel free to extrapolate the information herein to do the same for your system partition.

The Core Archive Process
1) check the used space on the source partition to be archived, as well as the available space on the destination/target for the backup.  From the below example:
a. source filesystem (the data partition): vg_ogre-lv_root
b. destination partition (for backup storage): vg_computer-lv_home
[sodo@computer ~]$ df -H
Filesystem                       Size  Used Avail Use% Mounted on
/dev/sda1                        508M   80M  403M  17% /boot
devtmpfs                         5.3G     0  5.3G   0% /dev
/dev/mapper/vg_computer-lv_root   53G   15G   36G  30% /
/dev/mapper/vg_computer-lv_home  2.0T   .2T  1.8T  67% /home
/dev/mapper/vg_ogre-lv_root      4.5T  1.4T  2.9T  32% /mnt

2) verify available space on the destination filesystem
The source partition is using 1.4TB on vg_ogre-lv_root and I have 1.8TB available on the destination for the backup, lv_home.  So..good to go.

3) if there is enough space on the target filesystem, prepare to run fsarchiver and unmount the data partition.
In order to keep the filesystem from being updated during the archive process, fsarchiver asks you to unmount the source filesystem before making the backup.  Like so:
[sodo@computer ~]$ sudo umount /mnt/

The nice thing about the split system-data partition setup is that it is unnecessary to boot a LiveCD in order to back up the data partition.  Normally, one has to boot with a LiveCD in order to back up the system partition.

4) Once the filesystem is unmounted, run fsarchiver.  As one of the destinations for the backup is the cloud, use the -c option to encrypt with a password:
[sodo@computer ~]$ sudo fsarchiver -j8 -c [password] -o savefs ~/f17backup/backup_lv_root.fsa /dev/mapper/vg_ogre-lv_root

Archiving my 1.2TB of data took five hours on my eight-core, 1.6GHz Dell SC1430.
[sodo@computer ~]$ ll ~/backup/backup_lv_root.fsa 
-rw-r--r--. 1 root root 1186229328445 Mar  5 08:32 /home/sodo/backup/backup_lv_root.fsa
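
The four steps above can be sketched as a small script.  Treat it as a rough outline under assumptions, not a finished tool: the helper names (used_kb, avail_kb, fits, run, archive_data) are made up for illustration, the paths are the ones from this example, and it only prints the commands unless DRY_RUN=0 is set.  It passes "-c -" so that fsarchiver prompts for the password rather than taking it on the command line.

```shell
#!/bin/sh
# used_kb PATH / avail_kb PATH: used and available space, in 1K blocks,
# of the filesystem holding PATH (POSIX df output format).
used_kb()  { df -Pk "$1" | awk 'NR == 2 { print $3 }'; }
avail_kb() { df -Pk "$1" | awk 'NR == 2 { print $4 }'; }

# fits USED AVAIL: succeed when USED is no larger than AVAIL.
# Conservative: a compressed archive is normally smaller than the used space.
fits() { [ "$1" -le "$2" ]; }

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# archive_data MOUNT_POINT DEVICE ARCHIVE: unmount, archive, remount.
archive_data() {
    run sudo umount "$1" || return 1
    # "-c -" prompts for the password, keeping it out of the shell history.
    run sudo fsarchiver -j8 -c - -o savefs "$3" "$2"
    status=$?
    # Remount even if the archive failed; relies on the /etc/fstab entry.
    run sudo mount "$1"
    return "$status"
}

src_mount=/mnt      # data partition from the example above
dest_dir=$HOME      # filesystem that will hold the archive
if fits "$(used_kb "$src_mount")" "$(avail_kb "$dest_dir")"; then
    archive_data "$src_mount" /dev/mapper/vg_ogre-lv_root \
        "$HOME/f17backup/backup_lv_root.fsa"
else
    echo "not enough room in $dest_dir for $src_mount" >&2
fi
```

Running it as-is is harmless, since the default is a dry run; set DRY_RUN=0 only after checking the printed commands against your own device names.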

Copying the file to network-based storage
I have a 2.5TB CIFS (Windows share) created on my Drobo.  On my Linux box:
1) I mount the Drobo filesystem:
[sodo@computer ~]$ sudo mount -t cifs //drobo/Linux /mnt/drobo -o credentials=/home/sodo/smb.credentials

2) Copy the archive to it:
[sodo@computer ~]$ sudo cp -rp ~/f17backup/ /mnt/drobo/

Copying the file over my home network took about 16 hours.

3) Review the archive info (passing "-c -" makes fsarchiver prompt for the archive password):
[sodo@computer ~]$ fsarchiver -c - archinfo /mnt/drobo/f17backup/backup_vg_ogre-lv_root.fsa
Enter password: 
====================== archive information ======================
Archive type: filesystems
Filesystems count: 1
Archive id: 5131fa94
Archive file format: FsArCh_002
Archive created with: 0.6.15
Archive creation date: 2013-03-05_01-20-04
Archive label:
Minimum fsarchiver version: 0.6.4.0
Compression level: 3 (gzip level 6)
Encryption algorithm: blowfish

===================== filesystem information ====================
Filesystem id in archive: 0
Filesystem format: ext4
Filesystem label:
Filesystem uuid: 3ffe7328-8f96-4028-bd79-5f644a030fc2
Original device: /dev/mapper/vg_ogre-lv_root
Original filesystem size: 4.02 TB (4416677830656 bytes)
Space used in filesystem: 1.22 TB (1338969956352 bytes)
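
Since the copy in step 2 runs for many hours, it may be worth comparing checksums of the local archive and the copy on the share before trusting it.  This was not part of my original process; the sketch below is an optional extra, same_checksum is a hypothetical helper, and the demo uses small scratch files in place of the real archive.

```shell
#!/bin/sh
# same_checksum FILE_A FILE_B: succeed only when both files can be read
# and have the same SHA-256 digest.
same_checksum() {
    sum_a=$(sha256sum < "$1" | cut -d ' ' -f 1)
    sum_b=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}

# Demo with scratch files standing in for the archive and its copy.
work=$(mktemp -d)
printf 'archive bytes\n' > "$work/local.fsa"
cp -p "$work/local.fsa" "$work/copy.fsa"
if same_checksum "$work/local.fsa" "$work/copy.fsa"; then
    echo "copy verified"
else
    echo "copy differs from the original" >&2
fi
```

On the real files, the two arguments would be the archive under ~/f17backup and its copy under /mnt/drobo/f17backup.  Expect the check itself to be slow, because the whole copy has to be read back over the network.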


A Word About the Drobo
The Drobo has been one of the easiest storage and backup solutions that I've ever used.  It integrates seamlessly with Time Machine for my MacBook and I've created a Windows share on the device in order to copy my Linux archive.

Over the past few years, I've expanded the drives within it about three times now.  I went from four 500GB drives, to four 1TB drives and one 500GB, to my configuration as it is today, five 1TB drives.  The drive upgrades were easy, though time consuming.  I removed each of the older drives one at a time and replaced them with the larger drives.  Each time a drive was upgraded, the Drobo would automatically and non-destructively rebuild its storage protection.  The Drobo's storage protection is called BeyondRAID, Drobo's own custom algorithm on top of RAID.

The integration with Mac is seamless; however, the Windows/CIFS file share can be a bit wonky, as the share has a tendency to become unavailable for whatever reason.  The resolution is to shut down and restart the Drobo, and that seems to fix the problem.

Cloud-Based Storage
A last layer of protection above and beyond the local and network copy of my data is to copy the encrypted archive to a cloud-based solution.  The purpose is to protect my data in case of a disaster that destroys all my local storage media.  With the increasing number of natural and man-made disasters happening these days, I've recently invested in a data protection plan with www.crashplan.com.  I got an unlimited plan to store my 1.2TB of data to the cloud.  Most of that data is audio, video and image files.

After tweaking the CrashPlan app to pump more data through my local and wide area network (from 1280KB and 2560KB, respectively, to about 6400KB for both), it took about two weeks to upload this amount of data to CrashPlan's cloud!
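
To put the three durations in this post side by side, here is a rough back-of-the-envelope calculation.  The byte counts are taken from the ls and archinfo output earlier in the post; the assumptions are that "two weeks" means 336 hours of continuous upload and that the upload was about the size of the archive.  avg_mb_per_sec is a hypothetical helper name.

```shell
#!/bin/sh
# avg_mb_per_sec BYTES HOURS: average rate in decimal megabytes per second.
avg_mb_per_sec() {
    awk -v bytes="$1" -v hours="$2" \
        'BEGIN { printf "%.1f\n", bytes / (hours * 3600) / 1000000 }'
}

archive_bytes=1186229328445   # size of backup_lv_root.fsa from the ls output
used_bytes=1338969956352      # "Space used in filesystem" from archinfo

echo "fsarchiver read (5 hours):   $(avg_mb_per_sec "$used_bytes" 5) MB/s"
echo "copy to Drobo (16 hours):    $(avg_mb_per_sec "$archive_bytes" 16) MB/s"
echo "upload to cloud (336 hours): $(avg_mb_per_sec "$archive_bytes" 336) MB/s"
```

Under those assumptions the averages come out to roughly 74, 21 and 1 MB/s, so the cloud upload is the slow step by a wide margin.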

In sum, if you have a lot of data, all these procedures take time: the backup itself, the local network copy and then the cloud copy.  If you're using Linux, you probably have the stomach for all this.  In the end, though, you'll have a backup solution that is pretty solid and relatively easy to implement, unlike custom scripted solutions.

Love to hear any comments on how you back up your systems.

ciao!

References
http://crazedmuleproductions.blogspot.com/2010/02/fsarchiver-good-backup-for-ext4.html
http://www.drobo.com/how-it-works/beyond-raid.php
http://support.crashplan.com/doku.php/recipe/stop_and_start_engine

Here are some of my earlier articles on fsarchiver and a review of the Drobo.

Feel free to drop me a line or ask me a question.