Saturday, January 01, 2022

RAID problem leading up to new build

It has been a looong time since I posted.  Hello again.  This post begins with a simple desire to replace a member of an Intel RST (Rapid Storage Technology) RAID set (after upgrading to the latest Fedora and running into GRUB issues on load, but that is another story).

You may remember my original build from 2013 here:

This is the box that my RAID-10 set is installed in.  As it's 8.5 years later, the 1.5TB Western Digital drives in that RAID are getting a bit long in the tooth.  The failing one was from 2009.  Replacement should be no biggie, but I had to buy a new drive (2.0TB drives were the easiest to get) and refresh my knowledge of RAID and of Intel RST, which is firmware (i.e., not hardware) RAID.  So before touching the box, I needed to back up what I had in case the new disk gave me problems; otherwise I'd be in a real world of hurt with a dead RAID set.

Digging into dmesg, I saw a bunch of errors that I never got around to resolving (EDAC sbridge: Failed to register device with error -19).  The real problem I needed to address was the smartctl errors I saw: unrecovered read error - auto reallocate failed.

I'd recommend everyone have smartmontools installed.  It provides smartctl, a very helpful, basic diagnostic utility.
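As a rough sketch of the kind of check that helps here: the helper below filters `smartctl -A` output down to the counters that usually point at a dying disk. The function name `smart_warning_attrs` and the device `/dev/sda` are placeholders of mine, not the exact commands used on this box.

```shell
# Sketch: keep only the SMART attributes that usually signal a failing disk.
# The function name is hypothetical; it reads `smartctl -A` output on stdin.
smart_warning_attrs() {
    grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect'
}

# Typical use (needs root; /dev/sda is a placeholder device):
#   sudo smartctl -H /dev/sda                         # overall health verdict
#   sudo smartctl -A /dev/sda | smart_warning_attrs   # the counters worth watching
#   sudo smartctl -l error /dev/sda                   # the drive's own error log
```

Non-zero raw values on those counters, especially ones that keep climbing, are a good hint that it is time to order a replacement.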


As I was thinking about the problem, I figured I'd solve this issue going forward by just getting a ton of storage.  So I searched for and bought three 6TB Seagate IronWolf Pro recertified SATA drives from ServerPartsDeals.  Good shop; I got them for $124 each.

Now with the future state in hand, I went back to work on the problem.  I thought it might be faster to copy to the backup directly over SATA instead of USB.  This was my downfall, as the Asus BIOS started flaking out.  The issue was that when I did a simple thing, connecting a new SATA disk, the BIOS would hang and not let me get into the BIOS setup.  After fussing with various settings for two days:

I thought the best idea might be to update the BIOS.  I never got around to that.  But I had errors indicating the CMOS battery was dying.  So I went to the store to get a new one.  That didn't fix my issue.

Getting completely frustrated, I abandoned the idea of copying the data off the RAID to an internal drive, reset the configuration back to the way it was, and rebooted the box.  The BIOS would still hang occasionally, which was weird.  But I had to get the data off the RAID, so I hooked up my USB drive bucket with an older (circa 2012) 3TB backup drive and mounted the disk.  Using rsync, I was able to get the data off the RAID to the external drive:

 rsync -avxHAX --info=progress2 --exclude 'Downloads/'  /home/sodo/* /mnt/extbackup/
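Before pulling any drives, it can be worth confirming that the copy really matches the source. Here is a minimal sketch using an rsync dry run. The function name `backup_diff` is made up for illustration, and this is a check I am suggesting rather than a step from the original procedure.

```shell
# Sketch: list files that differ between a source tree and its backup.
#   -r recurse, -l keep symlinks as symlinks, -n dry run (copies nothing),
#   -c compare by checksum instead of size/mtime, -i itemize each difference.
# Files that match produce no output line.
# Usage: backup_diff SRC DST [extra rsync options...]
backup_diff() {
    src=$1
    dst=$2
    shift 2
    rsync -rlnci "$@" "$src"/ "$dst"/
}

# Example, mirroring the copy above:
#   backup_diff /home/sodo /mnt/extbackup --exclude 'Downloads/'
```

One thing to expect: the `/home/sodo/*` glob in the copy does not match top-level dotfiles, so those will show up as missing from the backup in this comparison.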

Phew.  That was close.  Now, with the backup created, I was able to safely undertake the procedure to replace the drive and rebuild the RAID.  This was the easiest part of the whole shebang:
- power off and unplug
- remove drive
- install new 2.0TB

Once this was done, I started the box up and watched the RAID rebuild:
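For anyone wanting to watch a rebuild from the command line, here is a small sketch. It assumes the RST array is assembled by Linux md (mdadm), which is how Fedora normally handles Intel firmware RAID. The function name `rebuild_progress` and the array name `/dev/md126` are placeholders.

```shell
# Sketch: pull the rebuild percentage out of /proc/mdstat-style text.
# Assumes the array is managed by Linux md; the function name is hypothetical.
# With no argument it reads the live /proc/mdstat.
rebuild_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%' "${1:-/proc/mdstat}"
}

# Typical use on the live box:
#   watch -n 10 cat /proc/mdstat       # full status, refreshed every 10 s
#   rebuild_progress                   # just the percentage
#   sudo mdadm --detail /dev/md126     # /dev/md126 is a placeholder array name
```

When the recovery line disappears from `/proc/mdstat` and every member shows as up, the rebuild is done.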

But the nagging BIOS issues on the box still irked me.  During troubleshooting, I manhandled the video card and pulled it out while it was still locked in the PCIe slot.  I pulled some of the pins from the mobo.  Bad, bad, bad.  So I decided that the box may have had its day, and I researched my options.  Time to upgrade!!  I was off to build a new box based upon Intel's 11th-gen chip, the i7-11700K.  Affordable and not too crazy.  And I chose a new mobo, too: the Asus Z590-E.  This should be fun!

Until next time, happy New Year and happy computing!

