Monday, December 26, 2022

beefing up my Pop-OS's resiliency

After a mishap with some Steam drivers that forced me to reinstall Pop-OS, I decided to make my box a bit more resilient. I noticed that my Pop-OS install didn't have a recovery partition. This is a very nice feature if you don't want to go hunting for a USB stick when your root filesystem gets corrupted.

Normally you get the Recovery partition out of the box with Pop-OS. But I had done a custom install, and you don't get a Recovery partition unless you specifically add it. So I had to jump through a few hoops listed in this article. But that article needed a decent example of two things:

  • running the pop-os upgrade tool 
  • mapping the correct UUIDs in the recovery.conf and boot loader entry 

Here are the mappings, shown here with colors to help identify what goes where: 
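
To gather the values that get mapped, a sketch along these lines may help. It assumes the recovery partition is mounted at /recovery, and the helper names conf_value and mount_uuid are made up for illustration. The RECOVERY_UUID key in the commented example is the name I would expect to find in recovery.conf, so treat it as an assumption and check your own file.

```shell
# conf_value KEY FILE: print the value from a KEY=VALUE line in FILE.
conf_value() {
    sed -n "s/^$1=//p" "$2"
}

# mount_uuid MOUNTPOINT: print the filesystem UUID behind a mount point.
mount_uuid() {
    # findmnt resolves the mount point to its source device, e.g. /dev/sda1
    dev=$(findmnt -n -o SOURCE "$1") || return 1
    # -d limits lsblk to the device itself, -n drops the header line
    lsblk -d -n -o UUID "$dev"
}

# Example comparison (key name and paths are assumptions, see above):
#   [ "$(conf_value RECOVERY_UUID /recovery/recovery.conf)" = "$(mount_uuid /recovery)" ] \
#       && echo "recovery UUID matches"
```

The same pair of helpers can be pointed at the other mount points and keys to cross-check each value before rebooting.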


Here is the upgrade command:

pop-upgrade recovery upgrade from-release 22.04
 
It took about a minute to refresh the /recovery filesystem. You can watch this live by running "journalctl -flu pop-upgrade" in a terminal. Here is the output from the journal afterwards:
 
sodo@pop-os:~$ journalctl -b0 | grep pop-upgrade
Dec 25 12:03:52 pop-os com.system76.PopUpgrade.Notify.desktop[3621]: checking if pop-upgrade requires an update
Dec 25 12:03:52 pop-os pop-upgrade[3762]: [INFO ] daemon/mod.rs:389: initializing daemon
Dec 25 12:03:52 pop-os pop-upgrade[3762]: [INFO ] daemon/mod.rs:749: daemon registered -- listening for new events
Dec 25 12:03:52 pop-os pop-upgrade[3794]: pop-upgrade was already not on hold.
Dec 25 12:03:52 pop-os pop-upgrade[3762]: [INFO ] daemon/mod.rs:1099: updating apt sources
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:1 https://dl.google.com/linux/chrome/deb stable InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:2 http://apt.pop-os.org/proprietary jammy InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:3 http://apt.pop-os.org/release jammy InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:4 http://apt.pop-os.org/ubuntu jammy InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:5 http://apt.pop-os.org/ubuntu jammy-security InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:6 http://apt.pop-os.org/ubuntu jammy-updates InRelease
Dec 25 12:03:52 pop-os pop-upgrade[3856]: Hit:7 http://apt.pop-os.org/ubuntu jammy-backports InRelease
Dec 25 12:03:53 pop-os pop-upgrade[3856]: Reading package lists...
Dec 25 12:03:53 pop-os pop-upgrade[3762]: [INFO ] daemon/mod.rs:1010: performing a release check
Dec 25 12:03:53 pop-os pop-upgrade[3762]: [INFO ] daemon/mod.rs:1017: Release { current: "22.04", lts: "true",  next: "22.10", available: false }
Dec 25 12:03:53 pop-os pop-upgrade[3762]: [INFO ] release_api.rs:58: checking for build 22.04 in channel nvidia
Dec 25 12:03:54 pop-os systemd[1]: pop-upgrade.service: Deactivated successfully.
Dec 25 12:03:54 pop-os systemd[1]: pop-upgrade.service: Consumed 1.066s CPU time.
Dec 25 12:04:03 pop-os systemd[1225]: pop-upgrade-notify.service: Control process exited, code=killed, status=15/TERM
Dec 25 12:04:03 pop-os systemd[1225]: pop-upgrade-notify.service: Failed with result 'signal'.
Dec 25 12:04:48 pop-os pop-upgrade[5183]: checking if pop-upgrade requires an update
Dec 25 12:04:48 pop-os pop-upgrade[5200]: [INFO ] daemon/mod.rs:389: initializing daemon
Dec 25 12:04:48 pop-os pop-upgrade[5200]: [INFO ] daemon/mod.rs:749: daemon registered -- listening for new events
Dec 25 12:04:48 pop-os pop-upgrade[5218]: pop-upgrade was already not on hold.
Dec 25 12:04:48 pop-os pop-upgrade[5200]: [INFO ] daemon/mod.rs:1099: updating apt sources
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:1 https://dl.google.com/linux/chrome/deb stable InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:2 http://apt.pop-os.org/proprietary jammy InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:3 http://apt.pop-os.org/release jammy InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:4 http://apt.pop-os.org/ubuntu jammy InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:5 http://apt.pop-os.org/ubuntu jammy-security InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:6 http://apt.pop-os.org/ubuntu jammy-updates InRelease
Dec 25 12:04:49 pop-os pop-upgrade[5221]: Hit:7 http://apt.pop-os.org/ubuntu jammy-backports InRelease
Dec 25 12:04:50 pop-os pop-upgrade[5221]: Reading package lists...
Dec 25 12:04:50 pop-os pop-upgrade[5200]: [INFO ] daemon/mod.rs:1010: performing a release check
Dec 25 12:04:50 pop-os pop-upgrade[5200]: [INFO ] daemon/mod.rs:1017: Release { current: "22.04", lts: "true",  next: "22.10", available: false }
Dec 25 12:04:50 pop-os pop-upgrade[5200]: [INFO ] release_api.rs:58: checking for build 22.04 in channel nvidia
Dec 25 12:04:50 pop-os systemd[1]: pop-upgrade.service: Deactivated successfully.
Dec 25 12:04:50 pop-os systemd[1]: pop-upgrade.service: Consumed 1.000s CPU time.
 
Just wanted to highlight those two things.  Best of luck if you try creating the Recovery partition.  It was a bit of a challenge!
 
Just remember: DON'T PUT YOUR RECOVERY PARTITION ON THE SAME DISK OR RAID SET AS ANY FILESYSTEM YOU EXPECT TO WIPE CLEAN!!  What happened to me was that I initially put the recovery partition on a RAID 0 set, /dev/md0, as the third partition (/dev/md0p3).  The problem was that when I went to recover, that third partition sat alongside my root partition (/dev/md0p2), so the installer couldn't lock the entire RAID set while /dev/md0p3 was in use.  As a result, I couldn't see the /dev/md0 RAID set at all in the installer.  I could only see the underlying physical partitions (not the mdadm RAID partitions).  This happened to be installed on NVMe drives, so all I saw was "/dev/nvme2n1p2" and NOT /dev/md0.  So beware!  To resolve this, I put the recovery partition on a second hard drive that I use for backups.  This drive does not hold / (root), /boot, /boot/efi or swap.  It's just a Seagate IronWolf disk (/dev/sda) that I occasionally mount to do backups.
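
A quick way to sanity-check the placement before you need it is sketched below. The helper names are hypothetical, and it assumes the recovery partition is mounted at /recovery. The idea is simply to compare the parent device of / with the parent device of /recovery.

```shell
# parent_of DEVICE: print the parent block device, e.g. md0 for /dev/md0p3.
parent_of() {
    lsblk -d -n -o PKNAME "$1"
}

# placement_ok ROOT_PARENT RECOVERY_PARENT: succeed only when both names
# are known and differ, i.e. recovery does not share a disk/array with root.
placement_ok() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Example use on a running system:
#   root_dev=$(findmnt -n -o SOURCE /)
#   rec_dev=$(findmnt -n -o SOURCE /recovery)
#   placement_ok "$(parent_of "$root_dev")" "$(parent_of "$rec_dev")" \
#       || echo "WARNING: /recovery shares a device with /"
```

With my original layout this would have compared md0 against md0 and warned; with the recovery partition on /dev/sda it passes.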
 
Oh, and I also made a RAID5 set on a second PC and added an rsync job to cron for it.  This is in case my main box gets blown up for whatever reason:
rsync -avxHAXP --info=progress2 --exclude='Downloads' --delete /mnt/sodofiles/ sodo@secondbox:/sodoFiles
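
For the cron side, one possible arrangement is a small wrapper that refuses to start if the previous run is still going. This is only a sketch: the script path, lock file, log file and the 02:30 schedule are all assumptions, not what is actually on my box.

```shell
# Hypothetical wrapper, e.g. saved as /usr/local/bin/sodo-backup.sh.
# A crontab line like this one (schedule is an assumption) would run it nightly:
#   30 2 * * * /usr/local/bin/sodo-backup.sh >> "$HOME/backup.log" 2>&1

# run_locked LOCKFILE CMD...: run CMD only if no other run holds LOCKFILE.
run_locked() {
    lock=$1
    shift
    # flock -n fails immediately instead of waiting when the lock is taken
    flock -n "$lock" "$@"
}

# Example call inside the wrapper:
#   run_locked /tmp/sodo-backup.lock rsync -avxHAX --exclude='Downloads' \
#       --delete /mnt/sodofiles/ sodo@secondbox:/sodoFiles
```

The lock matters mostly for big first syncs, where one night's run can still be copying when the next one fires.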
 
Yay for resiliency! 

Next stop will be creating some Ansible scripts to configure a newly spun up box with all the core software that runs my editing rig.

'sodo
 

Saturday, January 08, 2022

Asus Z590-E gaming motherboard, i7-11700K 11th gen (LGA1200) build

After a series of issues, I decided to build a new box.  This is my first build since 2013, so a lot of technology has changed.  Since the issues revolved around spinning disks, I thought I'd remove that problem entirely by buying only SSDs.  This turned out to be a fun build.


 

| CATEGORY | WHAT | URL | UNIT COST | UNITS | TOTAL COST |
| CPU | Intel Core i7-11700K Desktop Processor 8 Cores up to 5.0 GHz Unlocked LGA1200 | https://www.amazon.com/Intel-i7-11700K-Desktop-Processor-Unlocked/dp/B08X6ND3WP?th=1 | $354.00 | 1 | $354.00 |
| MOTHERBOARD | Asus Z590-E Gaming WiFi 6E LGA 1200 (Intel 11th/10th Gen) ATX Gaming Motherboard | https://www.amazon.com/ROG-Gaming-Motherboard-Stages-Thunderbolt/dp/B08T6HTXF9 | $359.99 | 1 | $359.99 |
| MEMORY | Corsair Vengeance RGB Pro 32GB (2x16GB) DDR4 3600 (PC4-28800) C18 | https://www.amazon.com/Corsair-Vengeance-2x16GB-PC4-28800-Optimized/dp/B082DGZJ9C/ref=sr_1_2?crid=SYEGHUBHE26B | $152.99 | 1 | $152.99 |
| DISK1 | PNY XLR8 CS3040 4TB M.2 NVMe Gen4 x4 Internal Solid State Drive (SSD) | https://www.amazon.com/PNY-CS3040-Internal-Solid-State-Drive-M280CS3040-4TB-RB/dp/B08TRN15K5/ref=sr_1_1?crid=58NNUWQ2KRY3&th=1 | $549.99 | 2 | $1,099.98 |
| DISK2 | SAMSUNG 970 EVO Plus SSD 500GB - M.2 NVMe Interface Internal Solid State Drive | https://www.amazon.com/Samsung-970-EVO-Plus-MZ-V7S500B/dp/B07M7Q21N7/ref=sr_1_1?crid=1Q9C4DFCS8DEL&th=1 | $69.99 | 2 | $139.98 |
| VIDEO CARD | NVIDIA - GeForce GTX 970 4GB GDDR5 PCI Express 3.0 Graphics Card | https://www.amazon.com/NVIDIA-GeForce-GDDR5-Express-Graphics/dp/B00UNF2XRK/ref=sr_1_2?crid=1JSKD3XD3JBZR | $364.92 | 1 | $364.92 |
| FAN | Corsair iCUE H100i Elite Capellix Liquid CPU Cooler | https://www.amazon.com/Corsair-H100i-Capellix-Liquid-Cooler/dp/B08G1NSG7F/ref=sr_1_3?crid=3NAH6JA9UOFH5&th=1 | $149.99 | 1 | $149.99 |
| POWER SUPPLY | Corsair RM Series, RM750, 750 Watt, 80+ Gold Certified, Fully Modular Power Supply | https://www.amazon.com/CORSAIR-Certified-Modular-Microsoft-Standby/dp/B07RF237B1/ref=sr_1_3?crid=RPYYLTL70BIN&th=1 | $82.21 | 1 | $82.21 |
| CASE | NZXT H510i - CA-H510i-W1 - Compact ATX Mid-Tower PC Gaming Case | https://www.amazon.com/NZXT-H510i-Vertical-Integrated-Water-Cooling/dp/B07TD9S3HZ/ref=sr_1_2?crid=3TBS1R50XAAH&th=1 | $134.99 | 1 | $134.99 |
| TOTAL | | | | | $2,839.05 |

OS Specs

  • POP-OS Linux
  • RAID0 of system drive
  • RAID1 for data drive


Things learned along the way:

  • UEFI through and through for future proofing
  • Needed to manually enable M.2-4 slot in BIOS
  • Great article on formatting RAID set for Linux: https://forum.level1techs.com/t/guide-install-pop-os-20-04-on-raid/156379
  • Rsync is your friend: rsync -avxHAX --progress /myFiles/* /backup/
  • NVMe drives can come up so fast that mdadm doesn't have time to settle down.  I had to add the x-systemd.device-timeout=0 option to the affected entry in /etc/fstab
  • RGB Lights are pretty
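
To make the fstab item concrete, here is a sketch of what such an entry could look like, plus a small helper (hypothetical name) that lists mount points still missing the option. The UUID, mount point and filesystem type are placeholders. My understanding is that a value of 0 removes the time limit, but check systemd.mount(5) for your version.

```shell
# Example /etc/fstab line (all values are placeholders):
#   UUID=<your-md-uuid>  /data  ext4  defaults,x-systemd.device-timeout=0  0  2

# mounts_missing_timeout FSTAB: print the mount point of every entry whose
# options field (column 4) lacks x-systemd.device-timeout.
mounts_missing_timeout() {
    awk 'NF == 0 || $1 ~ /^#/ { next }
         $4 !~ /x-systemd\.device-timeout=/ { print $2 }' "$1"
}

# Example:
#   mounts_missing_timeout /etc/fstab
```

Only the entries that live on the md arrays need the option; the helper just shows where it is absent so you can decide.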




That's it for now.  Cheers!

TAG 


Saturday, January 01, 2022

RAID problem leading up to new build

It has been a looong time since I posted.  Hello again.  This post begins with a simple desire to replace a member of an Intel RST (Rapid Storage Technology) RAID set (after upgrading to the latest Fedora and encountering Grub issues on load, but that is another story).


You may remember my original build from 2013 here: http://www.techanswerguy.com/2013/04/i7-3930k-lga2011-build-asus-p9x79.html

This is the box that my RAID-10 set is installed in.  As it's 8.5 years later, the 1.5TB Western Digital drives in that RAID are getting a bit long in the tooth.  The failing one was from 2009.  Replacement should be no biggie, but I had to buy a new drive (2.0TB drives were the easiest to get) and refresh my knowledge of RAID, and of Intel RST, which is firmware (i.e., not hardware) RAID.  So before touching the box, I needed to back up what I had, just in case the new disk gave me problems.  Otherwise I'd be in a real world of hurt with a dead RAID set.

Digging into dmesg, I saw a bunch of errors that I never got around to resolving (EDAC sbridge: Failed to register device with error -19).  The real problem I needed to address was the smartctl errors I saw: "unrecovered read error - auto reallocate failed".


I'd recommend everyone have smartmontools installed (it provides the smartctl command).  Very helpful, basic diagnostic utility.
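
As a rough illustration of how I'd use it, the sketch below pulls two attributes that tend to go with failing sectors out of the "smartctl -A" attribute table. The helper name is made up, /dev/sda in the usage comment is a placeholder, and it assumes the usual ATA attribute layout where the raw value is the tenth column.

```shell
# bad_sector_counts: read `smartctl -A` output on stdin and print the name
# and raw value of the reallocated and pending sector attributes.
bad_sector_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" { print $2, $10 }'
}

# Example:
#   sudo smartctl -H /dev/sda                      # overall health verdict
#   sudo smartctl -A /dev/sda | bad_sector_counts  # the two counters above
```

Non-zero and growing numbers here are a good hint that it is time to back up and shop for a replacement.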


 

As I was thinking about the problem, I figured I'd solve this issue going forward by just getting a ton of storage.  So I searched for and bought three recertified 6TB Seagate IronWolf Pro SATA drives from ServerPartsDeals.  Good shop; I got them for $124 each.

Now with the future state in hand, I went back to work on the problem.  I thought it might be faster to copy to the backup directly over SATA instead of USB.  This was my downfall, as the Asus BIOS started flaking out.  The issue was that when I did a simple thing, connecting a new SATA disk, the machine would hang at startup and not let me get into the BIOS setup.  After fussing with various settings for two days, I thought the best idea might be to update the BIOS.  I never got back to that.  But I had errors indicating the CMOS battery was dying, so I went to the store to get one.  That didn't fix my issue.

Getting completely frustrated, I abandoned the idea of copying the data off the RAID to an internal drive, reset the configuration back to the way it was and rebooted the box.  The BIOS would still hang occasionally, which was weird.  But I had to get the data off the RAID, so I hooked up my USB drive bucket with an older (circa 2012) 3TB backup drive and mounted the disk.  Using rsync, I was able to get the data off the RAID drive to the external:

 rsync -avxHAX --info=progress2 --exclude 'Downloads/'  /home/sodo/* /mnt/extbackup/

Phew.  That was close.  Now, with the backup created, I was able to safely undertake the procedure to replace the drive and rebuild the RAID.  This was the easiest part of the whole shebang:
- poweroff and unplug
- remove drive
- install new 2.0TB

Once this was done, I started the box up and watched the RAID rebuild:
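
If the array is visible to Linux through md (an assumption; the rebuild can also be followed from the RST firmware screen), one way to keep an eye on it is to pick the progress line out of /proc/mdstat. The helper name below is made up for illustration.

```shell
# rebuild_progress: read /proc/mdstat-style text on stdin and print any
# recovery or resync progress lines, with leading whitespace trimmed.
rebuild_progress() {
    grep -E 'recovery|resync' | sed 's/^[[:space:]]*//'
}

# Example:
#   rebuild_progress < /proc/mdstat
#   watch -n 5 cat /proc/mdstat      # or just watch the whole file
```

When the helper prints nothing, no rebuild is in progress.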


But the nagging BIOS issues on the box still irked me.  During troubleshooting, I manhandled the video card and pulled it out while it was still locked into the PCIe slot, pulling some of the pins from the mobo.  Bad, bad, bad.  So I decided that the box may have had its day and I researched my options.  Time to upgrade!!  I was off to build a new box based upon Intel's 11th gen chip, the i7-11700K.  Affordable and not too crazy.  And I chose a new mobo, too: the Asus Z590-E.  This should be fun!

Until next time..happy new year and happy computing!

TAG

Feel free to drop me a line or ask me a question.