Murphy is my hero
I’m a big believer in Murphy’s Law… especially when it comes to computers. If you work with computers long enough, you learn to expect the unexpected.
Case in point:
My company has a server. It’s got a crappy Adaptec RAID card which is smart enough to tell me that one of the four drives has a problem, but not WHICH drive. The only solution was to get four new drives, replace all the existing drives and get rid of the crappy Adaptec card.
Problems:
- RAID controller doesn’t seem to like a mix of the old SATA1 and the new SATA2 drives. It picks a random number of drives (0-4) to mark as failed during boot. About 5-10 reboots are required before it decides all the drives are good. Since the system takes forever to initialize its BIOS and the BIOS of the card, that’s about 7-15 minutes.
- RAID controller won’t allow the OS to see an individual drive unless it’s part of a RAID array. Reboot 10 more times after marking the four drives as “raw volumes”.
- During the copying of files from the old RAID array to the new software RAID array, a drive fails in the original array, which slows down the copy.
- Grub segfaults when trying to write the MBR to the new drives.
- My CD copy of Recovery is Possible is apparently too scratched up to be useful. Must re-download/burn an ISO.
- This whole process was supposed to take about 7hrs max. Heh.
- A 2nd drive in the original array has failed. The array is now completely dead and all data on it is unrecoverable. I no longer have a fail-safe plan.
- The Marvell SATA chipset on the motherboard isn’t supported by my kernel. I have to keep using the crappy Adaptec SATA RAID card (w/o the RAID).
- Oh, looks like I’m going to miss a party since this is taking so long.
- Yep, missing the party. Also, apparently CentOS 4.1’s initrd doesn’t support root on LVM2. This is bad since my root partition is on RAID5/LVM2.
- Of course it helps if you install the GRUB boot loader in the MBR, not in a partition. Doh.
Final (temporary) solution:
- Put one of the old drives back in the system. Yes, this causes disk detection errors at boot, but I need another disk.
- Make this new disk the boot/root filesystem disk.
- This allows me to boot off the hard drives (yea!) and get a root filesystem running in order to start LVM2 and mount /home and /var.
- At some point in time, the software RAID got confused and marked one partition offline in both a raid1 and a raid5 array. Now I get to resync them (should only take about 500 minutes according to /proc/mdstat).
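For anyone stuck watching a similar resync, here is a minimal Python sketch of pulling the progress and time estimate out of /proc/mdstat instead of eyeballing it. The function name `resync_progress` and the sample text are my own inventions for illustration (the device names and numbers are made up, not copied from this server), and the regexes assume the usual `resync = N% ... finish=Nmin` line layout.

```python
import re

# Device header line, e.g. "md1 : active raid5 sdd2[3] sdc2[2] ..."
DEVICE_RE = re.compile(r"^(md\d+)\s*:")
# Progress line, e.g. "[>....]  resync =  2.1% (...) finish=500.3min speed=..."
PROGRESS_RE = re.compile(r"(resync|recovery)\s*=\s*([\d.]+)%.*?finish=([\d.]+)min")

# Made-up sample in the usual /proc/mdstat layout (illustrative values only).
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md1 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      732563712 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  resync =  2.1% (5130112/244187904) finish=500.3min speed=7962K/sec
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
unused devices: <none>
"""


def resync_progress(mdstat_text):
    """Return {array: (action, percent_done, minutes_left)} for rebuilding arrays."""
    progress = {}
    current = None
    for line in mdstat_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            # Remember which array the following indented lines belong to.
            current = dev.group(1)
            continue
        hit = PROGRESS_RE.search(line)
        if hit and current:
            progress[current] = (hit.group(1), float(hit.group(2)), float(hit.group(3)))
    return progress


for name, (action, pct, minutes) in resync_progress(SAMPLE_MDSTAT).items():
    print(f"{name}: {action} {pct:.1f}% done, about {minutes / 60:.1f} hours left")
```

On a real box you would pass in `open("/proc/mdstat").read()` rather than the sample string. Arrays that are idle have no progress line, so they simply don't show up in the result.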
Total time: 10am-5pm and 7-10pm on Sat., 9am-6pm on Sun.