So I was thinking of silly things I’ve done that pseudo-broke my system, or made me think I had a broken system. Like the time I put the command:

exit

in my ~/.bash_aliases file. That broke every terminal on my machine, since each new shell exited as soon as it read the file, and I had to open a text editor to fix it.
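For anyone who hits the same thing, here is a minimal sketch of a fix that does not need a text editor. It assumes a Debian/Ubuntu-style setup where ~/.bashrc is what sources ~/.bash_aliases, so a shell started with `bash --norc` skips the broken file. The function name `disarm_exit` is made up for illustration.

```shell
# Hypothetical helper: comment out lines that consist only of "exit".
# Keeps a backup copy next to the original as <file>.bak.
disarm_exit() {
    file=$1
    cp "$file" "$file.bak" &&
        sed 's/^[[:space:]]*exit[[:space:]]*$/# &/' "$file.bak" > "$file"
}
```

From a shell started with `bash --norc` (for example by launching `xterm -e bash --norc` from a desktop run dialog, if xterm is installed), calling `disarm_exit ~/.bash_aliases` should leave the aliases intact and turn the stray `exit` into a comment.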

I’m curious what other silly things users have done to confuse themselves.

  • limelight79@lemm.ee
    1 month ago

    I may have posted this before, but… late last year I realized my Debian server with circa 2009 hardware, with 4 GB of RAM and a Core 2 Quad processor, was no longer up to the tasks I wanted it to perform; in particular, running a Home Assistant server. Back in 2018 or so, I added a software Linux RAID5 array with 5 active 3 TB drives and one hot spare, along with a “cold spare” that I’ve never actually used.

    So, early this year, I bought hardware to upgrade my desktop machine, which was still plenty fast for me, and move the guts to my server. This is how my server usually gets upgraded. Upgrade the desktop machine, give it a few days or weeks to make sure it’s stable, and then upgrade the server.

    I installed the hardware without a problem, booted it up, and everything seemed okay, except that I… couldn’t access the RAID. At first it was like, well, I’m sure it’s nothing serious, but then when mdadm couldn’t even FIND it, I started to get extremely worried. Fear set in.

    Long story short: When I built the RAID, I followed directions that used the entire discs as the RAID, instead of making a partition on the disc and using that partition. The old motherboard didn’t care, but the new one saw the bare discs and was like, “Hey, those are messed up, I’ll fix the partition table for you!” Turns out, building Linux RAIDs by using the full discs like that is a VERY BAD IDEA for exactly this reason - but there are still guides out there showing that method and not mentioning the risks.

    I was panicking. I spent days trying to figure out what to do and nothing was working. I was asking for help on the Linux-RAID list (and most of them were as helpful as they could be). Unfortunately my backups were NOT up to par (something I should have checked before starting), and I was at the point where I was like, well, I’ve lost x, y, and z.

    I had basically given up and was just recreating the RAID using the “create” command, then trying to see if I could mount the drive read-only. With 6 drives, there are quite a few possible combinations that could be the right one. If I remember correctly, I was able to figure out which drive was the spare, so I could limit my searches to the other 5, and knowing all 5 were in use, it was a matter of trying different orders. I think I got close one time and ext4 gave me a weird read error, so after that I swapped two drives, and hit the right order.
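    To put a number on “quite a few”: 5 drives can be ordered 5! = 120 ways. Below is a rough sketch of how those candidate orders could be listed. It is a dry run that only prints commands and never runs mdadm. The device names, /dev/md0, and the exact flags are placeholders and assumptions, not the command from the story.

    ```shell
    # Dry run: print one candidate "mdadm --create" line per drive order.
    # $1 is the order chosen so far; the remaining arguments are unused drives.
    # The body runs in a subshell, so each recursion level keeps its own variables.
    permute() (
        prefix=$1
        shift
        if [ "$#" -eq 0 ]; then
            # --raid-devices=5 is hardcoded to match the 5 active drives.
            echo "mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5$prefix"
            return
        fi
        n=$#
        i=1
        while [ "$i" -le "$n" ]; do
            first=$1
            shift
            permute "$prefix $first" "$@"   # recurse on the drives still unused
            set -- "$@" "$first"            # rotate so the next drive comes first
            i=$((i + 1))
        done
    )

    # Hypothetical device names for the 5 active drives.
    permute "" /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    ```

    Each candidate would be followed by a read-only mount attempt, with the array stopped (`mdadm --stop /dev/md0`) before trying the next order. Since `--create` writes new RAID superblocks, this is very much a last resort, and the chunk size, metadata version, and data offset would presumably need to match the original array as well.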

    Eventually … I found it. I found the right combination and could reload it! Everything was there, untouched! As quickly as I could, I copied everything to a 10 TB drive I bought and installed into the desktop system. I saved the command, rebooted, and the same thing happened again - so it was definitely a motherboard problem - but this time I knew how to recreate it, and did so.

    Since I now had a backup, I partitioned each drive and rebuilt the array using partitions…and I saved every piece of data I could think of about building the array, outputs of mdadm, outputs of /proc/mdstat, partition IDs, etc. Naturally, having that info likely means I’ll never need it.
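    As an illustration of that kind of record keeping, here is a small sketch that saves a command's output to a notes directory. The helper name `save_output`, the ~/raid-notes directory, and the device names in the example calls are all made up; the mdadm and blkid invocations themselves are standard but need root.

    ```shell
    # Hypothetical helper: run a command and save its stdout and stderr
    # to <outdir>/<name>.txt, creating the directory if needed.
    save_output() {
        outdir=$1
        name=$2
        shift 2
        mkdir -p "$outdir" && "$@" > "$outdir/$name.txt" 2>&1
    }

    # Example calls (placeholder paths and devices, run as root):
    # save_output ~/raid-notes mdstat  cat /proc/mdstat
    # save_output ~/raid-notes detail  mdadm --detail /dev/md0
    # save_output ~/raid-notes examine mdadm --examine /dev/sdb1
    # save_output ~/raid-notes scan    mdadm --detail --scan
    # save_output ~/raid-notes blkid   blkid
    ```

    Keeping a copy of that directory somewhere other than the array itself is the point of the exercise.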

    I was so relieved when I saw that mount command work without error. I spent close to a week worrying about it, and in that moment it was a huge rush.

    New setup handles HA and other duties with aplomb and is very reliable, so in the end it was very worth it.

    This is less “silly” and more “horrifying”. Sorry.

    • stembolts@programming.devOP
      1 month ago

      Well written, and I learned a few things from this story. I recently started a cloud of my own with four 20 TiB HDDs in a RAID 5 configuration, so this story felt very pertinent to me. Makes me very grateful for the simplicity of Cockpit and LUKS2… my setup felt so trivial to configure!