February 14, 2021
Simplifying the Homelab
Probably unsurprisingly, I run a small homelab. It's a cliché "love what you do and you'll never work a day in your life" kind of thing. In my garage is a 22U rack with a few pieces of network gear (Ubiquiti UniFi kit), two Dell R720s, a Synology 2419+, a UPS to keep it all powered, and a handful of bridges for HomeKit things. It's a fun hobby, except for when it's not.
Like this weekend.
The primary function of my homelab is Plex for me and a small group of friends and family. I've collected a lot of DVDs since I was able to drive myself to Blockbuster Music and Borders (two relics in their own right), and they're not doing me any good sitting in the DJ caddy they've been in since college. We also did a family media digitization project a few years ago (Dad's gift to the family, and to me, so I could stop saying I'd finally rip them all), and Plex does a really nice job of handling home video and audio. I mean, who wouldn't want to be able to pull up their bar mitzvah on short notice?
I kid, but it's actually a really nice thing to be able to do, and now that it's digitized, it can live safely forever on said Synology with a twin copy in The Cloud.
Plex itself runs on an Ubuntu VM, hosted on the R720 in the garage with ESXi as the hypervisor. I referred to the R720s in the plural earlier and in the singular here because I lost the RAID controller on one of them a few weeks ago. While there are a lot of things I can learn from this lab, managing RAID arrays isn't one I'm interested in, so that machine has met the end of its days. I consolidated onto the remaining one, thanks to some clever NFS mounts, and it's been fine since, except for the part where it wasn't. I'd also been wanting to simplify back to my Plex roots on an Intel NUC. That NUC has been sitting here for a few days, getting provisioned, but waiting for me to pull the trigger while I figured out the best way to migrate. The plan was vCenter, but even the best-laid plans often go awry.
This morning, my Plex monitor bot alerted me that the server had gone offline. The monitor is a separate VM on the same host, and it historically hasn't been very stable anyway, so I was skeptical the alert was real. vCenter conveniently has an in-app remote console, but it was having trouble connecting. This is where the panic set in. Things were going unresponsive, so I rebooted the host. To summarize a frustrating morning: one of the disks was reporting as failed, which takes down a striped RAID array. I knew that, actually, and chose striping for speed because RAID is not backup, and I had backup covered elsewhere. Or so I thought. After several more reboots and a big dose of luck, I was able to get the disk back into the array, all the disks showed up as visible and healthy, and the host booted successfully. There was an interlude where ESXi came up but all the VMs were in a bad state, which is when I realized the datastore hadn't mounted.
It was a long weekend. On the upside, I had everything I needed to start rebuilding, and I did learn my way around a Dell PERC a little, but most of this recovery was luck, and I hate that. It also gave me a reason to start writing a few Ansible playbooks, since I'd been wanting to learn Ansible as well. It's the way forward for me; Chef's overhead is just more than I need these days. And nothing forces progress like necessity, so instead of going back and forth on the best way to migrate the VM, it was time to just do it.
I really do enjoy playing with the homelab, but like everything, it's not fun when it's broken. My setup has changed several times since I started. The earliest days were a single NUC running Ubuntu, which was fine until I learned what transcodes were. I've stuck with Docker, but changed how I manage it: from running it manually (docker run), to writing Chef cookbooks, to now simplifying back to a single docker-compose file. I'm also going back to a NUC, since newer-generation Intel chips with Quick Sync handle transcoding better, which is ironically a non-issue now that most of my users direct stream.
Long reboot cycles gave me a lot of time to think about the aspects of this that I enjoy, and the ones I don't. I love a puzzle, but I don't love a fire. Even though Plex is a hobby, I'd have been pretty heartbroken if I had lost it today and had to rebuild. I get a kick out of sysadmin work, but mostly because of the automation and scale of it all (even when we're talking about a two-node cluster in my garage). I love making stuff. Writing code and seeing things come to life is my real bread and butter, and rebuilding an ESXi box and writing backup scripts isn't that. This isn't to say I'll stop, but I am going to simplify and solidify so it "just works" and I don't have to worry about it, or, when I do have to worry about it, I can fix it in a snap with a script. There's also a strong resemblance to my time as an Android user, when I spent more cycles fiddling with the device than just using it. I lost a lot of text messages and pictures to new SMS clients and ROMs, whereas with iOS, I just use the device and focus my interests on making new things, not tweaking existing ones. I want my homelab to be this way: fun to set up, mindless to run.
I'll write more about it when the dust settles, but this was a nice, comforting way to spend the time processing while things copy around and come back up. And it helps me have more posts here, which I want.