Nix Selfhosting in the Age of LLMs

My use case
Starting the migration
Quick success story
Why Nix is so good, even on its own
- Fully declarative
- Saving tools and commands
The extra benefit of the LLM
Words of caution
Conclusion

My use case

I recently had to redo my homelab. "Homelab" might be a strong word for what is just a Raspberry Pi 4 (8GB) hooked up to a 2TB storage drive. Regardless, lately it had been crawling along at unacceptable speeds. I don't know if it was system rot or just a buildup of dust, but my home-run services were becoming slow. I wanted to run maintenance on it, but I was afraid to touch anything I had built.

I had based all my work on Helm charts deployed on Kubernetes (k3s, to be exact). I reasoned that it was the cutting-edge way to self-host, that it was highly secure, and that Helm made everything declarative. In the end I found it unwieldy, and the Helm charts weren't as declarative and pure as I had hoped. The system would still get stuck mid-deployment, previous configuration options wouldn't always go away, services needed to be manually deleted… It always felt like a great beast I was trying to wrestle into submission, which is not how I want to feel about a service backing up pictures of my children.

Starting the migration

It was time for a change. I wanted to restart my homelab and make it something very maintainable. This drove me to use two technologies that I had picked up since my last attempt at self-hosting:

- The Nix ecosystem: I migrated my own laptop to NixOS over a year ago. I then migrated my work setup to Nix, and now almost all of my projects use Nix to manage dependencies, builds and deployments. Now I had an opportunity to manage my homelab using NixOS and declare everything.
- LLMs and agentic co-programming: A lot of the work I did in the past involved painstaking port definitions and reverse-proxy configurations. There was no reason to think this time would be different in that regard, but knowing I could partially rely on some form of coding agent made the process significantly less daunting.

Quick success story

I started by making a migration plan: I wanted to keep the relevant data from my old homelab and migrate it into my new setup. Within a day I had backups ready to be ported. Within two more days I had my initial services running, albeit with some bugs. It took another day to iron it all out. Keep in mind that I only worked on this project from 10PM to 11PM, so I consider this quick progress.

Sure enough, it all worked! I backed up all my files and ran my little suite of automations within a few days! It was working faster than ever before, and I felt more in control than ever, even though I delegated 50% of the coding work to a machine. I attribute this to the fact that, since everything was strictly declarative, I could always tear it all down and rebuild it exactly the way I wanted with very little effort.

Why Nix is so good, even on its own

Fully declarative

The state of my homelab is set directly from my project. With the exception of some service-specific settings that can't be set from files, everything can be changed by changing a file in the repo, and everything can be inferred from reading the repo itself. This means I am never in the dark about the state of my homelab: I can just read what should be running.
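
As an illustration only (a minimal sketch, not the actual repo), a NixOS configuration for one Podman-hosted service behind a reverse proxy could look roughly like this. The container name, image, paths, hostname and ports are all placeholders I made up for the example; the option names are standard NixOS options.

```nix
# configuration.nix (hypothetical excerpt; names, image and ports are placeholders)
{ config, pkgs, ... }:
{
  # Run a container through Podman, declared instead of started by hand.
  virtualisation.oci-containers = {
    backend = "podman";
    containers.example-app = {
      image = "docker.io/library/nginx:stable";   # placeholder image
      ports = [ "127.0.0.1:8080:80" ];            # published on loopback only
      volumes = [ "/srv/example-app:/usr/share/nginx/html:ro" ];
    };
  };

  # Reverse proxy in front of the container.
  services.nginx = {
    enable = true;
    virtualHosts."app.home.example" = {           # assumed hostname
      locations."/".proxyPass = "http://127.0.0.1:8080";
    };
  };

  # Open the proxy's port in the firewall.
  networking.firewall.allowedTCPPorts = [ 80 ];
}
```

Reading a file like this answers "what is supposed to be running, and how is it exposed?" without logging into the machine.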

Saving tools and commands

Something I find myself doing for every one of my Nix projects is declaring a development shell with special commands for tooling. Every time I catch myself writing long commands for specific tasks (checking the state of backups remotely, remote build and deploy, running commands inside Podman-hosted services), I just write a custom command that gets loaded into the shell.

In the previous version, this was accomplished by writing a bunch of bash files. That does work and achieves the same basic goal, but the dev-shell approach feels more approachable, and the commands feel more accessible. I find myself using this kind of tooling much more within the Nix ecosystem.
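
A minimal sketch of such a development shell, assuming a flake-based system configuration named `homelab` and SSH access as root. The command names `lab-deploy` and `lab-exec`, the hostname, and the flake attribute are hypothetical; `pkgs.mkShell` and `pkgs.writeShellScriptBin` are the standard nixpkgs helpers.

```nix
# shell.nix (hypothetical; command names, host and flake attribute are placeholders)
{ pkgs ? import <nixpkgs> { } }:
let
  host = "homelab.example";   # assumed hostname of the Raspberry Pi

  # Build the system configuration and switch the remote machine to it.
  # Assumes nixos-rebuild is available on the machine running the shell.
  deploy = pkgs.writeShellScriptBin "lab-deploy" ''
    exec nixos-rebuild switch \
      --flake ".#homelab" \
      --target-host "root@${host}"
  '';

  # Run a command inside a Podman container on the remote host.
  # Usage: lab-exec <container> <command> [args...]
  labExec = pkgs.writeShellScriptBin "lab-exec" ''
    container="$1"; shift
    exec ssh "root@${host}" podman exec "$container" "$@"
  '';
in
pkgs.mkShell {
  # Everything listed here is on PATH inside `nix-shell`.
  packages = [ deploy labExec ];
}
```

Entering the shell with `nix-shell` puts both commands on the PATH, so a long invocation becomes something like `lab-exec example-app ls /data`.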

The extra benefit of the LLM

Of course, using a coding agent helps with self-hosting and automating tasks. This is also true for a Kubernetes-based, Helm-charted repo like my previous setup. But there ARE some DEFINITE benefits of using this specific combination.

For starters, the fact that it is fully declarative means the LLM has direct access to information about the state of the homelab. It doesn't have to hunt around and search for active processes - it knows the file layout and can infer immediately which services are running and how they are configured. Compare this to bloating your context by first reading a bunch of YAML files in the repo and then running a bunch of commands to check that what is actually deployed matches the YAML.

An additional benefit came along with all the custom commands I created for myself - turns out if they're useful for me, they're useful for the LLM! This allows it to perform routine tasks repeatably and correctly, instead of guessing what the best way to do so is. Checking on statuses of backups, for example, can be done in a few ways, but since I already have a command that checks all the relevant services and returns the information together, the LLM doesn't have to guess how to check for my backups - it does so first try every time.
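
To make the backup example concrete, here is a hedged sketch of what a single status command could look like. It assumes the backups run as systemd services on the homelab; the unit names, the hostname and the command name `lab-backup-status` are invented for illustration, since the post does not say which backup tool is used.

```nix
# backup-status.nix (hypothetical; unit names and host are placeholders)
{ pkgs ? import <nixpkgs> { } }:
let
  host = "homelab.example";   # assumed hostname
  backupUnits = [             # assumed systemd units that perform the backups
    "backup-photos.service"
    "backup-documents.service"
  ];
in
# One command that reports the last result of every backup unit together.
pkgs.writeShellScriptBin "lab-backup-status" ''
  exec ssh "root@${host}" systemctl show \
    --property=Id,ActiveState,Result,ExecMainExitTimestamp \
    ${pkgs.lib.escapeShellArgs backupUnits}
''
```

Because the list of units lives in one place, both the human and the LLM get the same answer from the same command instead of improvising a different check each time.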

Words of caution

Of course LLMs did not solve all my woes. Worse, they introduced new ones.

When setting up the new homelab, Claude decided to be pretty lax on security. Network isolation wasn't that important, so it could just skip it; file permissions were a bit of a pain to deal with, so why bother… You get the picture.

Needless to say, this was not cutting it. Once I realized this was going on, addressing it with the help of Claude was easy, but I first had to see these issues myself. Even with LLMs there is no replacement for understanding the code you have.
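
For a sense of what tightening things up can look like declaratively, here is a small sketch using standard systemd sandboxing directives through NixOS. It assumes a hypothetical native systemd service called `example-app` that already exists in the configuration; it is not the fix applied in this repo, and containers would need their own network and volume settings instead.

```nix
# hardening.nix (hypothetical fragment; "example-app" is a placeholder service)
{ ... }:
{
  systemd.services.example-app.serviceConfig = {
    # File permissions: private state directory, everything else read-only.
    DynamicUser = true;
    StateDirectory = "example-app";
    StateDirectoryMode = "0700";
    ProtectSystem = "strict";
    ProtectHome = true;
    PrivateTmp = true;
    NoNewPrivileges = true;

    # Network isolation: only loopback traffic is allowed.
    IPAddressDeny = "any";
    IPAddressAllow = "localhost";
  };
}
```

The point is less the specific directives than that restrictions like these are visible in the repo, so a missing one can be spotted in review.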

Another thing is the actual design of the repo itself and how it is organized. Claude can easily spaghettify a codebase if not kept in check. A disorganized codebase makes it harder for Claude itself to work, but more importantly, it makes it harder for me to understand the code I am using, which makes it harder to control Claude. It was critical for me to babysit Claude and give sensible repo design guidance so my code would be kept in check. Just because you're using AI doesn't mean you should let it make messy code.

Even with these gripes, using an LLM saved me loads of time and allowed me to make this migration even though the time I can spend on it is very, very limited.

Conclusion

This is an ongoing exercise in bad sys-admining - but I am enjoying it very much. Everything so far is running smoothly, and within a short amount of time I surpassed the state my homelab was in - and not once did I have to delete and re-deploy an ingress configuration. I even managed to set up services that I gave up deploying before.

I will keep using this for now. In the end, I don't want to have to rely on the LLM for every task I want to do with my homelab, and over time I am simplifying and restructuring the codebase further to make it easily maintainable by me, a silly old human. But even now it is in a very good state, and I am optimistic about the future of my self-hosted services.
