Files
homey/AGENTS.md
T
2026-04-26 00:09:52 +03:00

305 lines
12 KiB
Markdown

# AGENTS.md
Self-hosted home server configuration for a Raspberry Pi 4 (8 GB), managed
entirely through NixOS. Services run as podman containers under systemd.
Remote access is via Cloudflare Tunnel; local access goes through Caddy
with Let's Encrypt TLS (DNS-01, Cloudflare API).
The original Kubernetes/Helm setup is preserved on the `main` branch.
This branch (`nixos-port`) is the active NixOS port.
---
## Project Structure
```
flake.nix # Entry point — defines all hosts
modules/
common.nix # Shared system config (nix, podman, sops, SSH)
storage.nix # External HD mount + per-service directory layout
caddy.nix # Caddy reverse proxy (DNS-01 ACME, forward_auth)
cloudflared.nix # Cloudflare Tunnel for remote access
backup.nix # Restic daily backups (S3 primary + manual offload)
services/
openldap.nix # OpenLDAP — central identity provider
authelia.nix # Authelia — SSO gateway
gitea.nix # Gitea — Git server
nextcloud.nix # Nextcloud + PostgreSQL
phpldapadmin.nix # phpLDAPadmin — LDAP web UI
jellyfin.nix # Jellyfin — media server (disabled by default)
transmission.nix # Transmission — torrent client (disabled by default)
hosts/
pi-main/
default.nix # Service selection + host-specific overrides
hardware.nix # Pi 4 boot, SD card labels, ARM platform
secrets/
.sops.yaml # Age key configuration
secrets.yaml # sops-encrypted secrets (commit only after encrypting)
PORTING.md # Step-by-step migration guide from the old Helm setup
```
## Services and URLs
All services live under `zakobar.com`.
| Service | URL | Auth |
|---------|-----|------|
| Authelia | `auth.zakobar.com` | Public (it is the auth portal) |
| Gitea | `git.zakobar.com` | Gitea-native (LDAP) |
| Nextcloud | `nextcloud.zakobar.com` | Nextcloud-native |
| phpLDAPadmin | `ldapadmin.zakobar.com` | Authelia two_factor, admins only |
| Jellyfin | `jellyfin.zakobar.com` | Jellyfin-native |
| Transmission | `torrent.zakobar.com` | Authelia two_factor, admins only |
## Networking
All containers join a private podman network named **`homey`**, created by the
`podman-homey-network` systemd service in `common.nix`. This provides:
- **DNS isolation** — containers reach each other by name (e.g. `openldap`,
`nextcloud-postgres`) without being exposed on the host network.
- **No port conflicts** — Caddy owns host ports 80/443; service containers map
only to `127.0.0.1:<port>`.
- **Defence in depth** — even if the firewall were misconfigured, services are
not bound to `0.0.0.0`.
Internal ports (all mapped to `127.0.0.1` on the host):
| Container | Host port | Container port |
|-----------|-----------|----------------|
| openldap | 389 | 389 |
| authelia | 9091 | 9091 |
| gitea | 3000 | 3000 |
| nextcloud | 8080 | 80 |
| nextcloud-postgres | 5432 | 5432 |
| phpldapadmin | 8081 | 80 |
| jellyfin | 8096 | 8096 |
| transmission | 9092 | 9091 |
Inter-container communication uses container names on the `homey` network
(e.g. authelia → `ldap://openldap:389`, nextcloud → `nextcloud-postgres:5432`).
Caddy (running on the host) proxies via `127.0.0.1:<host port>`.
## Storage Layout
All persistent data lives on the external HD at `/mnt/data/`:
```
/mnt/data/
openldap/
etc-ldap-slapd.d/ → /etc/ldap/slapd.d in container
var-lib-ldap/ → /var/lib/ldap in container
authelia/config/ → /config
gitea/data/ → /data
nextcloud/
html/ → /var/www/html
db/ → /var/lib/postgresql/data
db-dump/ → pg_dump output (pre-backup)
jellyfin/config/ → /config
media/movies|tvshows|... → shared media (read-only to jellyfin)
transmission/config/ → /config
restic-cache/ → restic local cache
```
The drive device path is set per-host in `hosts/<name>/default.nix` via
`homey.storage.device`. Use a `/dev/disk/by-id/` path for stability.
## Build / Validate Commands
```bash
# Check flake structure and evaluate all hosts (no build)
nix flake check
# Dry-run: show what would change without applying
sudo nixos-rebuild dry-activate --flake .#pi-main
# Apply configuration
sudo nixos-rebuild switch --flake .#pi-main
# Build without switching (e.g. cross-compile on workstation)
nix build .#nixosConfigurations.pi-main.config.system.build.toplevel
# Show diff between running system and new config
nvd diff /run/current-system $(nix build --no-link --print-out-paths .#nixosConfigurations.pi-main.config.system.build.toplevel)
```
## Secret Management
Secrets are managed with [sops-nix](https://github.com/Mic92/sops-nix) and
age keys. The encrypted `secrets/secrets.yaml` is committed to the repo; the
age private key lives on the Pi at `/var/lib/sops-nix/key.txt`.
```bash
# Edit secrets (decrypts, opens $EDITOR, re-encrypts on save)
sops secrets/secrets.yaml
# Encrypt a plaintext secrets.yaml for the first time
sops --encrypt --in-place secrets/secrets.yaml
# Add a new host key (after generating it on the new machine)
# 1. Add the public key to secrets/.sops.yaml
# 2. Run:
sops updatekeys secrets/secrets.yaml
# Generate a new age key on a host
age-keygen -o /var/lib/sops-nix/key.txt
age-keygen -y /var/lib/sops-nix/key.txt # print public key
```
Secrets that must come from the old deployment (see `PORTING.md` for how to
extract them from the old k8s cluster):
- `openldap/admin_password`, `openldap/config_password`, `openldap/ro_password`
- `gitea/admin_password`
- `nextcloud/admin_password`, `nextcloud/postgres_password`
Everything else (authelia JWT/session/encryption keys, gitea JWT tokens,
restic password, Cloudflare tokens) can be generated fresh.
## Code Style Guidelines
### Nix
1. **Module pattern** — every service is an opt-in module with an `enable` option:
```nix
options.homey.myservice.enable = lib.mkEnableOption "My service";
config = lib.mkIf config.homey.myservice.enable { ... };
```
2. **`homeyConfig` specialArgs** — top-level site config (domain, org name,
timezone) is passed via `specialArgs` in `flake.nix` and accessed as
`homeyConfig` in every module. Do not read domain/org from hardcoded strings.
3. **No secrets in the Nix store** — secrets are always read from sops-managed
files at runtime, never embedded in the built config. Use
`config.sops.secrets."key".path` to get the runtime path of a secret file.
4. **Secret injection pattern** — because `oci-containers` `environmentFiles`
is limited, use a `systemd ExecStartPre` script to write an ephemeral env
file at `/run/<service>-secrets.env` and reference it via `EnvironmentFile`.
Clean it up in `postStop`.
5. **`--network=homey`** — all containers join the private `homey` podman
network. Inter-container traffic uses container names as hostnames; host
access is via explicit `ports` mappings to `127.0.0.1:<port>`.
6. **Systemd ordering** — always express `after`/`requires` dependencies
explicitly. The external HD mount unit is `mnt-data.mount`; containers that
need storage must depend on it.
### Adding a New Service
1. Create `modules/services/<name>.nix` following the existing module pattern.
2. Add `homey.<name>.enable = false` as the default option.
3. Import the new module in `flake.nix` (in the `modules` list inside `mkHost`).
4. Enable it in `hosts/pi-main/default.nix`.
5. Add a Caddy virtual host block in `modules/caddy.nix`.
6. Add the service data directory to `modules/storage.nix` `tmpfiles.rules`.
7. Add the data path to the `paths` list in `modules/backup.nix`.
8. Add any new secrets to `secrets/secrets.yaml` (plaintext) and document them.
### Updating or Regenerating Secrets
```bash
# Edit the encrypted file — sops opens $EDITOR
sops secrets/secrets.yaml
# Copy updated secrets to the Pi and rebuild
rsync secrets/secrets.yaml admin@pi-main:/path/to/homey/secrets/
ssh admin@pi-main 'sudo nixos-rebuild switch --flake /path/to/homey#pi-main'
```
### Debugging Containers
```bash
# List all running containers
podman ps
# Follow logs for a service
journalctl -fu podman-authelia.service
# Drop into a running container
podman exec -it authelia sh
# Restart a single service
sudo systemctl restart podman-gitea.service
# Check why a service failed to start
systemctl status podman-openldap.service
journalctl -u podman-openldap.service --since "5 min ago"
```
---
## Outstanding TODOs
These items are known gaps that need to be addressed before the setup is
production-ready:
- [ ] **`caddy.nix` — fix `vendorHash`**: The Caddy build with the Cloudflare
DNS plugin uses `lib.fakeHash` as a placeholder. After the first `nix build`,
replace it with the hash Nix reports in the error message.
- [ ] **`hosts/pi-main/default.nix` — fill in real values**:
- SSH public key in `users.users.admin.openssh.authorizedKeys.keys`
- External HD device path in `homey.storage.device`
- Backup repository URL in `homey.backup.repository` — must be an S3-compatible
URL, e.g. `"s3:https://s3.us-west-002.backblazeb2.com/your-bucket-name"`
- [ ] **`secrets/secrets.yaml` — populate and encrypt**: Fill in all secret
values (old passwords from k8s + freshly generated ones, including
`restic/s3_access_key_id` and `restic/s3_secret_access_key`), then run
`sops --encrypt --in-place secrets/secrets.yaml` before committing.
- [x] **`secrets/.sops.yaml` — PGP key**: The encryption subkey
`076AA297579A0064` is already in `.sops.yaml`.
- [ ] **Cloudflare Tunnel**: Create the tunnel in the Zero Trust dashboard,
copy the tunnel token into secrets, and configure public hostnames. See
`modules/cloudflared.nix` and Phase 3 of `PORTING.md` for details.
- [ ] **Second machine**: When ready, add `hosts/pi-secondary/` and uncomment
the `pi-secondary` entry in `flake.nix`. Services communicating cross-machine
should reference the primary Pi's LAN IP instead of `127.0.0.1`.
- [ ] **Jellyfin and Transmission**: Both modules are written and importable
but disabled. Enable in `hosts/pi-main/default.nix` when ready:
```nix
homey.jellyfin.enable = true;
homey.transmission.enable = true;
```
- [ ] **Backup — S3 credentials**: Add `restic/s3_access_key_id` and
`restic/s3_secret_access_key` to secrets, and set `homey.backup.repository`
to your S3-compatible bucket URL in `hosts/pi-main/default.nix`.
- [ ] **Backup — offload script**: Write `scripts/offload-backup.sh` for
manually copying snapshots to a local disk (USB attached to Pi, or a disk
on your workstation). Uses `restic copy` to clone from the S3 repo into a
local restic repo on the target path. See `TODO.org` for design notes.
### Post- Pi first boot
These items require the Pi to be built, flashed, and booted at least once.
- [ ] **`secrets/.sops.yaml` — add Pi age key**: After generating the age key
on the Pi (`age-keygen -o /var/lib/sops-nix/key.txt`), add the public key
to `.sops.yaml` alongside the existing PGP key, then run
`sops updatekeys secrets/secrets.yaml`.
- [ ] **`hosts/pi-main/hardware.nix` — verify SD card labels**: The file
assumes partition labels `NIXOS_SD` (root) and `FIRMWARE` (boot). Relabel
after flashing if they differ, or update the `fileSystems` entries.
- [ ] **Gitea LDAP auth**: After first start, configure LDAP authentication
in Gitea's admin panel (Admin → Authentication Sources → Add LDAP source).
The old Helm chart had this commented out; it must be done manually once.
Relevant settings:
- Host: `127.0.0.1`, Port: `389`, Security: Unencrypted
- Bind DN: `cn=readonly,dc=zakobar,dc=com`
- User search base: `ou=users,dc=zakobar,dc=com`
- [ ] **Nextcloud LDAP app**: After restoring the Nextcloud volume, verify
the LDAP Users and Contacts app is still configured correctly
(Admin → LDAP/AD Integration).