Personal AI ops — a multi-agent operating partner

In one paragraph

The hardest part of running infrastructure alone is not standing it up — it is the upkeep that never carries a deadline. Two on-premise hypervisors and a public-cloud VPS host roughly three dozen services for a household: internal apps behind reverse proxies, a self-hosted PBX, a redundant LAN-DNS layer, public-internet exposure for some of it. Keeping that working is one job; keeping it patched, backed up, and drift-free is a second one that competes with a day job and usually loses. This case is the operating pattern that closes the gap — a long-context agent as persistent operating partner, a homelab-ops repository holding every config and every rationale, a hardened bastion that owns the keys to everything else, and the unglamorous maintenance — weekly container upgrades, CVE triage, database major-version bumps — run on a schedule with backup and auto-rollback rather than as quiet-evening heroics. Sixty-one days in. Two hundred commits, each with a reason.

61 days

from first commit

200 commits

each with a rationale

~38 services

across 8 hosts

5 nights / wk

staggered auto-updates, with rollback

The attention cost this pattern eliminates

Personal infrastructure ages badly when it costs attention to maintain. A working sysadmin can hand-write any config from memory; what they cannot do is simultaneously hold the catalogue of things to patch and check this week whilst running their day job — so the catalogue gets dropped, images drift months behind, and the longer the gap the harder it is to re-mount. This pattern moves the catalogue out of the operator’s head and into a place a long-context agent re-reads every session, then puts the routine upkeep on a schedule that backs itself up and rolls itself back.

how the maintenance loop changes · 61 days, one operator

without the operating partner

container images drift months behind — upgrading by hand is a whole-evening job
a CVE scanner reports ~1,900 findings and not one of them gets actioned
a database major-version bump is a heart-in-mouth ritual, so it is deferred for years
secrets accumulate in committed compose files
recovery procedures get reinvented mid-incident

with the operating partner

stacks upgraded on a staggered weekly schedule, each run backed up and auto-rolled-back on failure
the CVE flood triaged by exposure and fixability — 1,886 raw findings down to the three images worth touching
a major bump runs the same checklist every time: dump → compat-check → apply → watch logs → verify → keep the dump
every secret moved out of git, baseline-diffed weekly
runbooks for the operations that already broke once

What makes this an operating pattern

Two architectural choices, held since the first week, turn an ordinary “agent window open whilst I work” into something safe enough to run unattended on a schedule. A single SSH key on the laptop reaches exactly one host, the bastion LXC, which owns the per-host keys to everything else, so a repository leak means a bastion-only foothold rather than fleet-wide compromise. Every host also has a narrow luna user with a checked-in sudoers whitelist, so the agent works inside a boundary that leaves the dangerous operations to the operator’s own account, and a daily diff makes any drift visible the next morning. That boundary is precisely what makes it acceptable for a weekly cron to restart a thirteen-container stack or rebuild a database at 02:00 without a human watching.
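
A minimal sketch of the two-hop shape, in Python, may make the key placement concrete. It only builds the command line and assumes nothing about the real estate: the bastion address, the target host name, and the example service are hypothetical, and the only property it illustrates is that the laptop authenticates to the bastion alone while the second hop runs from the bastion, where the per-host key lives.

```python
import shlex

# Hypothetical address: the case study abstracts real hostnames.
BASTION = "luna@bastion.internal"


def two_hop_argv(host, remote_cmd, user="luna"):
    """Return the argv the laptop runs to execute remote_cmd on host via the bastion."""
    # Second hop: executed on the bastion, which holds the per-host key.
    second_hop = ["ssh", f"{user}@{host}", shlex.join(remote_cmd)]
    # First hop: the laptop's single key only ever authenticates to the bastion.
    return ["ssh", BASTION, shlex.join(second_hop)]


if __name__ == "__main__":
    # "dns-1" and "example.service" are placeholder names.
    print(two_hop_argv("dns-1", ["systemctl", "is-active", "example.service"]))
```

Each hop is quoted separately with shlex.join, so the remote command survives both the bastion's shell and the target host's shell as one argument.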

Every operational config (sudoers, deploy keys, unattended-upgrades, container profiles, reverse-proxy routes, LAN-DNS records) has a reference file in the repo with a daily drift check beside it. Before proposing a change, the agent always reads CLAUDE.md, the relevant memory file, and the target host’s reference. That is where the catalogue of gotchas lives: “the LAN DNS records live in the primary config’s hosts array, not in the auxiliary list; getting that wrong silently overwrites them on the next reload” and “this guest’s sudo-rs does not support wildcard subcommand patterns”. The catalogue sits outside any single session, and the agent re-mounts it on every new one.
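
The drift check beside each reference file can be pictured as a plain text diff between what the repo says and what the host has. The sketch below is an assumption about the shape, not the estate's actual script: the file labels and the two sudoers lines are illustrative, and fetching the live file from the host is left out.

```python
import difflib


def drift(reference, live, name="sudoers"):
    """Return unified-diff lines between the checked-in file and the live one.

    An empty list means the host still matches the repo.
    """
    return list(
        difflib.unified_diff(
            reference.splitlines(),
            live.splitlines(),
            fromfile=f"repo/{name}",
            tofile=f"host/{name}",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Illustrative content only; real whitelists are per-host and checked in.
    reference = "luna ALL=(root) NOPASSWD: /usr/bin/systemctl restart example.service\n"
    live = reference + "luna ALL=(root) NOPASSWD: /usr/bin/podman exec *\n"
    report = drift(reference, live)
    print("\n".join(report) if report else "green")
```

A non-empty result is what would be sent to the notify channel; an empty one stays silent.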

where the keys live · three obvious options · only one was chosen

criterion	keys on the laptop (one-hop)	keys on bastion (two-hop, chosen)	keys per-host (agent inside)
blast radius if the repo leaks	the full fleet	bastion only	no repo on host
works from a fresh laptop	needs key re-issue	yes — one key to bastion	host-bound
audit of what the agent ran	scattered logs	central in bastion logs	scattered
destructive op gates	agent has sudo	luna whitelist + reference	varies per host
cron runs when the laptop is asleep	no	yes	yes

A session in the room

Three sessions from the past fortnight, run consecutively: building an unattended weekly upgrade for a thirteen-container backend stack, triaging a CVE scan that reports nearly two thousand findings, and taking a production database through a major-version bump. Each one shows the agent closing its own loop — investigate, decide, execute, verify, report — and every scenario ends on a success line.

A quick word before the work

Before any operation, the greeting. Two files load on every session: CLAUDE.md describes the agent’s identity and the standard operating procedures (“careful sysadmin, backs up before destructive ops, asks before anything it cannot reverse, speaks Ukrainian by default”) and the memory/* directory is the file-by-file catalogue of gotchas accumulated across earlier sessions. The greeting itself is unremarkable; what matters is that everything below starts with the agent having read both.

Three exchanges follow, all from the maintenance end of the work, the unglamorous half that competes with a day job and usually loses. The agent authors an unattended weekly upgrade for a coordinated thirteen-container stack; triages a CVE scan whose raw number would paralyse anyone; and walks a production database through a major-version bump without the site noticing.

The trust property worth naming up front is what makes the rest publishable: the agent reaches the homelab only through the bastion LXC. The per-host SSH keys live there and nowhere else, so if this terminal session were captured no host keys would leak with it — which is also why a cron is allowed to run these upgrades while everyone is asleep.

An unattended upgrade for a stack that moves as one

A coordinated multi-container stack is the single hardest thing to keep current by hand. You cannot cherry-pick image tags — the auth service, the database, the storage layer, the studio and the gateway ship as one upstream release, and bumping any one of them out of step breaks the set. So the upgrade is all-or-nothing, which is exactly the shape of job a tired operator defers for another fortnight.

The agent’s design earns its keep in the part most people skip. It backs up first — a logical dump plus a volume tarball — then drives the systemd units rather than the stack’s start script, because under a cron there is no interactive password and the naive path silently fails half-way and leaves two stacks down. It promotes the new version only after all thirteen containers report healthy and the studio endpoint answers 200; any failure trips the rescue branch back to the last-known-good commit. That envelope — back up, attempt, health-gate, auto-rollback — is the whole game.
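
The envelope described above (back up, attempt, health-gate, auto-rollback) can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than the checked-in playbook: the four steps are injected as plain callables, and the names Envelope, health_gate and run_upgrade are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Envelope:
    backup: Callable[[], None]    # e.g. logical dump plus volume tarball
    apply: Callable[[], None]     # e.g. pull the new release, restart the units
    healthy: Callable[[], bool]   # the health gate
    rollback: Callable[[], None]  # return to the last-known-good commit


def health_gate(container_states: Dict[str, str], studio_status: int) -> bool:
    """Pass only if every container is healthy and the studio endpoint answered 200."""
    all_healthy = bool(container_states) and all(
        state == "healthy" for state in container_states.values()
    )
    return all_healthy and studio_status == 200


def run_upgrade(env: Envelope) -> str:
    env.backup()  # a failed backup raises here, before anything is touched
    try:
        env.apply()
        if env.healthy():
            return "promoted"
    except Exception:
        pass  # a failure in apply or in the health check falls through to rollback
    env.rollback()
    return "rolled-back"
```

The design choice worth noting is that promotion is the only early return: every other path, including an exception part-way through, ends in the rollback step.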

The business meaning is that a thirteen-container stack now stays current without a human babysitting a 02:00 restart, and the one time an upgrade did fail it self-recovered in minutes instead of becoming a morning-after incident. The failure is not hypothetical: the first real run of an earlier version tripped exactly the cron-sudo trap above, which is why the shape shown here is health-gated and systemd-driven rather than trusting the start script to succeed.

And because the playbook is checked in, the next coordinated stack on the estate inherits the same envelope for the price of a new vars file. The pattern generalises: back-up-then-health-gate-then-promote is the same discipline a client would want around every unattended change, debugged here where the blast radius is one household.

Turning a CVE flood into three decisions

A vulnerability scanner’s raw output is worse than useless — it is actively misleading. Nearly two thousand findings across the estate’s container images reads as an emergency, and the rational human response to an emergency you cannot triage is to look away. A number that produces paralysis has produced less security than no number at all.

The agent’s job is to turn that flood into a short list of decisions, and it triages on two axes: is the image reachable, and can the CVE actually be fixed. Dropping everything with no published fix, then weighting what remains by exposure — public VPS images held to a far stricter bar than internal-only images behind the mesh and proxy auth — collapses 1,886 raw findings to three images genuinely worth touching tonight.
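
The two-axis triage can be illustrated with a small Python filter. Everything specific here is an assumption made for the example: the Finding fields, the severity ranking, and above all the per-exposure thresholds, which the case study describes only as a far stricter bar for public images.

```python
from dataclasses import dataclass

RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

# Hypothetical thresholds: public images are held to a stricter (lower) bar.
MIN_SEVERITY = {"public": "MEDIUM", "internal": "CRITICAL"}


@dataclass(frozen=True)
class Finding:
    image: str
    cve: str
    severity: str       # one of the RANK keys
    fixed_version: str  # empty string when no fix has been published
    exposure: str       # "public" or "internal"


def images_to_touch(findings):
    """Return the sorted image names that have at least one actionable finding."""
    actionable = set()
    for f in findings:
        if not f.fixed_version:
            continue  # axis 1: nothing to upgrade to, so nothing to do tonight
        if RANK[f.severity] >= RANK[MIN_SEVERITY[f.exposure]]:
            actionable.add(f.image)  # axis 2: severe enough for its exposure tier
    return sorted(actionable)
```

The output is a list of images rather than a list of CVEs, because the decision the operator makes is which image to rebuild or bump.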

The discipline that keeps next week’s scan honest is recording why the rest are left. The bulk is upstream-pinned base-image churn on coordinated stacks; patching it would break the release, so it is logged as accept-with-reason rather than silently ignored. Next Saturday’s report does not re-litigate it; it surfaces only what changed.
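
One way to picture a report that surfaces only what changed is as set arithmetic over (image, CVE) pairs. The sketch below is illustrative: the function name, the shape of the accepted-risk log, and the extra "stale acceptances" bucket are assumptions, not the estate's report format.

```python
def weekly_delta(current, previous, accepted):
    """Compare this week's findings with last week's.

    current, previous: sets of (image, cve) tuples.
    accepted: dict mapping (image, cve) to the recorded reason for leaving it.
    """
    accepted_keys = set(accepted)
    return {
        # Never seen before and not already accepted with a reason.
        "new": sorted(current - previous - accepted_keys),
        # Present last week, gone this week.
        "resolved": sorted(previous - current),
        # Accepted entries that no longer appear, so the log can be pruned.
        "stale_acceptances": sorted(accepted_keys - current),
    }
```

A finding that was accepted last week and is still present lands in none of the three buckets, which is what keeps it from being re-litigated.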

The business meaning is that the weekly report carries signal instead of noise. The operator patches three things, not twenty-eight, and can say in one sentence why the other twenty-five are safe to leave — which is the difference between a security posture and a wall of red anyone with sense would learn to scroll past.

A database major bump the site never noticed

A database major-version bump is the operation people defer for years, and for a sound reason: it carries the highest blast radius of any routine task, and the failure mode is a client’s site returning 500 to every visitor. “Carefully”, here, does not mean courage — it means a fixed checklist that runs the same way every time, so the variable is never the operator’s nerve.

The checklist is the whole value: snapshot the VPS to the backup server, take a logical dump, swap the image tag, recreate only the database container, watch the engine come up, verify the live application, and keep both the dump and the snapshot as a one-command rollback. The agent runs that sequence identically whether it is a patch bump or a major one — the discipline does not flex with the stakes.

The step a hurried human skips is watching the engine start. The agent tails the startup log and reads it before declaring anything — a clean “ready for connections” versus a crash-recovery loop or a failed-table list is the difference between a promote and a rollback, and it is visible in the logs a full minute before any visitor would see a slow page. Only once the live site returns 200 and a database-backed page renders does it promote.
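
The read-the-log-before-promoting step can be sketched as a small classifier plus a gate. The marker strings below are hypothetical stand-ins (only “ready for connections” comes from the text), the matching is deliberately conservative, and tailing the container log and probing the site are left to the caller.

```python
# Hypothetical markers; real engines word these differently.
READY_MARKERS = ("ready for connections",)
FAILURE_MARKERS = ("crash recovery", "table check failed")


def engine_verdict(log_lines):
    """Classify a startup log as 'ready', 'rollback' or 'wait'."""
    ready = False
    for line in log_lines:
        text = line.lower()
        if any(marker in text for marker in FAILURE_MARKERS):
            return "rollback"  # conservative: a single failure marker is enough
        if any(marker in text for marker in READY_MARKERS):
            ready = True
    return "ready" if ready else "wait"


def should_promote(log_lines, site_status, db_page_rendered):
    """Promote only when the engine is ready, the site returns 200 and a DB-backed page renders."""
    return (
        engine_verdict(log_lines) == "ready"
        and site_status == 200
        and bool(db_page_rendered)
    )
```

Treating any failure marker as a rollback signal trades an occasional unnecessary rollback for never promoting an engine that is struggling to start.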

The business meaning is that an upgrade a freelancer would quote a maintenance window for becomes a scheduled, observed, reversible operation the client’s site sailed straight through. The owner’s only evidence it happened is a line in the changelog and a dump retained for seven days, just in case.

The estate, the bastion, the maintenance machinery

Two on-premise hypervisor nodes plus a public-cloud VPS hosting four public sites, ~38 services across eight hosts, a self-hosted PBX with SIP trunking and an IP handset, a redundant LAN-DNS layer, an outbound tunnel for selective internal exposure, and an on-premise backup server snapshotting the hypervisor guests plus the public-cloud VPS.

the repository · the source of truth · 200 commits · 61 days

● 01

CLAUDE.md

agent identity + SOPs
● 02

runbooks

add/remove/upgrade service · rotate secrets · restore · DB bump
● 03

9 sudo whitelists

per-host · drift-pinged on the notify channel daily
● 04

baselines

SUID + cron + open ports · diffed daily
● 05

reference configs

reverse proxy · LAN DNS · unattended-upgrades · containers

the bastion, the agent's eyes and its boundary · a thin LXC · always-on · the operating surface

● 01

SSH hub

laptop → bastion → every host · keys nowhere else
● 02

14 security checks

sudoers · SSH · failed-auth · SUID · certs · daily
● 03

~25 health checks

disk · RAM · service · backup · mesh · quorum · daily
● 04

weekly CVE scan

Trivy · exposure-tiered · fixable-only · Saturday report
● 05

notify webhook

alert on issues · silent on green

the maintenance machinery · the upkeep, on a schedule

● 01

5-night auto-update

staggered across hosts · backup + rollback per run
● 02

weekly stack upgrade

13-container backend stack · pg_dumpall + health-gate + rollback
● 03

deploy / remove-service

Ansible · backup → proxy → DNS ×2 → verify, and its reverse
● 04

upgrade-container.yml

Ansible · pinned-tag bumps with a pre-backup snapshot
◐ 05

DB major bumps

dump → compat-check → apply → watch logs → verify · one at a time

bastion cron · drift-clean days · last 30 mornings

93% green

green 28 · flaky 1 · red 1

Twenty-eight clean mornings out of thirty across the estate. The one red day is a genuine sudoers-drift catch: a wildcard podman exec rule added during a late-night debugging session and never reconciled, flagged by the cron and revoked the next session. The one flaky day is a transient certificate-check timeout on the public VPS that cleared on the next pass. The cron’s job is not to be quiet; its job is to make the unquiet mornings impossible to miss.

Planned

○ 01

network gateway + dual-WAN

L3 stateful firewall + IDS/IPS appliance · replaces the manual WAN switch
○ 02

VLAN segmentation

single LAN segment today · isolate the Windows host + IoT
○ 03

self-hosted photos

library + face/CLIP search on a guest · blocked on storage migration
○ 04

remaining DB major bumps

Postgres 15/17 → 18 + the vector store · scheduled one at a time
○ 05

offsite backups

NAS Samba target first · cloud-object copy later

The verdict from the room

Today’s pattern is mostly reactive — Max names the job, the agent runs it end to end. But the jobs themselves have shifted from firefighting to upkeep: the most valuable sessions now are the boring ones — a weekly upgrade authored once and left to run, a CVE flood triaged down to three real items, a database major bump the client’s site never noticed. The arc the project is bending toward is proactive: the bastion cron already watches state continuously and raises the alerts; the next step is the agent opening the investigation itself, proposing the fix, and waiting only for a yes. The homelab is the low-blast-radius testbed where that loop is being closed before it generalises to client estates an order of magnitude larger.

61 days. One operator. 200 commits, each with a reason. ~38 services, 8 hosts. The upkeep that used to need a quiet evening now runs on a schedule — backed up, health-gated, reversible.

This case describes a personal infrastructure run as an ops engagement. Host roles and vendor categories are named functionally throughout; specific products, hostnames, and hardware are abstracted because the case sells the operating pattern, not the procurement choices.