In one paragraph
A product team needed a weekly industry-news digest in their chat channel — two audiences inside the same team (marketing for content, engineering for market context), one shared signal. Ad-hoc queries to consumer AI tools only happen when someone thinks to ask — useful occasionally, never a steady source of market context. This swarm replaces occasional with scheduled: a containerised agent on a single VM, every Monday morning, collects from search APIs and RSS, deduplicates three ways (cheapest filter first, most expensive last), ranks with an LLM, summarises the top items in two languages, and posts them as a structured message in the team’s chat channel. Three days of build; seven test runs to polish the output; one agent ready to deliver and the second already designed — adding it is a YAML file, not a code change.
The problem in business terms
The product team has two audiences that both need industry news, for different reasons. Marketing needs fresh facts to feed content. Engineering and product need market context — M&A moves, competitor releases, regulatory shifts, weather and climate signals that affect the business. Before this swarm, context arrived occasionally — when someone happened to need an answer that day and ran an ad-hoc query, or when a colleague happened to forward something useful that week. There was no steady cadence; the team moved with whoever was paying attention.
The swarm replaces that with a structured weekly delivery in the chat channel the team already lives in. Every Monday at 10:00 local time, a single message arrives with the top items, summarised in two languages, tagged by country, with a one-line “why this matters” per item. Nobody has to remember to run anything. Nobody has to forward links.
The non-obvious move is that the system is built multi-tenant from day one. A second team can be added with one YAML file — their own topic list, their own countries, their own chat channel, their own delivery cadence. No code change, no fork, no parallel deployment. The same runtime serves them.
Before:
- context arrives only when someone thinks to ask for it, never on a steady cadence
- coverage depends on whoever happened to forward something useful this week
- no record of what was surveyed, ranked, or chosen; every digest is a one-off
- adding a second team's digest means duplicating the whole effort
- no measurement of cost, no measurement of recall, no audit
After:
- the digest arrives in chat every Monday at 10:00, without anyone running anything
- coverage is driven by a per-tenant config: countries, topics, sources, owner
- every run leaves a structured record: discovery → dedup → rank → summary → deliver
- adding a second team is one YAML file plus one chat webhook, with zero code
- the ops channel reports failure, silent partial failure, and weekly cost per run
What makes this different from “a script that posts links to a channel”
A weekend project can scrape RSS and post links. What separates that from something the team can rely on for years is a small set of structural decisions that compound. Four of them carry most of the weight here.
Multi-tenant from day one — one codebase, N YAML configs.
The most valuable structural decision — and the reason adding the next team is cheap — is that the agent is a thin runtime parameterised by config. A per-tenant YAML file declares everything that varies between teams: which topics to include and exclude, which countries are P1 vs P2, which sources are always-include and which are blacklisted, the cron schedule, the chat webhook, the owner, the model parameters, even the in-context ranking examples.
A new tenant is created by adding one file to configs/ and one service to docker-compose.yml. The agent’s row keys in the database are scoped by agent_id, so two tenants share the same Postgres without ever seeing each other’s data. The embedding model is loaded once on the host and shared by every tenant: N agents, one ML cost.
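As a rough illustration of "a thin runtime parameterised by config", the sketch below turns an already-parsed YAML mapping into a typed per-tenant object and builds a query scoped by agent_id. The field names, the `articles` table, and the `status` column are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantConfig:
    """Everything that varies between teams (field names are illustrative)."""
    agent_id: str                        # scopes every database row for this tenant
    cron: str                            # e.g. "0 10 * * 1" for Mondays at 10:00
    webhook_env: str                     # name of the env var holding the chat webhook
    countries_p1: tuple[str, ...] = ()
    countries_p2: tuple[str, ...] = ()
    topics_include: tuple[str, ...] = ()
    topics_exclude: tuple[str, ...] = ()


def load_tenant(raw: dict) -> TenantConfig:
    """Build a TenantConfig from a mapping such as the output of a YAML parser."""
    missing = {"agent_id", "cron", "webhook_env"} - raw.keys()
    if missing:
        raise ValueError(f"tenant config missing keys: {sorted(missing)}")
    countries = raw.get("countries", {})
    topics = raw.get("topics", {})
    return TenantConfig(
        agent_id=raw["agent_id"],
        cron=raw["cron"],
        webhook_env=raw["webhook_env"],
        countries_p1=tuple(countries.get("p1", [])),
        countries_p2=tuple(countries.get("p2", [])),
        topics_include=tuple(topics.get("include", [])),
        topics_exclude=tuple(topics.get("exclude", [])),
    )


def scoped_query(cfg: TenantConfig, status: str) -> tuple[str, tuple[str, str]]:
    """Every read is filtered by agent_id, so tenants never see each other's rows."""
    sql = "SELECT id, url FROM articles WHERE agent_id = ? AND status = ?"
    return sql, (cfg.agent_id, status)
```

Keeping the webhook as an environment-variable name rather than a URL is one way to keep secrets out of the YAML file; whether the real configs do this is not stated in the case.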
| criterion | per-team fork (typical) | one codebase + N YAML (this swarm, chosen) |
|---|---|---|
| adding a new team | fork the repo · diverge over time | new YAML + new service |
| shared bug fix | cherry-pick across N forks | one commit, all tenants benefit |
| shared ML cost | N copies of the embedder | one embedder · shared volume |
| schedules per team | — | independent cron per YAML |
| data isolation | separate database per fork | agent_id FK · same Postgres |
| config drift over time | high · merges get harder | structural · per-tenant by design |
For the business: the cost of serving the second team is the cost of writing its YAML file. No engineer-week per new team, no parallel deployment to maintain, no slow drift between forks.
Three-level dedup — cheap filter first, expensive filter last.
The same news story shows up in four publications, three of them rewriting each other. Ranking every duplicate through the LLM is wasted money; ignoring duplicates dilutes the digest. The swarm runs a three-level dedup pipeline before any LLM call:
- 01 · L1 (URL hash): exact-URL collision · milliseconds
- 02 · L2 (content hash): SHA-256 of normalised text · cheap text match
- 03 · L3 (vector cosine): multilingual e5-base · pgvector · threshold 0.90
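The three levels can be sketched in plain Python. This is a minimal illustration of the ordering (cheapest check first), not the swarm's implementation: the real L3 uses multilingual e5-base embeddings queried through pgvector, whereas here the embeddings are precomputed toy vectors compared in a loop, and the article dict shape is an assumption.

```python
import hashlib
import math
import re


def url_key(url: str) -> str:
    """L1: hash of the exact URL (only surrounding whitespace is stripped)."""
    return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()


def content_key(text: str) -> str:
    """L2: SHA-256 of case- and whitespace-normalised text."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup(articles: list[dict], threshold: float = 0.90) -> list[dict]:
    """Keep the first article of each duplicate group, checking L1, then L2, then L3."""
    seen_urls: set[str] = set()
    seen_content: set[str] = set()
    kept: list[dict] = []
    for art in articles:
        u = url_key(art["url"])
        if u in seen_urls:                      # L1: same URL
            continue
        seen_urls.add(u)
        c = content_key(art["text"])
        if c in seen_content:                   # L2: same normalised text
            continue
        seen_content.add(c)
        if any(cosine(art["embedding"], k["embedding"]) >= threshold for k in kept):
            continue                            # L3: near-duplicate by meaning
        kept.append(art)
    return kept
```

The vector comparison only runs for articles that survive both hash checks, which is the point of the ordering.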
The threshold (0.90) wasn’t picked by feel. It came out of a sweep on a 288-article fresh-state set — a one-off tuner that prints the unique-count at each threshold value and lets the operator pick the inflection point. That tuner is checked into the repo as a CLI command for the next operator who tunes it.
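A threshold sweep of this kind might look like the sketch below. It assumes a greedy "keep the first, drop anything too similar to a kept item" rule and uses toy two-dimensional vectors; the actual tuner's CLI name, options, and counting rule are not described in the case, so everything here is illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unique_count(embeddings: list[list[float]], threshold: float) -> int:
    """Count items that are not within `threshold` similarity of an earlier kept item."""
    kept: list[list[float]] = []
    for emb in embeddings:
        if all(cosine(emb, k) < threshold for k in kept):
            kept.append(emb)
    return len(kept)


def sweep(embeddings: list[list[float]], thresholds: list[float]) -> list[tuple[float, int]]:
    """Return (threshold, unique-count) pairs for the operator to inspect."""
    return [(t, unique_count(embeddings, t)) for t in thresholds]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.98, 0.2], [0.8, 0.6], [0.0, 1.0]]
    for t, n in sweep(toy, [0.50, 0.85, 0.90, 0.95, 0.99]):
        print(f"threshold={t:.2f} unique={n}")
```

The operator reads the printed table and looks for the point where raising the threshold starts letting obvious rewrites back in.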
For the business: the LLM ranking budget is spent on signal, not on near-duplicates. A back-of-envelope reading of the funnel — roughly 287 candidates per run reduced to about 10 delivered items — is what makes the weekly cost report come out small enough to be unremarkable.
Idempotent stages — every run is restartable.
The pipeline has six stages: discovery, dedup, ranking, summarisation, delivery, ops post. Each stage reads articles from the database in a specific status, processes them, and writes them back in a new status. A run can be resumed at any stage with skip flags (--skip-discovery --skip-dedup skips the work already done), and no state is carried in memory between stages.
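The status-driven pattern can be sketched with an in-memory SQLite database standing in for Postgres. The status names, the `articles` table layout, and the `skip` argument are assumptions for illustration; the real per-stage work (ranking, summarising, posting) is replaced by a bare status update.

```python
import sqlite3

# Hypothetical status chain: (stage name, status it reads, status it writes).
STAGES = [
    ("dedup", "discovered", "unique"),
    ("ranking", "unique", "ranked"),
    ("summarise", "ranked", "summarised"),
    ("delivery", "summarised", "delivered"),
]


def run_stage(db: sqlite3.Connection, agent_id: str, src: str, dst: str) -> int:
    """Move this tenant's rows from status `src` to `dst`; return rows processed."""
    rows = db.execute(
        "SELECT id FROM articles WHERE agent_id = ? AND status = ?",
        (agent_id, src),
    ).fetchall()
    for (article_id,) in rows:
        # The real stage work would happen here, before the status change.
        db.execute("UPDATE articles SET status = ? WHERE id = ?", (dst, article_id))
    db.commit()
    return len(rows)


def run_pipeline(db: sqlite3.Connection, agent_id: str,
                 skip: frozenset = frozenset()) -> dict[str, int]:
    """Run every stage not listed in `skip`, in order, and report per-stage counts."""
    return {
        name: run_stage(db, agent_id, src, dst)
        for name, src, dst in STAGES
        if name not in skip
    }
```

Because each stage selects by status, rerunning a completed stage finds nothing to do, which is what makes a resume safe.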
The scheduler itself spawns each run as a subprocess, not in-process, so a single bad run can’t take down the scheduler. The container healthcheck stays green; next Monday’s run starts on schedule.
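A minimal sketch of the subprocess isolation, using only the standard library. The function name, the one-hour timeout, and the 124 timeout code are arbitrary choices for this example; the command that the real scheduler launches is not given in the case.

```python
import subprocess
import sys


def run_once(cmd: list[str], timeout_s: float = 3600.0) -> tuple[int, str]:
    """Run one pipeline invocation in a child process.

    Returns (exit_code, tail_of_stderr). A crash or timeout in the child is
    reported to the caller instead of propagating into the scheduler process.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return 124, "run timed out"   # 124 is an arbitrary marker chosen here
    return proc.returncode, proc.stderr[-2000:]


if __name__ == "__main__":
    # Stand-in for the real pipeline command: a child that raises on purpose.
    code, err = run_once([sys.executable, "-c", "raise RuntimeError('boom')"])
    print("exit code:", code)
    print("scheduler still alive; stderr tail captured:", bool(err))
```

A non-zero exit code is exactly the signal the ops channel's failure message is described as keying on.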
For the business: a transient upstream blip never costs a week. The operator can rerun from the failing stage with one command and the digest still ships before the team’s Monday standup.
The ops channel — three message types, none of them ever a surprise.
Production AI systems fail in two ways: loud failure (the subprocess crashes; everyone notices) and silent failure (the run completes but the digest is half-empty, or noticeably off-topic). Both need to surface before the audience sees the broken delivery.
The swarm posts to a separate ops chat channel — never the digest channel — with three message types:
| criterion | failure | silent partial | weekly cost |
|---|---|---|---|
| what it means | subprocess exited non-zero | run ok · a funnel stage came in below its expected ratio | scheduled report after each run |
| trigger | exit code ≠ 0 | ranked < 0.9 × unique OR summarised < 0.8 × top-N | every run |
| what's in the message | traceback + run id | funnel numbers + diagnosis hint | funnel + token spend + estimated USD |
| audience reaction | fix before next Monday | investigate · maybe rerun · maybe tune | read · file |
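The silent-partial trigger in the table can be written as a small pure function. The pipeline list names a diagnose() step, but its signature is not shown in the case; the `Funnel` dataclass, the field names, and the hint wording below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Funnel:
    """Per-run counts at each stage (field names are illustrative)."""
    discovered: int
    unique: int
    ranked: int
    summarised: int
    top_n: int


def diagnose(f: Funnel) -> Optional[str]:
    """Return a hint for the ops channel, or None when the run looks healthy."""
    if f.ranked < 0.9 * f.unique:
        return f"ranking lost items: ranked={f.ranked}, unique={f.unique}"
    if f.summarised < 0.8 * f.top_n:
        return f"summaries missing: summarised={f.summarised}, top_n={f.top_n}"
    return None
```

Keeping the check free of I/O means it can be unit-tested against recorded funnel numbers without posting to any channel.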
For the business: the owner of the digest can sleep through Sunday because the agent will tell them if something is off. The weekly cost message turns “what does this AI thing actually cost us?” from a quarterly audit question into a routine line in chat.
The pipeline, end to end
- 01 · discovery: search API + Google News RSS · URL-filter
- 02 · dedup: URL hash → content hash → vector cosine (0.90)
- 03 · ranking: Gemini Flash-Lite · batch=30 · JSON-mode · 5 scoring rules
- 04 · summarise: trafilatura → markdown → snippet · top-N · EN + UK native
- 05 · delivery: Block Kit · per-country sections · chat webhook
- 06 · ops post: diagnose() → alert · estimate_cost() → weekly report
Two pipeline choices are worth a second look:
- Full-text extraction happens after ranking, not before. Article extraction is the slow, brittle part — different sites need different fallbacks (Trafilatura, a JS-rendering fallback, and a paywall-snippet fallback). Running it on every discovered URL would multiply HTTP traffic by ~30×. Running it only on the top-N items (max 10) keeps the bandwidth bill flat.
- Country classification happens at the summary stage, with full text in hand — not at discovery. Discovery only sees the search-query context, which is famously unreliable about which country the article is actually about (a query about Canadian markets routinely returns Argentine news). The LLM sees the full text and reclassifies; the previous label is ignored.
Both choices show the same instinct: do the cheap thing on everything, do the expensive thing only on the things that survive.
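The "expensive step only on survivors" ordering can be shown in a few lines. The `score` and `url` keys and the injected `extract` callable are assumptions for this sketch; in the swarm the extractor would be the trafilatura-based chain with its fallbacks.

```python
from typing import Callable


def extract_top(scored: list[dict], extract: Callable[[str], str],
                top_n: int = 10) -> list[dict]:
    """Fetch full text only for the top_n highest-scored articles."""
    top = sorted(scored, key=lambda a: a["score"], reverse=True)[:top_n]
    # One extract() call per surviving article, none for the rest.
    return [{**a, "full_text": extract(a["url"])} for a in top]
```

Passing the extractor in as a parameter also makes the call count easy to assert in a test with a stub.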
Observability — three signals, all routed through tools the team already runs
The swarm publishes everything through surfaces the company already operates:
- /healthz — a 200 if the scheduler is alive. The container’s healthcheck reads this.
- /metrics — a JSON object: last run timestamp, last run status, last delivered count, 7-day run total, 7-day failure count. Computed on demand from the database — so the value survives a restart, and any tool that can parse JSON can read it.
- the ops chat channel — the three message types above, posted by the agent itself.
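Computing /metrics on demand might look like the sketch below, again with SQLite standing in for Postgres. The `runs` table, its columns, the `'failed'` status value, and the JSON key names are assumptions; timestamps are assumed to be stored as UTC ISO-8601 strings so that string comparison matches time order.

```python
import json
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def metrics(db: sqlite3.Connection, now: Optional[datetime] = None) -> dict:
    """Derive the metrics payload from the runs table; nothing is held in memory."""
    now = now or datetime.now(timezone.utc)
    week_ago = (now - timedelta(days=7)).isoformat()
    last = db.execute(
        "SELECT finished_at, status, delivered FROM runs "
        "ORDER BY finished_at DESC LIMIT 1"
    ).fetchone()
    total, failed = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(status = 'failed'), 0) "
        "FROM runs WHERE finished_at >= ?",
        (week_ago,),
    ).fetchone()
    return {
        "last_run_at": last[0] if last else None,
        "last_run_status": last[1] if last else None,
        "last_delivered": last[2] if last else None,
        "runs_7d": total,
        "failures_7d": failed,
    }


def metrics_json(db: sqlite3.Connection) -> str:
    """Body an HTTP handler could return for GET /metrics."""
    return json.dumps(metrics(db))
```

Since every value is a query result, a restarted container reports the same numbers as the one it replaced.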
Structured logs go to stdout in JSON; the company’s existing log aggregator picks them up without a line of code on our side. No new metrics stack to operate, no new dashboard to build. When the production load grows — many tenants, many runs per day — the JSON endpoint is already in the shape a metrics scraper expects.
What’s shipped
Built and ready to deliver: one agent (the one prepared to run the regular Monday digest), one codebase, one container image (Python 3.11-slim with a CPU-only inference wheel, 512 MB), one Postgres instance shared across all future agents, one scheduler running on a single VM, two chat webhooks (digest + ops).
Tested: 159 tests, green on every push to main. The suite mixes unit tests against a fake LLM client, integration tests against a real Postgres via a session-rollback fixture, and recorded-cassette tests against the search APIs.
Designed but not yet running: the second agent, with its own YAML config and chat channel. The work to add it is mechanical — the runtime is already multi-tenant; only the per-tenant decisions remain.
What this says about the builder
The shape of this case isn’t “someone wired an LLM to a news feed.” It’s “someone built a runtime in which adding the next news feed is a configuration change.” That’s a different instinct — closer to platform engineering than to script-writing.
Each architectural decision shows up again in the next one. The agent is parameterised by config; the database keys are scoped by agent; the embedder is shared across tenants; the ops channel is separate from the digest channel; the cost report is a stage in the pipeline, not an afterthought. By the time you’ve read the YAML file, you’ve read the whole system.
The discipline shows in the small things too: a threshold tuner shipped as a CLI rather than a notebook, a /metrics endpoint that computes from the database instead of holding a counter in memory, country classification moved to the only stage where the data exists to do it correctly, full-text extraction held back until ranking has already proved the article worth the HTTP call. None of those choices showed up on day one; they were the output of seven test runs and one operator paying attention.
Three days. One builder. One agent ready to deliver, one designed. A runtime that costs the same whether it serves one team or ten.
This case describes architecture and patterns. Specific vendors, the client, the agent’s topic domain, and the receiving team are deliberately left abstract; what matters here is the shape of the multi-tenant runtime and the production discipline around it. The codename, the client, and any URL that would identify either are out of scope by policy.