Sales-intel agent for a B2B sales team

A chat-triggered agent that turns a 30–60-minute hand-built contact list into a one-line request and a 1–3-minute wait. Seven months of iteration; ~1,000 records per week delivered to the sales team.

build time
7 months iteration · production since May 2026

In one paragraph

A small sales team needed hand-built contact lists for outbound — scoped by company, geography, seniority, and role. The old loop took 30–60 minutes per request and only one person on the team was fluent enough across the source tools to do it well. This agent compresses that into a one-line message in the team’s existing chat channel and a 1–3 minute wait. The salesperson asks in plain English; the agent picks the right data sources, runs the lookups in parallel, deduplicates against the CRM, and replies with a ready-to-use spreadsheet. About a thousand contact records a week, delivered to a sales team of 3–5. Seven months of iteration through three generations to get the shape right.

7 months
in iteration · v1 → v3
~1,000 records / week
delivered to sales
96 %
stability · last full week
3 operators
investigate in parallel

The problem in business terms

The sales team used to build outbound lists by hand. Each list needs a specific filter: a city, an industry, a seniority band, a role, a function. The pre-agent workflow was: open the contact database, build a search, export to CSV, dedupe against the CRM by hand, hop to professional-network search for missing emails, paste it all into a spreadsheet, share the link, repeat. That was a 30–60 minute round-trip per request across multiple browser tabs, and the only person who could do it well was the one who knew which tool mapped to which question.

The agent compresses the loop into a one-line chat message. The salesperson asks something like “find me innovation-team decision-makers at target account X, with LinkedIn URL and email.” The agent acknowledges in the same thread, picks the data sources, runs the lookups in parallel, deduplicates, writes the results into a freshly spawned spreadsheet, and posts the link back. End to end: a minute or three.

What gets removed from the daily critical path is the procurement skill — nobody on the sales team needs to remember which database covers which geography or which surface has up-to-date emails. The agent knows.

how a contact-list request runs before vs after the agent
without the agent
  • salesperson opens the contact database, builds a search, exports a CSV
  • deduplicates the CSV against the CRM by hand
  • hops to professional-network search for missing emails
  • 30–60 minutes per request; only one teammate is fluent across the tools
  • every list is a one-off — no record of what was asked or delivered
with the agent
  • salesperson sends a one-line request in their existing chat channel
  • three specialist operators investigate in parallel across the sources
  • deduplication happens automatically against the CRM and prior lists
  • 1–3 minutes end to end; anyone in the channel can do it
  • every run leaves a spreadsheet artefact and a structured execution record
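
The automatic deduplication step could look roughly like the sketch below. It is an illustration under assumptions, not the agent's actual logic: the field names (`email`, `profile_url`), the choice of a normalised email as the primary key, and the helper names `contact_key` and `drop_known` are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set

Contact = Dict[str, str]


def contact_key(contact: Contact) -> Optional[str]:
    """Build a comparison key: normalised email first, profile URL second."""
    email = contact.get("email", "").strip().lower()
    if email:
        return "email:" + email
    url = contact.get("profile_url", "").strip().lower().rstrip("/")
    if url:
        return "url:" + url
    return None  # nothing to compare on


def drop_known(candidates: Iterable[Contact], known_keys: Set[str]) -> List[Contact]:
    """Keep only contacts absent from the CRM / prior lists, and not repeated."""
    seen = set(known_keys)
    fresh: List[Contact] = []
    for contact in candidates:
        key = contact_key(contact)
        if key is not None:
            if key in seen:
                continue  # already owned, or a duplicate within this batch
            seen.add(key)
        fresh.append(contact)  # contacts without any key are kept as-is
    return fresh
```

Here `known_keys` would be built once per run from the CRM export and previously delivered lists, using the same `contact_key` function so both sides are normalised identically.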

What makes this different from “a chatbot wired to a few APIs”

Most agent demos collapse under their own tool surface: hand the LLM twelve flat tools, watch it pick the wrong one under load, watch it loop, watch it fail. This agent is built on four decisions that turn “the model has access to our contact sources” into “the team can rely on it without watching.”

Three operators, each with its own context. The main agent only routes.

The most valuable structural decision — and the reason the agent doesn’t collapse — is that the data sources aren’t exposed as a flat list to the main model. They’re grouped into three specialist operators:

  • a request operator — owns the contact database, the CRM, and web research
  • a public-profile operator — owns the long-running professional-network scraper
  • a spreadsheet operator — owns spreadsheet creation and row append

Each operator has its own model context and its own narrow tool surface. The main agent only chooses which operator to call. The planning prompt stays short, the model never accidentally writes to a spreadsheet in the middle of a research turn, and adding a new data source means writing a new operator rather than expanding a flat list everyone has to re-read.

flat tools vs operator-per-domain: why three operators beat twelve tools

criterion                  | flat tool list (typical) | operator-per-domain (this agent, chosen)
main agent's tool count    | ~12 flat tools           | 3 operators
context per tool decision  | full conversation        | operator's narrow scope
wrong-tool risk under load | high                     | structural · near-zero
adding a new source        | edit main prompt         | new operator, isolated
parallel investigation     | sequential only          | natural per-operator

For the business: predictable latency, fewer wrong-tool calls, and a clean place to plug the next data source in.
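
As a rough illustration of the operator-per-domain idea, the sketch below shows a main planner that is only ever handed three operator descriptions, while each operator keeps a private tool table. The operator grouping follows the list above; the tool names, the `Operator` class, and the stub functions are hypothetical stand-ins, not the production workflow.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Tool = Callable[[str], str]


@dataclass(frozen=True)
class Operator:
    description: str        # the only thing the main agent sees
    tools: Dict[str, Tool]  # visible to this operator's own context only


def _stub(label: str) -> Tool:
    """Placeholder tool; a real one would call an external service."""
    return lambda payload: f"{label}({payload})"


OPERATORS: Dict[str, Operator] = {
    "request": Operator(
        "contact database, CRM, and web research",
        {"contact_db_query": _stub("contact_db_query"),
         "crm_cross_check": _stub("crm_cross_check"),
         "web_research": _stub("web_research")},
    ),
    "public_profile": Operator(
        "long-running professional-network scrape",
        {"scrape_and_wait": _stub("scrape_and_wait")},
    ),
    "spreadsheet": Operator(
        "spreadsheet creation and row append",
        {"spawn_sheet": _stub("spawn_sheet"),
         "append_rows": _stub("append_rows")},
    ),
}


def planner_menu() -> Dict[str, str]:
    """What the main agent chooses from: operator names and descriptions only."""
    return {name: op.description for name, op in OPERATORS.items()}


def run_tool(operator: str, tool: str, payload: str) -> str:
    """Run a tool inside one operator; tools from other operators are unreachable."""
    tools = OPERATORS[operator].tools
    if tool not in tools:
        raise PermissionError(f"{operator!r} operator has no tool {tool!r}")
    return tools[tool](payload)
```

In this shape, a research turn cannot write to a spreadsheet by accident: `run_tool("request", "append_rows", ...)` raises instead of executing, because the tool is not in that operator's table.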

Every model context has a fallback.

Each operator, and the main agent, is configured with two models: a sharper but flakier primary and a steadier fallback. That makes eight model nodes wired into four contexts. The primary is faster and better when it works; the fallback absorbs the provider-side hiccups that bleeding-edge models invariably ship with.

This is a deliberate resilience-over-capability trade. A slightly less clever second-tier answer is better than a hard failure surfaced to a salesperson in chat. The 4% error rate in the most recent full week is what’s left over after this pattern has absorbed every provider outage that hit production.
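
A minimal sketch of the primary-then-fallback pattern, assuming each model is wrapped as a plain callable. `ProviderError`, `call_with_fallback`, and the two demo models are hypothetical names for illustration; the workflow runtime presumably wires this up as configuration rather than code.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Stand-in for a provider-side failure (timeout, 5xx, rate limit)."""


def call_with_fallback(prompt: str, models: Sequence[Callable[[str], str]]) -> str:
    """Try each model in order; return the first answer that succeeds."""
    last_error: Optional[ProviderError] = None
    for model in models:
        try:
            return model(prompt)
        except ProviderError as exc:
            last_error = exc  # remember the failure and move to the next model
    raise RuntimeError("all configured models failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise ProviderError("upstream 503")  # simulated provider hiccup


def steady_fallback(prompt: str) -> str:
    return f"fallback answer to: {prompt}"
```

Calling `call_with_fallback("find contacts", [flaky_primary, steady_fallback])` returns the fallback's answer, so the salesperson sees a reply rather than a hard failure.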

Long-running scrapes are pushed into a sub-workflow.

A public-profile scrape can take anywhere from 30 seconds to several minutes — the upstream service does the work asynchronously. Rather than block the main agent for that time, a dedicated scrape-and-wait sub-workflow owns the polling loop entirely: submit task → wait → check status → loop up to five minutes → return either a parsed result, a failure payload, or a still-running task ID. The main agent calls it like any other tool.

For the salesperson: a consistent “wait, then spreadsheet” experience whether the underlying source is fast or slow. The 5-minute cap means a stuck scrape never silently consumes the session — it surfaces as a clean failure the operator can retry.
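
The polling loop described above can be sketched as follows. The 30-second interval and 5-minute cap come from the text; the `submit` and `check` callables, the status dictionary shape, and the `ScrapeOutcome` type are assumptions made for illustration, since the upstream service's API is not specified here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScrapeOutcome:
    state: str                   # "done", "failed", or "still_running"
    task_id: str
    rows: Optional[list] = None
    error: Optional[str] = None


def scrape_and_wait(
    submit: Callable[[str], str],
    check: Callable[[str], dict],
    query: str,
    *,
    interval_s: float = 30.0,
    cap_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> ScrapeOutcome:
    """Submit a task, then wait and check status until done, failed, or capped."""
    task_id = submit(query)
    deadline = clock() + cap_s
    while clock() < deadline:
        sleep(interval_s)
        status = check(task_id)  # assumed shape: {"state": ..., "rows": ..., "error": ...}
        if status["state"] == "done":
            return ScrapeOutcome("done", task_id, rows=status.get("rows", []))
        if status["state"] == "failed":
            return ScrapeOutcome("failed", task_id, error=status.get("error"))
    # Cap reached: hand back the task ID so the caller can retry or check later.
    return ScrapeOutcome("still_running", task_id)
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch only: it lets the loop be exercised with a fake clock instead of waiting five real minutes.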

The interface is the chat channel they already use.

There is no dashboard, no portal, no separate auth. Permissions are inherited from chat-channel membership. Onboarding a new salesperson is /invite to the channel — and that is the training: type what you want, read the spreadsheet that comes back.

The non-obvious consequence is observability: the execution history shows whether each run came from the team or from a test. Of 140 lifetime executions, 138 were triggered by humans on the team; only 2 were manual test runs by the maintainer. The agent isn’t a demo that someone keeps alive by poking it; it’s load-bearing for the sales week.

The three operators

request operator: contact database + CRM + web research
  1. contact-db query: by company · geography · seniority · role
  2. CRM cross-check: drop contacts the team already owns
  3. web research: fill missing emails · verify company info
public-profile operator: long-poll scraper · scrape-and-wait sub-workflow
  1. submit scrape: queue task in external service
  2. poll status: 30s interval · 5-minute cap
  3. parse + clean: BOM-strip · column normalisation
spreadsheet operator: output surface
  1. spawn sheet: freshly created · channel permissions inherited
  2. append rows: name · title · profile URL · email · location · company
  3. reply: shareable link posted in original chat thread
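
The public-profile operator's "parse + clean" step might look something like the sketch below, assuming the scraper returns UTF-8 CSV bytes that may start with a byte-order mark. The header aliases in `COLUMN_ALIASES` and the function name `parse_scraper_csv` are hypothetical; the target column names mirror the spreadsheet columns listed above.

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping from scraper headers to the sheet's column names.
COLUMN_ALIASES = {
    "full name": "name",
    "name": "name",
    "job title": "title",
    "title": "title",
    "profile url": "profile_url",
    "email": "email",
    "location": "location",
    "company": "company",
}


def parse_scraper_csv(raw: bytes) -> List[Dict[str, str]]:
    """Strip a leading BOM, then map known headers onto normalised column names."""
    text = raw.decode("utf-8-sig")  # "utf-8-sig" drops the BOM if one is present
    reader = csv.DictReader(io.StringIO(text))
    rows: List[Dict[str, str]] = []
    for record in reader:
        clean: Dict[str, str] = {}
        for header, value in record.items():
            if header is None:
                continue  # surplus cells beyond the header row
            key = COLUMN_ALIASES.get(header.strip().lower())
            if key:
                clean[key] = (value or "").strip()
        rows.append(clean)
    return rows
```

Without the BOM strip, the first header would arrive as `"\ufeffFull Name"` and silently fail the alias lookup, which is the kind of corner this step exists to cover.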

The stability story

The architecture pays for itself in the production numbers. Two consecutive weeks:

week of    | runs | success | error | success rate
2026-05-11 | 83   | 72      | 11    | 87%
2026-05-18 | 53   | 51      | 2     | 96%

A 3.5× drop in error rate week-over-week. It didn’t come from the upstream APIs getting better. It came from rolling the dual-model fallback pattern out across all four model contexts — the failure class that dominated the earlier week (provider hiccups on the bleeding-edge primary model) got absorbed at the architecture level instead of surfacing to the salesperson.
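
The arithmetic behind the 3.5× figure can be checked directly from the run and error counts in the table above (83 runs with 11 errors, then 53 runs with 2 errors). The snippet is only a worked calculation; the variable names are illustrative.

```python
weeks = {
    "2026-05-11": {"runs": 83, "errors": 11},
    "2026-05-18": {"runs": 53, "errors": 2},
}

# Error rate per week = errors / runs.
error_rate = {week: w["errors"] / w["runs"] for week, w in weeks.items()}

# Week-over-week drop = earlier error rate divided by later error rate.
drop = error_rate["2026-05-11"] / error_rate["2026-05-18"]

for week, rate in error_rate.items():
    print(f"{week}: {rate:.1%} errors")
print(f"drop: {drop:.1f}x")
```

That is roughly 13.3% falling to 3.8%, which matches the 87% and 96% success rates after rounding.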

The remaining errors cluster narrowly: most are bad-request edge cases on specific contact-database search parameters. That’s a knob to turn, not a structural problem.

Audit, by default

Because the agent runs on a workflow runtime, every execution leaves a record in that runtime’s execution log: which operator ran, with which parameters, with what output, with how long it took, with success or failure. There’s nothing to instrument — the runtime captures it for free.

That’s how the weekly stability numbers above were computed: by querying the runtime’s execution history for the 140 lifetime runs of the production workflow, scoring each as success or error, and computing the rate for each week.
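
A sketch of that computation, assuming the execution history has been exported as records with a start timestamp and a status. The record shape (`started_at`, `status`) and the function names are hypothetical; the runtime's real export format is not described here. Weeks are keyed by their Monday, which matches the "week of" dates in the table (2026-05-11 and 2026-05-18 are both Mondays).

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from typing import Dict, Iterable


def week_start(ts: datetime) -> date:
    """Return the Monday of the week containing ts."""
    day = ts.date()
    return day - timedelta(days=day.weekday())


def weekly_success_rates(executions: Iterable[dict]) -> Dict[date, dict]:
    """Group executions by week and count runs, successes, and errors."""
    buckets = defaultdict(lambda: {"runs": 0, "success": 0, "error": 0})
    for ex in executions:
        bucket = buckets[week_start(ex["started_at"])]
        bucket["runs"] += 1
        # Anything that is not an explicit success is scored as an error.
        bucket["success" if ex["status"] == "success" else "error"] += 1
    return {
        week: {**counts, "success_rate": counts["success"] / counts["runs"]}
        for week, counts in sorted(buckets.items())
    }
```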

For the business: the same record that exists for debugging doubles as the record for audit. Sales, finance, and security can all answer “what did the agent do last Tuesday for the LATAM list?” from one place, without anyone having to remember to log it.

What’s shipped

In production: the main workflow (30 nodes), a data-appender helper, a long-poll scrape-and-wait sub-workflow (17 nodes, added 2026-05-21), and a paired regression-test workflow.

Kept as references, deactivated: v1 (“Archive”) and v2 (“playground”), left in the runtime as deactivated workflows rather than deleted. The diff between generations stays legible months later — when someone (including future-me) asks “why did we move away from X?” the answer lives in the workflow next door, not in someone’s memory.

Separate dev fork: a wizleads-dev twin used for safer experimentation with new integrations before they touch the production graph.

About fifty nodes across the production graph in total.

What this says about the builder

The interesting part isn’t that the agent exists — agentic workflows on this kind of runtime are common enough that templates exist for them. The interesting part is the shape: orchestration as nested agents with disjoint vocabularies, not as a flat tool list. A main planner that only knows operators. Operators that only know their own domain. The same instinct an experienced service-mesh architect applies — narrow contracts, push specifics down, keep the top of the stack legible.

The stability work shows the same discipline. Rather than chase 100% by adding retries everywhere, the builder picked one failure class (provider flakiness on bleeding-edge models) and absorbed it architecturally with a fallback at every model context. The error rate fell because the surface area where one provider hiccup could derail a run got smaller — not because individual integrations got better.

And the practical tells: BOM-stripping CSVs from the scraper, polling loops kept inside their sub-workflow, kept-for-reference legacy versions, a paired test workflow, a separate dev fork. The artefacts of someone who has shipped enough automation to know which corners eat you later.

Seven months. One builder. ~50 nodes across the production graph. Roughly a thousand contact records a week, delivered to a sales team that previously did the work by hand.


This case describes architecture and patterns. Specific vendors, hostnames, and the client itself are deliberately left abstract; what matters here is the shape of the system, not the procurement choices.