Slack-native sales-intel agent for a B2B sales team

In one paragraph

A five-person sales team used to build its outbound contact lists by hand — scoped by company, geography, seniority, and role — at thirty to sixty minutes a list, and only one person was fluent enough across the source tools to do it well. This agent turns that into a one-line message in the team’s Slack channel and a minute-or-three wait: the salesperson asks in plain English, the agent picks the data sources, runs the lookups, deduplicates against the CRM, and replies in-thread with a ready-to-use spreadsheet. About a thousand contact records a week reach a team that previously did the work by hand. Eight months of iteration through three generations got the shape right — and, in getting it right, ran a workflow runtime to the genuine edge of what it can carry.

8 months

iteration · v1 → v3, three generations

~1,000 records / week

delivered to a 5-person sales team

4 specialist agents behind one chat box

2 models per agent · primary + fallback

The problem in business terms

Outbound at a small sales team is throttled by a procurement skill, not by the data: someone has to remember which database covers which geography, which surface still has live emails, and how to dedupe the result against the deals the team already owns — then do it again for every list. The drafting of the search is the smallest slice; the tool-hopping and the by-hand reconciliation eat the half-hour. When that skill lives in one person’s head, the team’s outbound throughput is capped at that person’s spare attention.

how a contact-list request runs before vs after the agent

without the agent

Salesperson builds a search in the contact database, exports a CSV.
Deduplicates against the CRM by hand, account by account.
Hops to professional-network search for the missing emails.
Thirty to sixty minutes a list; one teammate is fluent across the tools.
Every list is a one-off — nothing records what was asked or delivered.

with the agent

Salesperson sends one line in the Slack channel they already use.
Specialist operators search, dedupe against the CRM, and enrich in one pass.
Missing emails filled where verifiable, left blank and flagged where not.
A minute or three end to end; anyone in the channel can ask.
Every run leaves a spreadsheet and a logged execution record.

What makes this different from “a chatbot wired to a few APIs”

The agent does not hand one model a flat list of a dozen tools and hope it picks the right one under load. The contact sources are grouped into a small set of specialist operators — one owns enrichment and the CRM, one owns the long-running professional-network scraper, one owns the output spreadsheet — and the orchestrator that reads the salesperson’s message only chooses which operator to call. The planning prompt stays short, the model never writes to a spreadsheet in the middle of a research turn, and adding a source is a new operator rather than another line everyone has to re-read. Every one of those model contexts runs two models — a sharper primary and a steadier fallback — so a provider hiccup on a bleeding-edge model degrades to a slightly less-clever answer instead of a dead end surfaced to someone waiting in chat.
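The routing-plus-fallback idea can be sketched in a few lines. This is a minimal illustration under assumptions, not the production workflow: the operator names, `call_with_fallback`, and `choose_operator` are hypothetical, and each "model" is just a callable standing in for a provider client.

```python
from typing import Callable, Sequence

# Hypothetical operator names; the orchestrator may only pick one of these.
OPERATORS = {"request", "spreadsheet", "professional_network"}


class AllModelsFailed(Exception):
    """Raised only when every model in the chain has errored."""


def call_with_fallback(prompt: str, models: Sequence[Callable[[str], str]]) -> str:
    """Try the primary model first, then each fallback in order."""
    errors = []
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # a provider hiccup of any kind
            errors.append(exc)
    raise AllModelsFailed(errors)


def choose_operator(message: str, models: Sequence[Callable[[str], str]]) -> str:
    """The orchestrator's only decision: which operator handles the message."""
    answer = call_with_fallback(message, models).strip()
    if answer not in OPERATORS:
        raise ValueError(f"unknown operator: {answer!r}")
    return answer
```

Because the orchestrator's output is constrained to a short list of operator names, a weaker fallback model can still route correctly even if it would have planned less cleverly.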

The second axis of the design is keeping the volume of information under control, because a lead-gen agent moves a great deal of it. Tools never let the model assemble a raw request body string by string — the body is built server-side and the model supplies only the search query, which removes an entire class of malformed-request failures. Searches are clamped server-side to a sane page size, the agent is told to stop after a few attempts rather than broaden a stubborn search into thousands of rows, and — the decision that mattered most — each operator returns only its final, consolidated leads to the orchestrator, never the raw payloads it sifted through. The first version of that last rule was absent, and a single hard-to-find target once made an operator hand the planner five megabytes of raw matches and sail straight past the model’s context ceiling. The fix was not a bigger model; it was letting the planner see less.
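A rough sketch of the two volume controls described above: the request body is built server-side from a query the model supplies, and an operator hands back only consolidated fields. The field names, `MAX_PAGE_SIZE`, and both function names are assumptions for illustration; the real provider schema is not described in this case.

```python
MAX_PAGE_SIZE = 25  # assumed clamp; the actual limit is not stated here

# Hypothetical set of fields the planner is allowed to see per lead.
LEAD_FIELDS = ("name", "title", "company", "email", "email_status")


def build_search_body(query: str, requested_page_size: int = 10) -> dict:
    """Build the request body in code; the model only contributes `query`."""
    page_size = max(1, min(requested_page_size, MAX_PAGE_SIZE))
    return {"q": query.strip(), "per_page": page_size, "page": 1}


def consolidate(raw_matches: list) -> list:
    """Keep only the whitelisted fields; drop the rest of each raw payload."""
    return [{field: match.get(field) for field in LEAD_FIELDS} for match in raw_matches]
```

Whatever page size the model asks for, the clamp decides what is actually sent, and whatever the provider returns, the planner sees only the whitelisted fields.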

What the planner is allowed to see

the decision that kept the orchestrator inside its context budget

criterion	Every raw provider payload flows up	Only each operator's final, consolidated leads (chosen)	Mid-run streamed summaries
Planner token cost per request	unbounded	bounded · small	medium
Stays inside the context window	no	yes	partial
Leads the planner can reason over	all	all	partial
Debuggable when a run is wrong	yes	yes	partial
Tokens wasted on discarded rows	every row	none	some

Per-request token budget

A lead-gen agent moves a lot of data. What the planner pays per request depends on how many leads are asked for, how hard the agent searches, and how heavy each raw match is. Comparing what the planner sees when every raw payload flows up against what it sees when only the consolidated leads do shows the gap, and that gap is why one stubborn search once blew past the model's context ceiling.
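The comparison can be worked through as back-of-the-envelope arithmetic. The sketch below assumes roughly four bytes per token and 300 bytes per consolidated lead; both figures, and the function names, are illustrative assumptions rather than measured values.

```python
BYTES_PER_TOKEN = 4  # rough rule of thumb, assumed


def bytes_to_tokens(n_bytes: int) -> int:
    """Approximate token count for a payload of n_bytes."""
    return n_bytes // BYTES_PER_TOKEN


def planner_tokens(leads_requested: int, searches: int,
                   raw_matches_per_search: int, raw_bytes_per_match: int,
                   lead_bytes: int = 300) -> dict:
    """Tokens the planner sees under each policy (lead_bytes is assumed)."""
    raw_bytes = searches * raw_matches_per_search * raw_bytes_per_match
    return {
        "raw_payloads": bytes_to_tokens(raw_bytes),
        "consolidated_only": bytes_to_tokens(leads_requested * lead_bytes),
    }


budget = planner_tokens(leads_requested=20, searches=3,
                        raw_matches_per_search=25, raw_bytes_per_match=2000)
print(budget)  # {'raw_payloads': 37500, 'consolidated_only': 1500}
```

Under the same four-bytes-per-token assumption, a 5 MB raw payload works out to about 1.25 million tokens, which is consistent with a run that overshoots a 1M-token context ceiling.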


A request in the channel

Three consecutive requests from the sales channel, lightly cleaned. Vector investigates, decides, acts, and reports back in the thread. Two of the three close on a success line, and the third closes on a clean, deliberate refusal.

Before the first request

Vector is a chat box in the sales team’s existing Slack channel that turns a one-line request into a deduplicated, honestly-sourced contact sheet in a minute or three. There is no dashboard and no separate login: membership of the channel is the permission, and typing what you want is the entire training.

Two things prime every run. A routing-prompt.md teaches the orchestrator how to read a request and which operator owns which kind of work; the operators.roster is the list of specialist operators and their narrow tool surfaces. The orchestrator never touches a data source directly — it delegates, then assembles.

The three requests that follow show the agent at the shapes that matter most: the core capability of finding and ranking decision-makers across two accounts, a follow-up that continues the same conversation into the same sheet, and a request the agent declines because honouring it would cross a platform’s terms of service.

The trust property is what makes the case publishable: contact data stays inside the company’s CRM and Google estate, the agent runs behind the corporate VPN, and every run is logged where it executes — so the record that exists for debugging doubles as the audit trail.

The core request — a ranked shortlist, deduped and honest

This is the request the agent exists for. A salesperson names two target accounts and a seniority bar; what comes back is a clean sheet of net-new names with the decision-makers on top. The skill the team no longer needs is knowing which database covers which geography or which surface still has live emails — the agent owns that.

Two quiet disciplines are visible here. The CRM is checked first and contacts the team already owns are dropped, so the sheet is only names worth acting on. And where a person’s email cannot be verified, the cell is left blank and flagged “Email Status: unavailable” rather than guessed.

That second discipline is load-bearing, and was learnt the hard way. A model that has seen a first.last@ address for three colleagues will happily invent the same shape for a fourth, and a fabricated email in an outbound sheet is worse than a blank one. The rule is enforced in the prompt and backstopped in code: provider-returned emails only, the provider’s own status copied into its own column.
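One way such a code backstop could look, as a sketch only: whatever the model wrote into the email cell is discarded, and the cell is refilled from the provider's record or left blank. The column names follow the sheet wording above; the provider field names and `backstop_email` itself are hypothetical.

```python
from typing import Optional


def backstop_email(model_row: dict, provider_row: Optional[dict]) -> dict:
    """Overwrite the email cells from the provider record, never from the model."""
    row = dict(model_row)  # do not mutate the caller's row
    email = (provider_row or {}).get("email")
    if email:
        row["Email"] = email
        # Copy the provider's own status verbatim into its own column.
        row["Email Status"] = provider_row.get("email_status", "unknown")
    else:
        row["Email"] = ""
        row["Email Status"] = "unavailable"
    return row
```

The point of the design is that the model has no path to the email column at all: a guessed address is overwritten even when it happens to look plausible.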

For the business: the salesperson gets a ranked, deduplicated, honestly-sourced list in the time it takes to refill a coffee, and never has to wonder whether a green-looking email was real or invented.

Continuing the conversation, into the same sheet

The follow-up that proves it is an assistant, not a one-shot form. “Add CxOs at two more accounts to that same table” carries no sheet link and no repeated filter — the agent remembers the spreadsheet it just made and the bar it just applied, and appends underneath.

The token discipline is on display. One of the new accounts is enormous, and a naive agent would broaden the search until it returned hundreds of rows. Vector caps at the C-suite, stops after three searches, and appends nine rows rather than spending a fortune to return noise. Restraint is a feature here, not a limitation.

Appending to the existing sheet rather than spawning a new one is the small thing that makes the output feel like a growing working document instead of a pile of one-off exports. The salesperson ends a two-message exchange with a single sheet covering four accounts.

For the business: continuity is what turns a lookup tool into something the team reaches for by reflex. The second request cost a fraction of the first, because most of the context was already in the room.

The request it declines — and why that is the feature

The most reassuring thing an autonomous agent can do is refuse cleanly. Asked to pull contacts for everyone on a saved Sales Navigator list, Vector says no — and explains the difference between what it will and will not do.

It can parse Sales Navigator search results — the result pages a filter produces. A saved, personal list is a different animal: honouring it would mean opening individual profiles behind the requester’s authenticated session, which crosses the platform’s terms of service. The boundary is not a missing capability the team could unlock; it is a line drawn on purpose.

Rather than dead-ending, the agent offers the compliant path — paste the search URL, or hand over the filters and it runs the search itself — so the salesperson still gets the people, by a route that keeps everyone inside the rules.

For the business: a sales tool that quietly scraped personal profiles would be a liability dressed as a convenience. An agent that knows where the line is, and routes around it without being asked, is one the whole team can be allowed to use unsupervised.

What’s shipped

The agent fleet

orchestrator + 3 operators · agent-as-tool

● 01

Orchestrator

reads the request · picks an operator · never touches a source directly
● 02

Request operator

contact enrichment · CRM cross-check · web research
● 03

Spreadsheet operator

spawn from template · append rows · reply in-thread
● 04

Professional-network operator

async scrape-and-wait · Sales Navigator search results

Two sub-workflows

● 01

Data appender

auto-map append · routing keys stripped before the write
● 02

Scrape-and-wait

submit → poll 30s → 5-minute cap → parsed result or clean failure
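The scrape-and-wait loop can be sketched as follows. The 30-second interval and 5-minute cap come from the line above; the `submit` and `poll` callables, the return shape, and the injectable `sleep` and `now` parameters are assumptions made so the sketch is testable without real waiting.

```python
import time
from typing import Any, Callable, Optional

POLL_INTERVAL_S = 30
MAX_WAIT_S = 300  # the 5-minute cap


def scrape_and_wait(submit: Callable[[], str],
                    poll: Callable[[str], Optional[Any]],
                    sleep: Callable[[float], None] = time.sleep,
                    now: Callable[[], float] = time.monotonic) -> dict:
    """Submit a job, poll until it finishes or the cap is reached."""
    job_id = submit()
    deadline = now() + MAX_WAIT_S
    while True:
        result = poll(job_id)  # None means "still running"
        if result is not None:
            return {"ok": True, "rows": result}
        if now() + POLL_INTERVAL_S > deadline:
            # Clean failure: a structured result, not an exception.
            return {"ok": False, "error": "scrape timed out after 5 minutes"}
        sleep(POLL_INTERVAL_S)
```

Returning a structured failure rather than raising lets the orchestrator tell the salesperson what happened instead of surfacing a crashed run.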

The spreadsheet is reached by two independent write paths — the request operator’s, whose column keys are dictated by prompts, and the scraper’s, whose column keys are dictated by a deterministic code step. Both append to a copy of the same template; both have to be edited in lockstep whenever the template gains a column, or one of them silently writes into no column at all. Earlier generations are kept in the runtime as deactivated workflows rather than deleted, so the diff between generations stays legible months later, and a separate dev twin absorbs risky experiments before they touch the production graph.
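A hedged sketch of how an appender could strip routing keys and make the lockstep problem loud instead of silent. The routing key names, the column list, and `prepare_row` are hypothetical; this is not a description of the shipped sub-workflow.

```python
# Hypothetical keys used to route the write, not data for the sheet.
ROUTING_KEYS = {"spreadsheet_id", "thread_ts"}


def prepare_row(payload: dict, template_columns: list) -> dict:
    """Strip routing keys, then refuse any key the template has no column for."""
    row = {k: v for k, v in payload.items() if k not in ROUTING_KEYS}
    unknown = sorted(set(row) - set(template_columns))
    if unknown:
        # Fail loudly rather than write a value into no column at all.
        raise KeyError(f"columns not in template: {unknown}")
    # Emit every template column in order, blank where the payload has none.
    return {col: row.get(col, "") for col in template_columns}
```

Running both write paths through one check like this would turn a forgotten template edit into an immediate error on the first append.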

Next chapter — a separate build where a workflow runtime stops being the right tool

○ 01

Standalone service · typed tool I/O

removes the malformed-request class entirely
○ 02

Entity memory

an accumulating, proprietary enrichment store — cache hits cost nothing
○ 03

Per-run cost guardrails

hard budget ceilings · cost-per-lead attribution
○ 04

Eval harness

golden-request regression tests a workflow runtime cannot offer


Hardened by incident

The reliability did not come from the upstream APIs getting better; it came from absorbing each real production failure at the architecture level. Three runs, three structural fixes:

the run that…	what broke	the fix that made it structural
chased a hard target	the operator broadened until it returned thousands of rows and handed the planner ~5 MB of raw matches — past the model’s 1M-token ceiling	operators return only consolidated leads, never raw payloads; page size clamped server-side; stop after three searches
trusted a pattern	the model invented three plausible emails from a colleague’s `first.last@` address and labelled them verified	provider-returned emails only — enforced in the prompt, backstopped in code; the provider’s own status copied into its own column
could not find a contact	the agent called the same enrichment tool until it hit the step-iteration cap and the whole run errored	never call the same tool twice for the same input; on the cap, hand the partial result back instead of crashing the run
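The third fix in the table can be illustrated with a small operator loop. This is a sketch under assumptions: `run_operator`, the `next_call` planner callback, the tool registry, and the default step cap are all hypothetical stand-ins for whatever the runtime provides.

```python
import json
from typing import Callable, Optional, Tuple


def run_operator(next_call: Callable[[list], Optional[Tuple[str, dict]]],
                 tools: dict, max_steps: int = 10) -> dict:
    """Run tool calls until done, a repeat is requested, or the cap is hit."""
    seen = set()
    leads = []
    for _ in range(max_steps):
        call = next_call(leads)  # the model's next requested call, or None
        if call is None:
            return {"status": "complete", "leads": leads}
        name, args = call
        key = (name, json.dumps(args, sort_keys=True))
        if key in seen:
            # Same tool, same input: stop and hand back what we have.
            return {"status": "partial", "leads": leads, "reason": "repeated call"}
        seen.add(key)
        leads.extend(tools[name](**args))
    # Step cap reached: return the partial result instead of erroring the run.
    return {"status": "partial", "leads": leads, "reason": "step cap reached"}
```

In both exit paths the caller still receives the leads gathered so far, so a contact that cannot be found costs one row rather than the whole request.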

Each fix narrowed the surface where one bad turn could derail a request — and none of them was a bigger model. Across two consecutive production weeks earlier in the rollout the error rate fell from 13% to 4% as the dual-model fallback reached every agent: not the integrations improving, but the failure class that dominated the bad week getting absorbed where the salesperson never sees it. When a cheaper-looking enrichment provider appeared, it was not swapped in on faith either — a live test measured its real per-lead cost against the marketing claim, the incumbent won on stability, and the alternative was left parked one connection away for the day that changes.

Because the agent runs on a workflow runtime, every execution is already logged where it runs: which operator ran, with which parameters, how long it took, success or failure. The record that exists for debugging is the same one that answers “what did the agent do last Tuesday for the LATAM list?”, and nobody had to remember to instrument it.

The verdict from the room

What Vector settled is that a five-person sales team can have a research analyst on call in their chat channel — one that dedupes against the CRM, ranks by seniority, refuses to invent an email, and declines a request that would cross a platform’s rules, all without anyone watching. It is also, deliberately, about the most a workflow runtime should be asked to carry: four agents, two write paths edited in lockstep, and token budgets policed by hand are the marks of a system that has earned its way to a proper service. That next build is scoped as separate work; this one is the proof that the shape is right.

Eight months. One builder. Four agents behind one chat box. ~1,000 contact records a week, deduped and honestly sourced — and a clear line where the workflow runtime ends and the next build begins.

This case describes architecture and patterns. The client, its sales team, and the specific data vendors are deliberately left abstract; the prospect companies named in the demo are public firms, used only to make the interaction concrete. What matters here is the shape of the system, not the procurement choices.