Ops · SaaS · Shipped Feb 2026 · 21-day build

An autonomous ops agent that handles 71% of inbound tickets.

Operator.io is a B2B SaaS company with 4,000+ customers and a 6-person support team drowning in tickets. We built them an AI agent that sits in front of every inbound ticket and resolves most of them without a human.

The Challenge

The support team was answering 2,400 tickets/week, 60% of which were the same 12 questions. Average response time was 8 hours and trending up. Hiring more agents was a treadmill; they needed to break the loop.

What we shipped

  • Tier-1 AI support agent (Claude-powered)
  • Real-time RAG over their help center + internal docs
  • Confidence scoring + auto-escalation
  • Slack + Zendesk + Intercom integration
  • Live analytics dashboard for ops leads

Stack

  • Node · TypeScript · Fastify
  • Claude Sonnet 4 · Anthropic API
  • Pinecone · Postgres
  • Zendesk + Intercom webhooks
  • Datadog · Sentry

71% · Tickets auto-resolved
−84% · Avg. response time
21d · From kickoff to live
$340k · Annual cost saved

The signal: their best agent was a copy-paste

When we audited Operator's support inbox, the pattern was obvious. Their top agent's "secret" wasn't expertise — it was a folder of 14 saved replies and 9 link templates. Most "support work" was pattern-matching a ticket to a saved reply. That's a job for an AI agent.

"Within 10 days we had something better than our average human. By day 21 it was outperforming our best agent on response time, with the same CSAT." — Marcus T., Head of Ops, Operator.io

Architecture: confidence scoring is the whole game

The hard part of building an AI support agent isn't getting it to answer — it's getting it to know when not to. Every reply runs through a confidence scoring layer that uses three signals:

  1. Embedding similarity to the closest matching question in the knowledge base
  2. Claude self-eval on its own answer ("how confident are you, 1-10?")
  3. Historical resolution rate for the matched intent

If confidence drops below 0.78 on any one of the three axes, the agent escalates the ticket to a human with the draft answer pre-loaded. The human just edits and sends, and that edit becomes a new training signal.
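
As a rough illustration of the gate described above, here is a minimal TypeScript sketch. The type and function names (`ConfidenceSignals`, `normalizeSelfEval`, `decide`) are hypothetical and not taken from Operator.io's codebase. The sketch also assumes all three signals are compared on a 0-1 scale, so the 1-10 self-eval is divided by 10 before it is checked against the 0.78 threshold; the write-up does not say how that rescaling is actually done.

```typescript
// Hypothetical shapes for illustration only.
interface ConfidenceSignals {
  embeddingSimilarity: number;      // similarity to the closest KB question, assumed 0-1
  selfEval: number;                 // model self-rating on the 1-10 scale
  historicalResolutionRate: number; // past resolution rate for the matched intent, 0-1
}

interface Decision {
  action: "auto_reply" | "escalate";
  failedAxes: string[]; // names of the signals that fell below the threshold
}

// Threshold quoted in the case study.
const ESCALATION_THRESHOLD = 0.78;

// Assumption: rescale the 1-10 rating to 0-1 by dividing by 10.
function normalizeSelfEval(rating: number): number {
  return Math.min(10, Math.max(1, rating)) / 10;
}

function decide(
  signals: ConfidenceSignals,
  threshold: number = ESCALATION_THRESHOLD
): Decision {
  const axes: Array<[string, number]> = [
    ["embeddingSimilarity", signals.embeddingSimilarity],
    ["selfEval", normalizeSelfEval(signals.selfEval)],
    ["historicalResolutionRate", signals.historicalResolutionRate],
  ];

  // `!(score >= threshold)` also catches NaN, so a missing signal escalates.
  const failedAxes = axes
    .filter(([, score]) => !(score >= threshold))
    .map(([name]) => name);

  return {
    action: failedAxes.length === 0 ? "auto_reply" : "escalate",
    failedAxes,
  };
}

// selfEval 7 rescales to 0.7, which is below 0.78, so this ticket escalates.
const decision = decide({
  embeddingSimilarity: 0.91,
  selfEval: 7,
  historicalResolutionRate: 0.85,
});
console.log(decision.action, decision.failedAxes);
```

Checking each axis separately, rather than averaging them, means one strong signal cannot mask a weak one, which is the behavior the "any axis" rule implies.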

Results, week by week

Week 1: 12% of tickets resolved without escalation. Week 2: 38%. Week 4: 64%. By month 3, holding steady at 71% auto-resolution with CSAT actually higher than the pre-launch baseline.

The most surprising win: response time. Even escalated tickets are faster, because the human sees a draft, not a blank box. Average response time dropped from 8 hours to 1.2 seconds for auto-resolved tickets, and from 8 hours to 38 minutes for escalations.

What we'd do differently

We'd ship the analytics dashboard on day one. Operator's ops team got religion about the agent's performance once they could see it in real time — and the dashboard exposed three categories of tickets we were under-serving, which became targeted improvements.

"This quietly replaced a $400k/year team expansion. The agent didn't replace anyone — it gave our existing team their afternoons back."

— Marcus T., Head of Ops, Operator.io