Case Study — Operator.io · Autonomous ops agent

The signal: their best agent was a copy-paste

When we audited Operator's support inbox, the pattern was obvious. Their top agent's "secret" wasn't expertise — it was a folder of 14 saved replies and 9 link templates. Most "support work" was pattern-matching a ticket to a saved reply. That's a job for an AI agent.

"Within 10 days we had something better than our average human. By day 21 it was outperforming our best agent on response time, with the same CSAT." — Marcus T., Head of Ops, Operator.io

Architecture: confidence scoring is the whole game

The hard part of building an AI support agent isn't getting it to answer — it's getting it to know when not to. Every reply runs through a confidence scoring layer that uses three signals:

Embedding similarity to the closest matching question in the knowledge base
Claude self-eval on its own answer ("how confident are you, 1-10?")
Historical resolution rate for the matched intent

If any of the three signals drops below 0.78, the agent escalates the ticket to a human with the draft answer pre-loaded. The human just edits and sends, and that edit becomes a new training signal.
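The escalation rule above can be sketched in a few lines of Python. This is a minimal illustration, not Operator's production code: the names `Signals`, `Decision`, and `route` are hypothetical, and the sketch assumes all three signals are put on a 0-1 scale, with the 1-10 self-eval simply divided by 10 (the case study does not say how the scales are reconciled).

```python
from dataclasses import dataclass
from typing import Optional

# Threshold quoted in the case study; checked against each signal independently.
ESCALATION_THRESHOLD = 0.78


@dataclass(frozen=True)
class Signals:
    embedding_similarity: float        # similarity to the closest KB question, assumed in [0, 1]
    self_eval: int                     # the model's self-rating on the 1-10 scale
    historical_resolution_rate: float  # past resolution rate for the matched intent, in [0, 1]


@dataclass(frozen=True)
class Decision:
    action: str                    # "auto_send" or "escalate"
    weakest_signal: Optional[str]  # which signal triggered escalation, if any


def normalized_scores(signals: Signals) -> dict[str, float]:
    """Put all three signals on a 0-1 scale (the divide-by-10 mapping is an assumption)."""
    if not 1 <= signals.self_eval <= 10:
        raise ValueError("self_eval must be between 1 and 10")
    return {
        "embedding_similarity": signals.embedding_similarity,
        "self_eval": signals.self_eval / 10,
        "historical_resolution_rate": signals.historical_resolution_rate,
    }


def route(signals: Signals) -> Decision:
    """Escalate if ANY single signal falls below the threshold; otherwise auto-send."""
    scores = normalized_scores(signals)
    weakest = min(scores, key=scores.get)
    if scores[weakest] < ESCALATION_THRESHOLD:
        return Decision("escalate", weakest)
    return Decision("auto_send", None)
```

For example, `route(Signals(0.91, 7, 0.85))` escalates on `self_eval`, because 7/10 falls below 0.78 even though the other two signals are strong. Taking the weakest signal rather than an average is what "on any axis" implies: one low score is enough to hand the draft to a human.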

Results, week by week

Week 1: 12% of tickets resolved without escalation. Week 2: 38%. Week 4: 64%. By month 3, holding steady at 71% auto-resolution with CSAT actually higher than the pre-launch baseline.

The most surprising win: response time. Even escalated tickets are faster, because the human sees a draft, not a blank box. Average response time dropped from 8 hours to 1.2 seconds for auto-resolved tickets, and from 8 hours to 38 minutes for escalated ones.

What we'd do differently

We'd ship the analytics dashboard on day one. Operator's ops team got religion about the agent's performance once they could see it in real time — and the dashboard exposed three categories of tickets we were under-serving, which became targeted improvements.

An autonomous ops agent that handles 71% of inbound tickets.
