The signal: their best agent was a copy-paste
When we audited Operator's support inbox, the pattern was obvious. Their top agent's "secret" wasn't expertise — it was a folder of 14 saved replies and 9 link templates. Most "support work" was pattern-matching a ticket to a saved reply. That's a job for an AI agent.
"Within 10 days we had something better than our average human. By day 21 it was outperforming our best agent on response time, with the same CSAT." — Marcus T., Head of Ops, Operator.io
Architecture: confidence scoring is the whole game
The hard part of building an AI support agent isn't getting it to answer — it's getting it to know when not to. Every reply runs through a confidence scoring layer that uses three signals:
- Embedding similarity to the closest matching question in the knowledge base
- Claude self-eval on its own answer ("how confident are you, 1-10?")
- Historical resolution rate for the matched intent
If any of the three signals drops below 0.78, the ticket escalates to a human with the draft answer pre-loaded. The human just edits and sends, and that edit becomes a new training signal.
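The gating rule can be sketched in a few lines of Python. This is a minimal illustration, not Operator's production code: the names `Signals`, `normalized`, and `route` are hypothetical, and dividing the 1-10 self-eval by 10 to compare it against the 0.78 cutoff is an assumption, since the scale conversion isn't spelled out above.

```python
from dataclasses import dataclass

THRESHOLD = 0.78  # escalation cutoff quoted in the text


@dataclass
class Signals:
    """Hypothetical container for the three confidence signals."""
    embedding_similarity: float  # 0-1, similarity to the closest KB question
    self_eval: int               # 1-10, the model's rating of its own answer
    resolution_rate: float       # 0-1, historical rate for the matched intent


def normalized(signals: Signals) -> dict[str, float]:
    """Put every signal on a 0-1 scale.

    Assumption: the 1-10 self-eval is rescaled by dividing by 10.
    """
    return {
        "embedding_similarity": signals.embedding_similarity,
        "self_eval": signals.self_eval / 10,
        "resolution_rate": signals.resolution_rate,
    }


def route(draft: str, signals: Signals, threshold: float = THRESHOLD) -> dict:
    """Send the draft only if every signal clears the threshold.

    Otherwise escalate, keeping the draft so the human starts from it
    rather than from a blank box.
    """
    scores = normalized(signals)
    weak = [name for name, value in scores.items() if value < threshold]
    action = "escalate" if weak else "send"
    return {"action": action, "draft": draft, "weak_signals": weak}


if __name__ == "__main__":
    draft = "You can reset your password from Settings > Security."
    print(route(draft, Signals(0.91, 9, 0.85)))  # all three signals clear 0.78
    print(route(draft, Signals(0.91, 7, 0.85)))  # self-eval of 7 falls short
```

Gating on the weakest signal rather than an average is the point of "any axis": a high embedding match can't paper over a low self-eval or a historically hard intent. Returning the list of weak signals is an addition for illustration; it would let a reviewer see why a ticket was escalated.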
Results, week by week
Week 1: 12% of tickets resolved without escalation. Week 2: 38%. Week 4: 64%. By month 3, holding steady at 71% auto-resolution with CSAT actually higher than the pre-launch baseline.
The most surprising win was response time. Even escalated tickets are faster, because the human sees a draft, not a blank box. Average response time dropped from 8 hours to 1.2 seconds for auto-resolved tickets, and from 8 hours to 38 minutes for escalations.
What we'd do differently
We'd ship the analytics dashboard on day one. Operator's ops team got religion about the agent's performance once they could see it in real time — and the dashboard exposed three categories of tickets we were under-serving, which became targeted improvements.