Automation May 2026 · 8 min read

The 30-day AI implementation playbook.

The exact sprint structure we use with every client — day-by-day, week-by-week — written so any in-house team can copy it. No theory. Just the moves that ship.

Most AI projects fail in the same five places. We know because we've fixed them in 40+ engagements. This is the structure we run every time. It's not the only way to ship AI in 30 days, but it's the only one we've found that's reliably repeatable.

Week 0 — the kickoff (1 day)

Before week one starts, there's a single kickoff call. Two hours, four humans: the founder/sponsor, the operational lead, the AI lead, and your project owner. The deliverable from this call is a one-page brief that defines:

If you can't get this onto one page, the scope is too big. Cut.

Week 1 — Audit

Day 1–3: workflow shadowing. You sit with the person currently doing the work. Watch them, ask why, record everything. The goal is not to redesign the workflow — it's to understand it.

Day 4: data audit. What information feeds the workflow? Is it accessible programmatically? Is it clean? Does it live in tools we can integrate with? This is where most projects discover they need a week of data plumbing before AI can do anything useful.
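
One way to make the "is it clean?" question concrete is to measure how often required fields are missing. This is a minimal sketch, assuming the workflow's records can be exported as a list of dictionaries; the `audit_fields` helper, the field names, and the sample data are all hypothetical.

```python
from collections import Counter


def audit_fields(records, required_fields):
    """Return the share of records (0.0 to 1.0) missing each required field."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat absent keys, None, and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    total = len(records)
    return {f: (missing[f] / total if total else 0.0) for f in required_fields}


# Hypothetical export from a ticketing tool.
sample = [
    {"ticket_id": "T-1", "customer_email": "a@example.com", "body": "Refund please"},
    {"ticket_id": "T-2", "customer_email": "", "body": "Where is my order?"},
    {"ticket_id": "T-3", "customer_email": None, "body": "   "},
    {"ticket_id": "T-4", "customer_email": "d@example.com"},
]

report = audit_fields(sample, ["ticket_id", "customer_email", "body"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

A field that is missing in a large share of records is usually the first sign of the data plumbing work mentioned above.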

Day 5: write the spec. One document. Inputs, outputs, edge cases, evals, escalation paths. If a stakeholder can't read this and tell you what the system will do, rewrite it until they can.

Week 2 — Design

Day 6–8: architecture + model selection. Pick the model. Pick the framework. Pick the eval harness. Don't get clever — boring choices ship faster.

Day 9–10: build the eval suite first. Yes, before any prompt engineering. You need 50–100 test cases that represent real production traffic. Without this, you can't tell if your tuning is making things better or worse. We learned this the hard way.
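
An eval suite does not need a framework to get started. The sketch below is a minimal illustration, not our production harness: `EvalCase`, `run_evals`, and `pass_rate` are hypothetical names, exact-match scoring is an assumption (many workflows need rubric or fuzzy scoring), and `keyword_router` is a stand-in for the real prompt and model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    case_id: str
    input_text: str
    expected: str


@dataclass
class EvalResult:
    case_id: str
    output: str
    passed: bool


def run_evals(system: Callable[[str], str], cases: List[EvalCase]) -> List[EvalResult]:
    """Run every case through the system and score it."""
    results = []
    for case in cases:
        output = system(case.input_text)
        # Assumption: case-insensitive exact match is a good enough score.
        passed = output.strip().lower() == case.expected.strip().lower()
        results.append(EvalResult(case.case_id, output, passed))
    return results


def pass_rate(results: List[EvalResult]) -> float:
    """Fraction of cases that passed; 0.0 for an empty suite."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def keyword_router(text: str) -> str:
    # Placeholder for the real prompt + model call.
    return "billing" if "invoice" in text.lower() else "general"


# In practice these would be 50-100 cases sampled from real traffic.
cases = [
    EvalCase("c1", "Where is my invoice?", "billing"),
    EvalCase("c2", "How do I reset my password?", "account"),
    EvalCase("c3", "Hello there", "general"),
    EvalCase("c4", "INVOICE is wrong", "billing"),
]

results = run_evals(keyword_router, cases)
failures = [r.case_id for r in results if not r.passed]
print(f"pass rate: {pass_rate(results):.0%}, failures: {failures}")
```

Because `run_evals` takes any callable, the same suite can score every prompt or model variant you try, which is what makes the before/after comparison possible.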

The best teams we work with treat evals like tests: not as a "we'll add them later" item, but as the actual definition of done.

Week 3 — Build

This is the sprint week. Mornings: prompt + model work. Afternoons: integration. Evenings: nothing. Don't burn out in week 3; you need week 4 to be good.

The order matters:

  1. Get the dumbest end-to-end version working first. A working pipeline that gives bad answers beats a great prompt that's not wired up.
  2. Run it against the eval suite. Look at the worst 10 failures.
  3. Iterate prompts to fix those 10. Don't get distracted by edge cases until the common path is solid.
  4. Add observability. Every model call logged with input, output, latency, cost.
  5. Ship to staging by Friday.
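
The observability step in the list above can start as a thin wrapper around the model call. This is a rough sketch under stated assumptions: `logged_call` and `estimate_tokens` are hypothetical helpers, the per-token price is a made-up placeholder, the word-count token estimate is crude, and `fake_model` stands in for a real model client.

```python
import json
import time
from typing import Callable, Dict, List

# Hypothetical price; substitute your provider's actual rates.
ASSUMED_USD_PER_1K_TOKENS = 0.002


def estimate_tokens(text: str) -> int:
    # Crude whitespace-based estimate; real tokenizers count differently.
    return len(text.split())


def logged_call(model: Callable[[str], str], prompt: str, log: List[Dict]) -> str:
    """Call the model and append input, output, latency, and estimated cost to log."""
    start = time.perf_counter()
    output = model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    tokens = estimate_tokens(prompt) + estimate_tokens(output)
    log.append({
        "input": prompt,
        "output": output,
        "latency_ms": round(latency_ms, 2),
        "est_cost_usd": tokens / 1000 * ASSUMED_USD_PER_1K_TOKENS,
    })
    return output


def fake_model(prompt: str) -> str:
    # Stand-in for a real model client.
    return "draft reply: " + prompt.upper()


call_log: List[Dict] = []
reply = logged_call(fake_model, "refund request for order 123", call_log)
print(json.dumps(call_log[0], indent=2))
```

In a real system the list would be replaced by whatever log store feeds your dashboard; the point is that every call goes through one function that records all four fields.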

Week 4 — Launch

Day 22–24: shadow mode. The AI runs alongside the human, generating draft outputs. The human still acts. You're collecting comparison data and building trust.
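
The comparison data from shadow mode can be as simple as one record per item, pairing the AI draft with what the human actually did. A minimal sketch, assuming outputs are short labels that can be compared directly; `ShadowRecord` and the sample decisions are hypothetical, and free-text outputs would need a different comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShadowRecord:
    item_id: str
    ai_draft: str       # what the AI would have done
    human_action: str   # what the human actually did

    @property
    def agrees(self) -> bool:
        # Assumption: case-insensitive exact match counts as agreement.
        return self.ai_draft.strip().lower() == self.human_action.strip().lower()


def agreement_rate(records: List[ShadowRecord]) -> float:
    """Fraction of items where the AI draft matched the human action."""
    return sum(r.agrees for r in records) / len(records) if records else 0.0


def disagreements(records: List[ShadowRecord]) -> List[ShadowRecord]:
    """Items to review by hand: the AI and the human differed."""
    return [r for r in records if not r.agrees]


records = [
    ShadowRecord("t1", "approve", "Approve"),
    ShadowRecord("t2", "escalate", "approve"),
    ShadowRecord("t3", "reject", "reject "),
    ShadowRecord("t4", "approve", "approve"),
]

print(f"agreement: {agreement_rate(records):.0%}")
for r in disagreements(records):
    print(f"review {r.item_id}: AI said {r.ai_draft!r}, human did {r.human_action!r}")
```

The disagreement list is the useful part: reviewing it with the person doing the work shows whether the AI or the human was right in each case.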

Day 25–27: limited rollout. Turn on AI for 10% of traffic. Monitor. Iterate.
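
One common way to send a fixed share of traffic to the AI is deterministic hash bucketing, so the same item always gets the same treatment. The sketch below is an illustration of that technique, not a description of any particular tool: `in_rollout`, the salt value, and the ticket IDs are hypothetical.

```python
import hashlib


def in_rollout(unit_id: str, percent: int, salt: str = "ai-rollout-v1") -> bool:
    """Return True if this unit falls inside the first `percent` of 100 buckets."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


# Hypothetical traffic: check how many of 10,000 tickets land in a 10% rollout.
ids = [f"ticket-{i}" for i in range(10_000)]
share = sum(in_rollout(i, 10) for i in ids) / len(ids)
print(f"share routed to AI: {share:.1%}")
```

Because the bucket depends only on the salt and the ID, raising `percent` from 10 to 50 keeps everyone already in the rollout and adds new units, which makes the ramp to full rollout easier to reason about.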

Day 28–30: full rollout + handoff. By day 30 you have a live system, a dashboard, a runbook, and someone internal who can iterate on prompts. If you can't say "Sarah from ops now owns this", you haven't shipped.

What kills these sprints

Three things, in order of frequency:


If this is useful and you want help running it in your team, we book one strategy call a week for in-house teams. Grab a slot here.