The 30-day AI implementation playbook

Most AI projects fail in the same five places. We know because we've fixed them in 40+ engagements. This is the structure we run every time. It's not the only way to ship AI in 30 days, but it's the only one we've found that's reliably repeatable.

Week 0 — the kickoff (1 day)

Before week one starts, there's a single kickoff call. Two hours, four humans: the founder/sponsor, the operational lead, the AI lead, and your project owner. The deliverable from this call is a one-page brief that defines:

The single workflow we're automating (no "we'll also do X")
The success metric, with current baseline and 30-day target
Three non-goals (things we'll explicitly not ship in this sprint)
Who decides what "done" means on day 30

If you can't get this onto one page, the scope is too big. Cut.
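To make the one-page rule concrete, here is a minimal sketch of the brief as a small Python structure with a readiness check. The class name `SprintBrief`, its field names, and the example values (invoice triage, minutes per invoice) are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SprintBrief:
    workflow: str               # the single workflow being automated
    metric: str                 # name of the success metric
    baseline: float             # current value, measured before the sprint
    target: float               # value expected on day 30
    non_goals: tuple[str, ...]  # things explicitly not shipped this sprint
    decider: str                # who decides what "done" means on day 30

    def problems(self) -> list[str]:
        """Return the reasons the brief is not ready; an empty list means ready."""
        issues = []
        if not self.workflow.strip():
            issues.append("no workflow named")
        if self.baseline == self.target:
            issues.append("target equals baseline")
        if len(self.non_goals) != 3:
            issues.append("expected exactly three non-goals")
        if not self.decider.strip():
            issues.append("no decider named")
        return issues


# Hypothetical example values, for illustration only.
brief = SprintBrief(
    workflow="Invoice triage",
    metric="minutes per invoice",
    baseline=12.0,
    target=4.0,
    non_goals=("vendor onboarding", "payment approval", "reporting"),
    decider="operations lead",
)
print(brief.problems())  # []
```

If `problems()` returns anything, the brief goes back to the kickoff group before week 1 starts.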

Week 1 — Audit

Days 1–3: workflow shadowing. Sit with the person currently doing the work. Watch them, ask why, record everything. The goal is not to redesign the workflow but to understand it.

Day 4: data audit. What information feeds the workflow? Is it accessible programmatically? Is it clean? Does it live in tools we can integrate with? This is where most projects discover they need a week of data plumbing before AI can do anything useful.

Day 5: write the spec. One document. Inputs, outputs, edge cases, evals, escalation paths. If a stakeholder can't read this and tell you what the system will do, rewrite it until they can.

Week 2 — Design

Days 6–8: architecture and model selection. Pick the model. Pick the framework. Pick the eval harness. Don't get clever: boring choices ship faster.

Days 9–10: build the eval suite first. Yes, before any prompt engineering. You need 50–100 test cases that represent real production traffic. Without them, you can't tell whether your tuning is making things better or worse. We learned this the hard way.

The best teams we work with treat evals like tests: not as a "we'll add them later" item, but as the actual definition of done.
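An eval suite of this kind can start very small. The sketch below is one possible shape, assuming the system under test is a plain function from input text to output text and that a case passes on a case-insensitive exact match. The names `EvalCase`, `run_suite`, `pass_rate`, and the stub classifier are hypothetical, and real suites often need a looser scoring rule than exact match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    input_text: str
    expected: str


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    passed: bool
    output: str


def run_suite(system: Callable[[str], str], cases: list[EvalCase]) -> list[EvalResult]:
    """Run every case through the system and record pass/fail per case."""
    results = []
    for case in cases:
        output = system(case.input_text)
        # Assumption: exact match after trimming and lowercasing.
        passed = output.strip().lower() == case.expected.strip().lower()
        results.append(EvalResult(case.case_id, passed, output))
    return results


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of cases that passed; 0.0 for an empty suite."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def stub_system(text: str) -> str:
    # Stand-in for the real model call: a keyword rule.
    return "refund" if "refund" in text.lower() else "other"


cases = [
    EvalCase("c1", "I want a refund for order 1001", "refund"),
    EvalCase("c2", "Where is my parcel?", "shipping"),
    EvalCase("c3", "Please refund me", "refund"),
    EvalCase("c4", "How do I reset my password?", "other"),
]
results = run_suite(stub_system, cases)
print(pass_rate(results))  # 0.75
```

Treating evals as the definition of done then comes down to agreeing on a pass-rate threshold in the spec and checking it before anything ships.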

Week 3 — Build

This is the sprint week. Mornings: prompt and model work. Afternoons: integration. Evenings: nothing. Don't burn out in week 3; you need to be sharp for week 4.

The order matters:

1. Get the dumbest end-to-end version working first. A working pipeline that gives bad answers beats a great prompt that's not wired up.
2. Run it against the eval suite. Look at the worst 10 failures.
3. Iterate prompts to fix those 10. Don't get distracted by edge cases until the common path is solid.
4. Add observability. Log every model call with input, output, latency, and cost.
5. Ship to staging by Friday.
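Two of these steps are easy to sketch in code: picking out the worst failures and logging every model call. The example below is a rough sketch under stated assumptions: the model is a plain Python callable, each eval case has a numeric score where lower is worse, and cost comes from a caller-supplied estimator. The names `logged_call`, `worst_failures`, `stub_model`, and `stub_cost` are hypothetical.

```python
import heapq
import time
from typing import Callable


def logged_call(
    model: Callable[[str], str],
    prompt: str,
    estimate_cost: Callable[[str, str], float],
    log: list[dict],
) -> str:
    """Call the model and append one record with input, output, latency, cost."""
    start = time.perf_counter()
    output = model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0
    log.append(
        {
            "input": prompt,
            "output": output,
            "latency_ms": latency_ms,
            "cost": estimate_cost(prompt, output),
        }
    )
    return output


def worst_failures(scores: dict[str, float], n: int = 10) -> list[tuple[str, float]]:
    """Return the n lowest-scoring cases, worst first."""
    return heapq.nsmallest(n, scores.items(), key=lambda item: item[1])


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call.
    return prompt.upper()


def stub_cost(prompt: str, output: str) -> float:
    # Placeholder: swap in your provider's actual pricing.
    return 0.0


call_log: list[dict] = []
logged_call(stub_model, "hello", stub_cost, call_log)
print(call_log[0]["output"])  # HELLO
print(worst_failures({"a": 0.9, "b": 0.1, "c": 0.5}, n=2))  # [('b', 0.1), ('c', 0.5)]
```

In a real pipeline the list would be replaced by whatever log sink you already run; the point is that every call produces one record with the same four fields.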

Week 4 — Launch

Days 22–24: shadow mode. The AI runs alongside the human, generating draft outputs. The human still acts. You're collecting comparison data and building trust.
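The comparison data from shadow mode can be summarised with something as simple as an agreement rate. This is a minimal sketch, assuming each item has one human output and one AI draft and that "agreement" means equal text after normalising case and whitespace. `ShadowRecord`, `agreement_rate`, and `disagreements` are hypothetical names, and many workflows will need a softer comparison than string equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowRecord:
    item_id: str
    human_output: str
    ai_draft: str


def normalise(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def agrees(record: ShadowRecord) -> bool:
    return normalise(record.human_output) == normalise(record.ai_draft)


def agreement_rate(records: list[ShadowRecord]) -> float:
    """Share of items where the AI draft matched what the human did."""
    if not records:
        return 0.0
    return sum(agrees(r) for r in records) / len(records)


def disagreements(records: list[ShadowRecord]) -> list[ShadowRecord]:
    """Items to review by hand before widening the rollout."""
    return [r for r in records if not agrees(r)]


# Hypothetical shadow-mode data.
records = [
    ShadowRecord("t1", "Approve", "approve"),
    ShadowRecord("t2", "Escalate to finance", "escalate  to finance"),
    ShadowRecord("t3", "Reject", "Approve"),
    ShadowRecord("t4", "Approve", "Approve"),
]
print(agreement_rate(records))                      # 0.75
print([r.item_id for r in disagreements(records)])  # ['t3']
```

The disagreement list is usually the more useful output: it's what you read with the human operator to decide whether the AI or the human was right.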

Days 25–27: limited rollout. Turn on AI for 10% of traffic. Monitor. Iterate.
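One common way to send a fixed share of traffic to the AI path is deterministic hash bucketing, so the same customer or ticket always lands on the same side. The sketch below assumes each unit of traffic has a stable string ID; the function name `in_rollout` and the salt value are hypothetical.

```python
import hashlib


def in_rollout(unit_id: str, percent: int, salt: str = "ai-rollout") -> bool:
    """Return True if this unit falls inside the rollout percentage.

    The same unit_id and salt always give the same answer, so a unit
    does not flip between the AI path and the human path across requests.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < percent


# Roughly 10% of a large set of IDs should be routed to the AI path.
ids = [f"ticket-{i}" for i in range(10_000)]
share = sum(in_rollout(i, 10) for i in ids) / len(ids)
print(f"{share:.1%} of traffic routed to AI")
```

Raising `percent` only ever adds units to the rollout, because each unit keeps its bucket. That makes the step from 10% to full rollout a one-number change.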

Days 28–30: full rollout and handoff. By day 30 you have a live system, a dashboard, a runbook, and someone internal who can iterate on prompts. If you can't say "Sarah from ops now owns this", you haven't shipped.

What kills these sprints

Three things, in order of frequency:

1. Scope creep: usually disguised as "while we're at it, can we also...". Don't.
2. No baseline metric: you can't celebrate a 4× improvement if no one wrote down where you started.
3. Skipping the eval suite: skip it and you'll spend week 4 firefighting failures you could have caught in week 2.

If this is useful and you want help running it in your team, we book one strategy call a week for in-house teams. Grab a slot here.
