Build Mar 2026 · 10 min read

Our default 2026 AI stack.

The exact tools, models and infrastructure we reach for on day one of every project. Opinionated. With trade-offs. Updated as the landscape moves.

People ask us this constantly: "what should I use?" The answer is always boring — pick the thing that will let your team ship, not the thing that will look best in a tech blog post. With that caveat, here's our default stack as of Q2 2026.

The model layer

Primary: Claude Sonnet 4

Our default for everything that involves reasoning, long context, or following complex instructions. In our projects it has been better than GPT-4o at multi-step tasks, more reliable at following system prompts, and cheaper for our high-volume workloads. We've shipped 28 of our last 30 projects on Sonnet 4.

For deep reasoning: Claude Opus 4

Used for the 5% of calls where you need real depth — financial analysis, legal reasoning, complex code review. We don't use Opus for production user-facing flows because of latency, but we use it inside async pipelines and offline evaluation.

For dirt-cheap volume: Claude Haiku

When you need to classify, route, or do quick extraction over millions of items, Haiku is roughly 1/10th the cost of Sonnet and good enough. Use it for routing prompts to other models, summarising chunks, or doing first-pass labelling.
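The three-tier split above can be sketched as a small routing function. This is a minimal illustration, not our production router: the task kinds, the `userFacing` flag, and the tier names are assumptions made for the example, and mapping a tier to a concrete model ID is left to configuration.

```typescript
// Hypothetical task categories; adjust to whatever your pipeline actually emits.
type TaskKind =
  | "classify"
  | "route"
  | "extract"
  | "summarise"
  | "chat"
  | "deep-analysis";

// Tier names only. Concrete model IDs are deliberately left to configuration.
type Tier = "haiku" | "sonnet" | "opus";

interface Task {
  kind: TaskKind;
  // True when a user is waiting on the response.
  userFacing: boolean;
}

function pickTier(task: Task): Tier {
  switch (task.kind) {
    // Cheap, high-volume work goes to the smallest model.
    case "classify":
    case "route":
    case "extract":
    case "summarise":
      return "haiku";
    // Depth is worth the latency only when nobody is waiting on the answer.
    case "deep-analysis":
      return task.userFacing ? "sonnet" : "opus";
    // Everything else falls through to the default model.
    default:
      return "sonnet";
  }
}
```

Keeping the decision in one pure function makes it easy to unit-test and to change when pricing or latency shifts.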

What we use OpenAI for

Whisper for speech-to-text (still the best), embeddings (text-embedding-3-large), and GPT-4o for tasks that specifically need image understanding inside a chat loop. We don't use GPT-4o as a primary text model anymore.

"Use Claude as your default, OpenAI for specific specialties, Google for specific specialties." That's our entire model strategy in 2026.

The orchestration layer

For agent workflows: TypeScript + Anthropic SDK

We don't use LangChain anymore. The abstractions get in the way for production work and the framework keeps reshaping itself. Plain TypeScript + the Anthropic SDK + good function-calling discipline is more code but significantly more maintainable.
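As a rough sketch of what "function-calling discipline" can look like in plain TypeScript: every tool lives in one registry, model-supplied input is validated before it touches business logic, and failures are returned as data so they can be handed back to the model instead of crashing the loop. The `ToolCall` and `ToolResult` shapes and the `get_order_status` tool below are local stand-ins invented for illustration, not the Anthropic SDK's own types.

```typescript
// Local stand-ins for what a tool-calling API exchanges (assumed shapes).
interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}

interface ToolResult {
  toolCallId: string;
  content: string;
  isError: boolean;
}

interface Tool<I> {
  description: string;
  // Validates untrusted model output; throws on bad input.
  parse: (input: unknown) => I;
  run: (input: I) => Promise<string>;
}

// Hypothetical example tool with a hard-coded answer.
const getOrderStatus: Tool<{ orderId: string }> = {
  description: "Look up the status of an order by ID.",
  parse: (input) => {
    if (typeof input !== "object" || input === null) {
      throw new Error("input must be an object");
    }
    const orderId = (input as Record<string, unknown>).orderId;
    if (typeof orderId !== "string" || orderId.length === 0) {
      throw new Error("orderId must be a non-empty string");
    }
    return { orderId };
  },
  run: async ({ orderId }) => `Order ${orderId}: shipped`,
};

const registry = new Map<string, Tool<any>>([
  ["get_order_status", getOrderStatus],
]);

type Prepared =
  | { kind: "ready"; execute: () => Promise<string> }
  | { kind: "rejected"; error: string };

// Synchronous step: look up the tool and validate its input.
function prepare(call: ToolCall): Prepared {
  const tool = registry.get(call.name);
  if (!tool) return { kind: "rejected", error: `Unknown tool: ${call.name}` };
  try {
    const input = tool.parse(call.input);
    return { kind: "ready", execute: () => tool.run(input) };
  } catch (err) {
    return { kind: "rejected", error: err instanceof Error ? err.message : String(err) };
  }
}

// Runs a validated call; never throws, so the agent loop stays simple.
async function dispatch(call: ToolCall): Promise<ToolResult> {
  const prepared = prepare(call);
  if (prepared.kind === "rejected") {
    return { toolCallId: call.id, content: prepared.error, isError: true };
  }
  try {
    return { toolCallId: call.id, content: await prepared.execute(), isError: false };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { toolCallId: call.id, content: message, isError: true };
  }
}
```

In an agent loop, each tool call in the model's response would go through `dispatch`, and the result (error or not) would be sent back in the next turn. It is more code than a framework call, but every failure path is visible.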

For non-engineering workflows: n8n

When the workflow doesn't need custom code (say, connecting Shopify to a model to Slack), self-hosted n8n is faster to ship and easier for the client's ops team to maintain after handoff. Zapier is fine for very simple flows; n8n is the right call once there are four or more steps.

For RAG: pgvector + custom

We've migrated away from Pinecone for almost all client work. Postgres with pgvector handles up to ~5M embeddings comfortably, it's usually already in the client's stack, and it has zero per-query cost. Pinecone is still right for some very specific cases (multi-tenant SaaS embeddings at scale), but it's not the default anymore.
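For a sense of how little code the pgvector path needs, here is a minimal sketch of a similarity search query builder. The `chunks` table and its `id`, `content`, and `embedding` columns are hypothetical; the `<=>` operator is pgvector's cosine distance, and the returned `{ text, values }` object is the parameterised-query shape that node-postgres accepts.

```typescript
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Serialise an embedding into pgvector's text input format, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0 || !embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding must be a non-empty array of finite numbers");
  }
  return `[${embedding.join(",")}]`;
}

// Build a parameterised nearest-neighbour query against the assumed table
// chunks(id, content, embedding vector(N)).
function buildSimilarityQuery(embedding: number[], limit = 5): SqlQuery {
  return {
    text:
      "SELECT id, content, embedding <=> $1::vector AS distance " +
      "FROM chunks " +
      "ORDER BY embedding <=> $1::vector " +
      "LIMIT $2",
    values: [toVectorLiteral(embedding), limit],
  };
}
```

The query embedding would come from the same model that produced the stored vectors. Whether the table also needs an approximate index depends on row count and vector dimensions, so treat that as a separate decision.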

The infrastructure layer

The eval layer

This is where almost no one invests enough. We use:

What we don't use (and why)

This will be wrong in 6 months

Expect every line of this article to need revising by Q4 2026. The stack churns. The principles don't:

  1. Pick the boring choice.
  2. Optimise for ability to ship, not theoretical maximums.
  3. Invest in evals before sophistication.
  4. Stay one layer below the bleeding edge — last quarter's frontier is this quarter's stable.

If you'd like our latest version of this list or want our take on a stack you're considering, just ask — happy to share what we've shipped.