AI Implementation: From Strategy to Production in 5 Steps

An opinionated five-step roadmap for moving AI from strategy slide to production system, without the transformation theatre. Discovery, shortlist, build, production, iterate.

11 Mar 2026·11 min read·Productized Team

Most AI strategies in 2026 fail in the same place: the gap between an AI strategy slide and an AI system in production. The slide is easy. The system is the work. This article is a practical, opinionated five-step roadmap for crossing that gap, written for technical and operational leaders who don't want to fund another AI workshop only to end up with the same backlog.

We write this as a software vendor that takes mid-market clients from strategy to a live AI system. We've shipped projects that delivered. We've also seen plenty fail. The five steps below are the difference between the two — and they're roughly the same regardless of company size.

The five steps at a glance

Step	What it is	Typical duration	Output
1. Discovery	Scope the actual problem	1–3 weeks	One painful workflow, mapped
2. Shortlist	Pick 1–2 use cases, kill the rest	1 week	One use case ready to build
3. Build	Deterministic scaffolding + AI where it earns its place	4–10 weeks	A working system
4. Production	Monitoring, evaluation, human-in-the-loop	Ongoing from week one	A trustworthy system
5. Iterate	Measure, fix, only then expand	Continuous	A system that gets better over time

Step 1 — Discovery: scope the actual problem

More AI projects fail here, before a line of code is written, than at any later step. The strategy slide says "automate customer support with AI"; the actual problem is that 30% of inbound emails duplicate FAQ items, and the team can't find existing answers fast enough. Those are different problems with different solutions, and the second one is the one worth solving.

What discovery looks like in practice:

Interview 3–7 stakeholders. Operations, support, the front-line team, IT. Not just the executive sponsor.
Map the actual workflow on one page. Where does work come from, who touches it, what tools, where does it stall?
Identify the painful 80%. The repetitive, high-volume part — not the rare, interesting edge case the team likes to talk about.
Ignore the AI hype. Whether the answer is AI or a Zapier flow or a config change in the existing CRM is a step-2 question. Step 1 is just understanding.
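A one-page workflow map can also be kept as plain data, which turns "where does it stall?" into a question you can answer rather than debate. A minimal Python sketch, assuming you've collected a median wait time per step during the interviews; the `Step` fields, steps, owners, tools and hours below are hypothetical and loosely mirror the support-inbox example above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    owner: str
    tool: str
    median_wait_hours: float  # how long work sits before this step starts (assumed metric)


def find_stall(steps: list[Step]) -> Step:
    """Return the step where work waits longest: the first place to look."""
    return max(steps, key=lambda s: s.median_wait_hours)


# Hypothetical map of a support inbox, one row per step.
workflow = [
    Step("Email arrives", "customer", "shared inbox", 0.0),
    Step("Triage", "support", "helpdesk", 6.0),
    Step("Search for existing answer", "support", "wiki", 18.0),
    Step("Reply", "support", "helpdesk", 2.0),
]

stall = find_stall(workflow)
print(f"Biggest stall: {stall.name} ({stall.median_wait_hours} h, owner: {stall.owner})")
```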

What kills AI projects: starting from "we want to do AI" instead of "this specific workflow is costing us X hours / Y errors per month". The first framing makes the project a solution looking for a problem; the second is a problem looking for the right solution.

Step 2 — Shortlist: pick 1–2 to ship, kill the rest

Discovery usually surfaces 5–15 candidate use cases. The temptation is to do them all. Don't. Pick one. Maybe two if they share infrastructure. Reject the rest — for now.

The 2x2 we use:

Impact: how much time/money/error reduction does this deliver if it works? Measured in concrete units, not "big".
Feasibility: is the data available, is the workflow stable, is the success criterion measurable, are stakeholders aligned?

Then put each candidate in one of four boxes:

High impact, high feasibility: ship this. This is your first project.
High impact, low feasibility: park. Come back when feasibility improves (data, alignment, scope).
Low impact, high feasibility: cool but irrelevant. Reject — it's the seductive trap of AI projects that look easy.
Low impact, low feasibility: obvious reject.
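Once every candidate has a score, the 2x2 can be applied mechanically. A minimal sketch, assuming impact and feasibility have already been converted to a 1–5 score (impact from the concrete units above, feasibility from the four questions) and that 3 is the cut-off between low and high; the function name, candidate names and scores are hypothetical.

```python
def quadrant(impact: float, feasibility: float, threshold: float = 3.0) -> str:
    """Place a candidate on the impact/feasibility 2x2 (scores assumed to be 1-5)."""
    high_impact = impact >= threshold
    high_feasibility = feasibility >= threshold
    if high_impact and high_feasibility:
        return "ship"    # your first project
    if high_impact:
        return "park"    # revisit when data, alignment or scope improve
    return "reject"      # low impact is rejected whether or not it looks easy


# Hypothetical candidates from discovery: (impact, feasibility).
candidates = {
    "classify inbound permit documents": (5, 4),
    "predict churn from unstructured notes": (4, 2),
    "AI-generated meeting summaries": (2, 5),
    "autonomous pricing agent": (2, 1),
}

for name, (impact, feasibility) in candidates.items():
    print(f"{quadrant(impact, feasibility):<7} {name}")
```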

Commit to one painful, well-bounded job. "We will reduce time-to-classify on inbound permit documents from 2 days to 1 hour" is well-bounded. "We will use AI to improve operations" is not.

Step 3 — Build: deterministic scaffolding first, AI where it earns its place

The single biggest engineering mistake we see: putting an LLM at the centre of a workflow that mostly doesn't need one. Most of an automation pipeline should be ordinary code — fetch a document, validate it, store it, route it, send a notification. The LLM is one node in the pipeline, not the pipeline itself.

Our default build pattern:

Build the scaffolding deterministically. Workflow engine (n8n is our default), source connectors, downstream actions, audit logging — all of this is normal software. No AI.
Plug an LLM in where it adds unique value. Classifying free-text input. Extracting fields from messy documents. Drafting a response that a human will review. Choosing between branches that aren't expressible as rules.
Build the evaluation set on day one. Real input + expected output for 50–200 examples. Run the LLM step against this set every time you change the prompt or model.
Surface uncertainty. The LLM should be able to say "I'm not sure" and route to a human. "Always answer" is a bug, not a feature.
Log every step. Inputs, outputs, the prompt, the model, the version. You will need this when something goes wrong, and something will go wrong.
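The pattern above, ordinary code around a single LLM node, can be sketched in plain Python. This illustrates the shape only and is not an n8n workflow: `classify_with_llm` is a hypothetical stub standing in for a real model call, and the 0.8 confidence threshold, prompt version and model name are assumptions.

```python
import json
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("pipeline")

CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off; tune it against the evaluation set


@dataclass
class Classification:
    label: str
    confidence: float


def validate(doc: dict) -> None:
    """Deterministic step: plain rules, no AI."""
    if not doc.get("id") or not doc.get("body"):
        raise ValueError("document missing id or body")


def classify_with_llm(text: str) -> Classification:
    """Hypothetical LLM node, stubbed so the sketch runs; replace with a real model call."""
    if "permit" in text.lower():
        return Classification("permit_application", 0.93)
    return Classification("unknown", 0.41)


def process(doc: dict, prompt_version: str = "v3", model: str = "example-model") -> str:
    validate(doc)
    result = classify_with_llm(doc["body"])
    # Surface uncertainty: low-confidence results go to a human instead of downstream.
    route = result.label if result.confidence >= CONFIDENCE_THRESHOLD else "human_review"
    # Log every step: input, output, prompt version and model, as one JSON line.
    log.info(json.dumps({
        "doc_id": doc["id"],
        "input": doc["body"],
        "label": result.label,
        "confidence": result.confidence,
        "route": route,
        "prompt_version": prompt_version,
        "model": model,
    }))
    return route


print(process({"id": "d1", "body": "Building permit request for 12 Main St"}))
print(process({"id": "d2", "body": "Quick question about my invoice"}))
```

Everything except `classify_with_llm` is deterministic and can be covered by ordinary unit tests; the single LLM call is the only step that needs the evaluation set.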

Build time for a sensibly-scoped first AI project: 4–10 weeks. Less is usually a thin demo; more is usually scope creep.

Step 4 — Production: monitoring, evaluation, human-in-the-loop where needed

An AI system in production is not the same artefact as the prototype. The prototype works on the test data. Production has to work on tomorrow's input, and the day after that, when someone changes a process upstream and nobody tells you.

The four production essentials:

Golden test set

A curated set of 50–200 representative real inputs with expected outputs. Every model, prompt or dependency change runs against this set first. If quality regresses, the change doesn't ship. This is the single highest-leverage discipline in AI work.
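A golden-set gate can be as simple as scoring the current and proposed configurations on the same cases and refusing to ship on any regression. A minimal sketch: `current_prompt` and `new_prompt` are hypothetical keyword stubs standing in for two prompt/model configurations, the four cases stand in for a real 50–200 example set, and exact-match accuracy is an assumed metric (per-category scores are often more informative).

```python
from typing import Callable

# Hypothetical golden set; a real one holds 50-200 curated, representative cases.
GOLDEN_SET = [
    ("Building permit request for 12 Main St", "permit_application"),
    ("Objection to permit 2024-118", "objection"),
    ("Where do I pay my invoice?", "billing"),
    ("Renewal of permit 2023-044", "permit_renewal"),
]


def current_prompt(text: str) -> str:
    """Stand-in for the deployed prompt + model configuration."""
    t = text.lower()
    if "objection" in t:
        return "objection"
    if "renewal" in t:
        return "permit_renewal"
    if "invoice" in t:
        return "billing"
    return "permit_application" if "permit" in t else "other"


def new_prompt(text: str) -> str:
    """Stand-in for a proposed change that unintentionally stops recognising renewals."""
    t = text.lower()
    if "objection" in t:
        return "objection"
    if "invoice" in t:
        return "billing"
    return "permit_application" if "permit" in t else "other"


def accuracy(predict: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Share of cases where the prediction exactly matches the expected output."""
    return sum(predict(text) == expected for text, expected in cases) / len(cases)


def gate(baseline, candidate, cases, tolerance: float = 0.0) -> bool:
    """True if the candidate may ship: no drop in accuracy beyond `tolerance`."""
    base, cand = accuracy(baseline, cases), accuracy(candidate, cases)
    print(f"baseline={base:.2f} candidate={cand:.2f}")
    return cand >= base - tolerance


print("ship:", gate(current_prompt, new_prompt, GOLDEN_SET))
```

Run in CI on every prompt, model or dependency change, this blocks the proposed change: accuracy drops from 1.00 to 0.75 because renewals are no longer recognised.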

Drift monitoring

Track the distribution of inputs (lengths, languages, source types) and outputs (confidence scores, refusal rates, downstream actions). When the distribution shifts, something upstream changed — investigate before users notice.
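One common way to put a number on "the distribution shifted" is the Population Stability Index (PSI), which compares each category's share of traffic between a reference window and the current one: PSI = Σ (actual − expected) · ln(actual / expected). A minimal sketch on input source types; numeric features such as input length would be binned first. The data is hypothetical, and the 0.25 alert threshold is a frequently quoted rule of thumb rather than a standard.

```python
import math
from collections import Counter


def shares(values: list[str], categories: list[str], eps: float = 1e-4) -> list[float]:
    """Share of values in each category; eps keeps empty categories out of log(0)."""
    counts = Counter(values)
    total = len(values)
    return [max(counts[c] / total, eps) for c in categories]


def psi(reference: list[str], current: list[str]) -> float:
    """Population Stability Index between two samples of a categorical feature."""
    categories = sorted(set(reference) | set(current))
    expected = shares(reference, categories)
    actual = shares(current, categories)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Hypothetical input source types: launch month vs. this week.
reference = ["email"] * 70 + ["web_form"] * 25 + ["scan"] * 5
this_week = ["email"] * 40 + ["web_form"] * 20 + ["scan"] * 40

score = psi(reference, this_week)
print(f"PSI = {score:.3f}")
if score > 0.25:  # rule-of-thumb threshold, not a standard
    print("Input distribution shifted: investigate upstream before users notice.")
```

The same function works on output-side signals, for example binned confidence scores or the mix of downstream actions.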

Escalation paths

When the system isn't confident, when retrieval misses, when an exception fires — there's a human who gets it, with a queue and an SLA. "It just routes to Slack" is not an escalation path; it's a guarantee that nobody will respond.
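What separates an escalation path from a Slack channel is a named owner and a deadline that something actually checks. A minimal sketch of that check, assuming a four-hour SLA and a rota as owner; the `Escalation` fields, reasons and cases are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Escalation:
    case_id: str
    reason: str            # e.g. "low_confidence", "retrieval_miss", "exception"
    created_at: datetime
    owner: str             # a named person or rota, not a channel
    sla: timedelta = timedelta(hours=4)  # assumed SLA; set per business need
    resolved: bool = False

    def breached(self, now: datetime) -> bool:
        return not self.resolved and now - self.created_at > self.sla


def overdue(queue: list[Escalation], now: datetime) -> list[Escalation]:
    """Open items past their SLA: these should notify the owner, not sit in a chat."""
    return [item for item in queue if item.breached(now)]


now = datetime(2026, 3, 11, 15, 0)
queue = [
    Escalation("c-101", "low_confidence", datetime(2026, 3, 11, 9, 0), "support-rota"),
    Escalation("c-102", "retrieval_miss", datetime(2026, 3, 11, 13, 30), "support-rota"),
    Escalation("c-103", "exception", datetime(2026, 3, 11, 8, 0), "ops-oncall", resolved=True),
]

for item in overdue(queue, now):
    print(f"SLA breached: {item.case_id} ({item.reason}) -> notify {item.owner}")
```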

EU AI Act risk classification — done up front

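From 2 August 2026 most of the EU AI Act's provisions apply. Classify your system at design time, not after launch. For most mid-market AI systems the classification is limited or minimal risk: at most transparency obligations (for example, telling users they are interacting with an AI system) and, as good practice, an internal register of your AI systems, not a full conformity assessment. High-risk systems (for example AI used in hiring and other HR decisions, credit scoring, or as a safety component of critical infrastructure) need significantly more documentation, monitoring and human oversight, so bake that in from day one rather than retrofitting it.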

Eval discipline beats model choice every time. A team with a 200-example golden set and weekly drift monitoring will outperform a team with a fancier model and no measurement, every quarter, forever. Build the eval before you tune the prompt.

Step 5 — Iterate: measurement → improvement → expansion

The temptation after launch is to start the next project. Resist for 8–12 weeks. The first version is wrong about something — usually several things — and the lessons from fixing it are worth more than the next greenfield build.

What iteration looks like in practice:

Measure what you ship. Not just "is the system up". Concrete unit economics: time saved per case, error rate, escalation rate, user satisfaction. The metric you defined in step 2.
Look at failure modes, not averages. Group what's going wrong. Five recurring failure patterns are worth more than one summary score.
Fix the patterns. Better prompts, better retrieval, better escalation rules, better source data. Sometimes the fix isn't in the AI at all — it's in the process upstream.
Only then expand. Once the first system is delivering on its metric and you understand its failure modes, it's time for the next use case. Not before.
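Measuring the metric and grouping failures needs little more than the step logs. A minimal sketch, assuming each logged case has been tagged with whether it was escalated, whether the final outcome was correct, and a short failure tag; the eight cases and tag names are hypothetical.

```python
from collections import Counter

# Hypothetical per-case records from the step logs:
# (escalated to a human, final outcome correct, failure tag or None)
CASES = [
    (False, True, None),
    (True, True, "scanned_pdf_unreadable"),
    (False, False, "wrong_permit_type"),
    (True, True, "scanned_pdf_unreadable"),
    (False, True, None),
    (False, False, "wrong_permit_type"),
    (True, True, "scanned_pdf_unreadable"),
    (False, False, "missing_address_field"),
]


def summarize(cases):
    """Escalation rate, error rate, and failure patterns ranked by frequency."""
    n = len(cases)
    escalation_rate = sum(escalated for escalated, _, _ in cases) / n
    error_rate = sum(not correct for _, correct, _ in cases) / n
    patterns = Counter(tag for _, _, tag in cases if tag is not None).most_common()
    return escalation_rate, error_rate, patterns


escalation_rate, error_rate, patterns = summarize(CASES)
print(f"escalation rate {escalation_rate:.1%}, error rate {error_rate:.1%}")
for tag, count in patterns:
    print(f"{count:>3}  {tag}")
```

In this made-up sample the top pattern is unreadable scans, a fix that sits in the upstream OCR or source data rather than in the prompt.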

The pattern we see at clients that do AI well: one painful problem solved every quarter. Boring, compounding, real. The pattern we see at clients that fail: ten ambitious projects launched in six months, all stuck in pilot a year later.

What this all costs and how long it takes

A realistic shape for a single end-to-end AI implementation, mid-market scale:

Discovery: 1–3 weeks, €5K–€15K.
Shortlist: 1 week, included in discovery.
Build: 4–10 weeks, €30K–€120K depending on integrations.
Production hardening (parallel with build): included.
First iteration cycle: 8–12 weeks after launch, modest internal time.

Total: a useful AI system in production within 8–14 weeks of starting, at €40K–€140K for a typical first project. Anything dramatically faster is usually a thin demo; anything dramatically more expensive is usually scope creep.

How we work

We help mid-market companies move from AI strategy to production AI systems. We almost always start with a 1–3 week discovery, ship a working first version in 8–14 weeks, and stay involved for the iteration cycles that follow. More about our approach is on our service pages for AI and AI agents.

Have an AI strategy slide that's been gathering dust? Describe what you'd want to ship in a few sentences via our contact form — we'll respond within one working day with an honest read on where to start, and roughly what it would cost.
