TL;DR
AI isn’t the problem – everything around AI is. At the Levi9 webinar held on April 2nd, Principal Architect Dario Djurica and Lead Software Engineer Nemanja Pavlovic broke down why so many AI projects don’t survive the journey from proof of concept (POC) to production – and what it actually takes to make AI work in the real world. The key takeaway: the models are good enough. The ecosystem around them isn’t.
Why the Demo Works and Production Doesn't
There’s a massive gap between what AI can do and what’s actually running in enterprise systems today. Anthropic measured it: the curve of AI capability has pulled far ahead of the curve of real-world adoption. Companies are investing – $11 billion last year, nearly $40 billion this year – but results lag behind. 40% of all agentic AI projects get cancelled. And 80% of practitioners say security is one of the biggest problems they face when introducing AI into their organizations.
The paradox runs deeper: 82% of companies have some form of AI in use, but only 30% actually govern it. The rest simply exists – somewhere between sales and marketing, unmonitored, untraceable, uncontrolled.
A Story from the Field: From 50 Documents to 50,000
One of the projects Dario and Nemanja ran started as a classic POC – an internal knowledge base agent, 50 neatly prepared documents, a Q&A system that worked flawlessly. The client was thrilled. Green light for production.
Then they got access to the real SharePoint. Set up in 2011. No folders, no structure, no system. Fifty thousand documents – PDFs, Word files, emails, plain text files, multiple versions of the same document scattered across departments, and a mountain of data someone had typed in and forgotten. Initial estimate: six months just to sort out the data layer. They eventually cut that down to about a month and a half – but only by leaning on the Microsoft AIQ platform, which absorbed much of the work around security controls and multi-format document parsing.
The lesson: every POC lives on clean data. Production lives on messy data.

Six Pillars of Production Readiness
Through work across multiple projects, a framework emerged that Levi9 uses to assess how ready an AI solution actually is for production. This isn’t about organizational challenges – that’s a separate conversation – but about the technical side.
- Data Foundation
Data is the agent’s brain. Without clear structure, access control (who sees what), freshness mechanisms (which version of a document is current), and quality gates that filter what the agent actually serves to the user – everything else falls apart.
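As a rough illustration of what a quality gate at the data layer might look like, here is a minimal sketch that enforces the two checks named above – access control and freshness – before anything reaches the agent. The document fields and group names are invented for the example; a real implementation would sit on your document store's own ACLs and versioning.

```python
# Sketch of a retrieval quality gate: enforce access control (who sees what)
# and serve only the freshest version of each document. All field names and
# groups here are illustrative, not a real schema.
from datetime import date

docs = [
    {"id": "policy", "version": 1, "updated": date(2021, 3, 1), "acl": {"hr"}},
    {"id": "policy", "version": 2, "updated": date(2024, 6, 1), "acl": {"hr"}},
    {"id": "roadmap", "version": 1, "updated": date(2023, 1, 5), "acl": {"exec"}},
]

def visible_latest(docs, user_groups):
    # Drop documents the user may not see.
    allowed = [d for d in docs if d["acl"] & user_groups]
    # Keep only the newest version of each document.
    latest = {}
    for d in allowed:
        if d["id"] not in latest or d["updated"] > latest[d["id"]]["updated"]:
            latest[d["id"]] = d
    return list(latest.values())
```

On the 50,000-document SharePoint from the story above, this is exactly the kind of filtering that has to happen before retrieval – otherwise the agent happily serves a 2011 draft to someone who shouldn't see it at all.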
- Agent Design and Guardrails
AI shouldn’t be bolted onto an existing business process as a patch. The entire process needs to be redesigned with AI as a first-class citizen, not an add-on. That design must include output validation, tool call verification, clearly defined scope boundaries, and – critically – human-in-the-loop checkpoints wherever the decision warrants it.
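To make those checkpoints concrete, here is a minimal sketch of a guardrail layer combining scope boundaries with a human-in-the-loop escalation. The tool names, confidence threshold, and `AgentOutput` shape are all assumptions for the example, not any particular framework's API.

```python
# Sketch of an agent guardrail layer: verify tool calls against a declared
# scope and escalate low-confidence answers to a human reviewer. All names
# and thresholds are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentOutput:
    text: str
    tool_calls: list = field(default_factory=list)
    confidence: float = 1.0

ALLOWED_TOOLS = {"search_kb", "summarize"}  # the agent's scope boundary
CONFIDENCE_FLOOR = 0.7                      # below this, a human decides

def needs_human_review(output: AgentOutput) -> bool:
    # Reject any tool call outside the agent's declared scope.
    if any(call not in ALLOWED_TOOLS for call in output.tool_calls):
        return True
    # Low-confidence answers go through a human-in-the-loop checkpoint.
    return output.confidence < CONFIDENCE_FLOOR

def handle(output: AgentOutput) -> str:
    if needs_human_review(output):
        return "queued_for_review"
    return output.text
```

The point is that validation lives outside the model: the agent can propose whatever it likes, but nothing leaves the boundary without passing these checks.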
- Observability and Operations
Logs that say “200 OK” are not enough in the world of agentic systems. You need to trace the full reasoning chain – which prompt, which model, which reasoning level. In one project, a model ran perfectly in production for days, then dramatically slowed down because the same vendor released a new version and the entire internet rushed to test it. Without a real-time dashboard, you wouldn’t know until the client calls.
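A minimal sketch of what that tracing could look like – recording which prompt and which model, plus latency, for every step, so a vendor-side slowdown like the one above surfaces on a dashboard rather than in a client call. The record fields are illustrative; real tracing would typically go through an observability stack, not an in-memory list.

```python
# Sketch of agent-level tracing: every model call records which prompt,
# which model, and how long it took. Field names are illustrative.
import time

def traced_call(model: str, prompt: str, call_fn, trace: list) -> str:
    start = time.monotonic()
    response = call_fn(prompt)
    trace.append({
        "model": model,
        "prompt": prompt,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "response_chars": len(response),
    })
    return response

# Demo with a stubbed model call.
trace: list = []
answer = traced_call("demo-model", "Summarize doc 42",
                     lambda p: "stub answer", trace)
```

With records like these per step, "the model got slow on Tuesday" becomes a chart instead of an argument.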
- Security
Security is an afterthought on too many projects. A classic service account for an agent isn’t good enough, because you can’t trace what the agent is actually doing across systems. Microsoft and AWS have introduced dedicated agent identities – the same principle as user identities, with full action traceability. Relying solely on the system prompt (“don’t return this document”) works on a demo dataset. It doesn’t hold when users start probing boundaries.
- Cost Management
Token attribution per agent, per team, per task type – all of it needs to be tracked. An agent can get stuck in an infinite loop and burn resources for hours without anyone noticing. Budget alerts, smart routing (complex tasks go to more capable models, simple ones to lighter ones), and loop detection aren’t nice-to-haves. They’re infrastructure.
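The three mechanisms above can be sketched in a few lines. Here is a toy cost guard with a per-agent token budget, a trivial loop detector (the same step repeated across a sliding window), and complexity-based routing. The thresholds, model names, and step signatures are all made up for the example.

```python
# Sketch of per-agent cost guards: a token budget, naive loop detection,
# and routing by task complexity. All thresholds and names are illustrative.
from collections import deque

class CostGuard:
    def __init__(self, token_budget: int, loop_window: int = 5):
        self.spent = 0
        self.budget = token_budget
        self.recent = deque(maxlen=loop_window)  # last N step signatures

    def record(self, step_signature: str, tokens: int) -> None:
        self.spent += tokens
        self.recent.append(step_signature)
        if self.spent > self.budget:
            raise RuntimeError("budget exceeded")
        # If the agent repeats the same step for the whole window, assume a loop.
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            raise RuntimeError("loop detected")

def route(task_complexity: str) -> str:
    # Complex tasks go to a more capable model, simple ones to a lighter one.
    return "large-model" if task_complexity == "complex" else "small-model"
```

Even a guard this crude turns "burned tokens for hours without anyone noticing" into an exception after a handful of repeated steps.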
- Governance and Compliance
The EU AI Act is here. Companies waiting to implement it “when they have to” are already behind. Audit trails are mandatory: every prompt, every reasoning pattern, every attempt to access a document must be stored and accessible. Regulated industries – finance, healthcare – already know this. Everyone else will learn.
Build, Buy, or Platform?
There are three approaches companies choose between.
Build everything from scratch? Developers love it, but it doesn’t scale. Every team builds something different, tools don’t talk to each other, and technical debt compounds fast.
Buy individual tools? This can work for specific problems – Nvidia Guardrails for protection, specialized validation tooling – but integrating them cleanly in an enterprise context rarely goes smoothly.
Platform-first? That’s what Levi9 recommends. Platforms from Microsoft, AWS, and Nvidia cover all six pillars described above – not always perfectly, since the platforms themselves are still maturing, but as a comprehensive starting point they’re far ahead of the alternatives. Forrester backs this up: three out of four AI projects built entirely from scratch, without a platform foundation, fail.
What's Next
The next Levi9 webinar goes deep into those platforms: what works, what doesn’t, and how teams already using them are navigating real-world projects. Because finding your way through the Wild West of AI tooling still requires someone who’s already crossed that terrain.