Governing AI in production: AI agents that survive beyond the POC and live in the real world

TL;DR

AI isn’t the problem – everything around AI is. At the Levi9 webinar held on April 2nd, Principal Architect Dario Djurica and Lead Software Engineer Nemanja Pavlovic broke down why so many AI projects don’t survive the journey from POC to production – and what it actually takes to make AI work in the real world. The key takeaway: the models are good enough. The ecosystem around them isn’t.

Why the Demo Works and Production Doesn't

There’s a massive gap between what AI can do and what’s actually running in enterprise systems today. Anthropic measured it: the curve of AI capability and the curve of real-world adoption barely intersect. Companies are investing – $11 billion last year, nearly $40 billion this year – but results lag behind. 40% of all agentic AI projects get cancelled. And 80% of practitioners say security is one of the biggest problems they face when introducing AI into their organizations.

The paradox runs deeper: 82% of companies have some form of AI in use, but only 30% actually govern it. Everywhere else, AI simply exists – somewhere between sales and marketing, unmonitored, untraceable, uncontrolled.

A Story from the Field: From 50 Documents to 50,000

One of the projects Dario and Nemanja ran started as a classic POC – an internal knowledge base agent, 50 neatly prepared documents, a Q&A system that worked flawlessly. The client was thrilled. Green light for production.

Then they got access to the real SharePoint. Set up in 2011. No folders, no structure, no system. Fifty thousand documents – PDFs, Word files, emails, plain text files, multiple versions of the same document scattered across departments, and a mountain of data someone had typed in and forgotten. Initial estimate: six months just to sort out the data layer. They eventually cut that down to about a month and a half – but only by leaning on the Microsoft AIQ platform, which absorbed much of the work around security controls and multi-format document parsing.

The lesson: every POC lives on clean data. Production lives on messy data.

Six Pillars of Production Readiness

Through work across multiple projects, a framework emerged that Levi9 uses to assess how ready an AI solution actually is for production. This isn’t about organizational challenges – that’s a separate conversation – but about the technical side.

  1. Data Foundation

Data is the agent’s brain. Without clear structure, access control (who sees what), freshness mechanisms (which version of a document is current), and quality gates that filter what the agent actually serves to the user – everything else falls apart.
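To make the three controls concrete, here is a minimal Python sketch of a filter that sits between retrieval and the agent. It assumes each indexed chunk carries a logical document ID, a version number and the set of groups allowed to read it; the `Chunk` type, the `serveable` function and the 20-character quality threshold are illustrative assumptions, not part of any platform mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str                # logical document, shared by all of its versions
    version: int               # higher number = newer version
    allowed_groups: frozenset  # groups permitted to read this document
    text: str


def serveable(chunks, user_groups, min_chars=20):
    """Return only chunks that are current, visible to the user and pass a basic quality gate."""
    # Freshness: find the newest version of each document across ALL chunks,
    # so a user who cannot see the newest version is not served an older one as current.
    latest = {}
    for c in chunks:
        latest[c.doc_id] = max(latest.get(c.doc_id, c.version), c.version)

    result = []
    for c in chunks:
        if c.version != latest[c.doc_id]:
            continue  # superseded version
        if not (c.allowed_groups & user_groups):
            continue  # access control: who sees what
        if len(c.text.strip()) < min_chars:
            continue  # quality gate: drop empty or near-empty fragments
        result.append(c)
    return result
```

Deciding freshness before the access check is a deliberate choice in this sketch: an outdated version should never surface just because the newer one happens to be restricted.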

  2. Agent Design and Guardrails

AI shouldn’t be bolted onto an existing business process as a patch. The entire process needs to be redesigned with AI as a first-class citizen, not an add-on. That design must include output validation, tool call verification, clearly defined scope boundaries, and – critically – human-in-the-loop checkpoints wherever the decision warrants it.
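A sketch of the tool-call verification, scope-boundary and human-in-the-loop parts of that design might look like the following. The `ToolPolicy` fields, the `amount` argument limit and the `approver` callback are hypothetical; in a real system the approver would route the request to a person rather than to a lambda. Output validation would be a similar gate applied to the model's response and is left out for brevity.

```python
from dataclasses import dataclass


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails verification."""


@dataclass
class ToolPolicy:
    allowed_tools: set          # scope boundary: the only tools this agent may call
    needs_approval: set         # tools that require a human-in-the-loop checkpoint
    max_amount: float = 1000.0  # hypothetical argument limit for money-moving tools


def verify_tool_call(policy, tool, args, approver=None):
    """Check a tool call proposed by the model before executing it; return args if allowed."""
    # Scope: reject anything outside the agent's declared toolset.
    if tool not in policy.allowed_tools:
        raise ToolCallRejected(f"{tool!r} is outside this agent's scope")
    # Argument validation: values proposed by the model must be within limits.
    amount = args.get("amount")
    if amount is not None and not 0 < amount <= policy.max_amount:
        raise ToolCallRejected(f"amount {amount} is outside the permitted range")
    # Human-in-the-loop: high-impact tools need an explicit approval.
    if tool in policy.needs_approval and (approver is None or not approver(tool, args)):
        raise ToolCallRejected(f"{tool!r} requires human approval")
    return args
```

The agent runtime would call `verify_tool_call` between the model proposing a call and the tool actually executing, so a rejected call never reaches the downstream system.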

  3. Observability and Operations

Logs that say “200 OK” are not enough in the world of agentic systems. You need to trace the full reasoning chain – which prompt, which model, which reasoning level. In one project, a model ran perfectly in production for days, then slowed down dramatically because its vendor had just released a new version and the entire internet rushed to test it. Without a real-time dashboard, you wouldn’t know until the client calls.
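As a hedged illustration, the sketch below records one span per model call with the fields mentioned above (prompt, model, reasoning level) plus latency, and flags a model whose recent median latency rises well above its own earlier baseline, the kind of slowdown described in that project. The field names, the window size and the 2x factor are assumptions chosen for the example, not values from the webinar.

```python
import statistics
from dataclasses import dataclass


@dataclass
class LLMSpan:
    """One model call inside an agent's reasoning chain."""
    trace_id: str          # groups all steps of one user request
    step: int              # position in the reasoning chain
    prompt_id: str         # which prompt template/version was used
    model: str             # which model (and version) served the call
    reasoning_level: str   # e.g. "low", "medium", "high"
    latency_s: float
    output_tokens: int


class LatencyMonitor:
    """Flags a model whose recent median latency drifts well above its own baseline."""

    def __init__(self, window=20, factor=2.0):
        self.window = window   # number of most recent calls compared against the baseline
        self.factor = factor   # how many times slower counts as degraded
        self.history = {}      # model -> list of observed latencies
        self.spans = []        # full trace store, queryable by trace_id

    def record(self, span):
        """Store the span; return True if this span's model now looks degraded."""
        self.spans.append(span)
        samples = self.history.setdefault(span.model, [])
        samples.append(span.latency_s)
        if len(samples) < 2 * self.window:
            return False  # not enough data for a baseline yet
        baseline = statistics.median(samples[:-self.window])
        recent = statistics.median(samples[-self.window:])
        return recent > self.factor * baseline
```

Using medians rather than means keeps a single slow call from triggering an alert; a sustained shift, like a vendor-wide slowdown, moves the recent median and does.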

  4. Security

Security is an afterthought on too many projects. A classic service account for an agent isn’t good enough, because you can’t trace what the agent is actually doing across systems. Microsoft and AWS have introduced dedicated agent identities – the same principle as user identities, with full action traceability. Relying solely on the system prompt (“don’t return this document”) works on a demo dataset. It doesn’t hold when users start probing boundaries.
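The principle of enforcing access outside the prompt can be sketched in a few lines: the agent acts under its own identity on behalf of a specific user, and the data layer checks that user's permissions and logs the decision regardless of what the prompt says. `AgentIdentity`, the in-memory `ACL` and `AUDIT` are hypothetical stand-ins; they do not model the actual agent identity APIs from Microsoft or AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str      # the agent's own identity, not a shared service account
    on_behalf_of: str  # the human user whose request the agent is serving


# Hypothetical access-control list: document -> users allowed to read it.
ACL = {
    "handbook.pdf": {"alice", "bob"},
    "board-minutes.docx": {"carol"},
}

# Every access attempt is recorded with both identities, allowed or not.
AUDIT = []


def fetch_document(identity, doc):
    """Authorize in code, not in the system prompt: the agent reads only what its user may read."""
    allowed = identity.on_behalf_of in ACL.get(doc, set())
    AUDIT.append((identity.agent_id, identity.on_behalf_of, doc, "allow" if allowed else "deny"))
    if not allowed:
        raise PermissionError(f"{identity.on_behalf_of} may not read {doc}")
    return f"<contents of {doc}>"
```

However cleverly a user phrases a request, the restricted document never reaches the model's context, and the audit entry shows which agent tried to fetch it for whom.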

  5. Cost Management

Token attribution per agent, per team, per task type – all of it needs to be tracked. An agent can get stuck in an infinite loop and burn resources for hours without anyone noticing. Budget alerts, smart routing (complex tasks go to more capable models, simple ones to lighter ones), and loop detection aren’t nice-to-haves. They’re infrastructure.
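The four mechanisms named above can be combined in one small guard object. This is a minimal sketch assuming token counts are reported per call, tool arguments are hashable, and a "loop" means the same tool called with identical arguments several times in a row; the class, the thresholds, the complexity score and the model names in `route` are all hypothetical.

```python
from collections import defaultdict, deque


class CostGuard:
    """Token attribution, a per-team budget alert and simple loop detection for agents."""

    def __init__(self, budget_tokens, loop_threshold=3):
        self.budget_tokens = budget_tokens    # per-team token budget
        self.loop_threshold = loop_threshold  # identical consecutive calls treated as a loop
        self.usage = defaultdict(int)         # (agent, team, task_type) -> tokens used
        # agent -> its most recent tool calls (bounded, so memory stays flat)
        self.recent = defaultdict(lambda: deque(maxlen=loop_threshold))

    def record(self, agent, team, task_type, tokens):
        """Attribute tokens; return True if the team is now over budget (raise an alert)."""
        self.usage[(agent, team, task_type)] += tokens
        team_total = sum(t for key, t in self.usage.items() if key[1] == team)
        return team_total > self.budget_tokens

    def is_looping(self, agent, tool, args):
        """Return True once the agent has made the same call loop_threshold times in a row."""
        calls = self.recent[agent]
        calls.append((tool, tuple(sorted(args.items()))))
        return len(calls) == self.loop_threshold and len(set(calls)) == 1


def route(complexity, threshold=0.7):
    """Smart routing: complex tasks to a more capable model, simple ones to a lighter one."""
    return "large-model" if complexity >= threshold else "small-model"
```

Keying usage by agent, team and task type at write time means every cost report is a simple aggregation later, rather than a forensic exercise after the invoice arrives.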

  6. Governance and Compliance

The EU AI Act is here. Companies waiting to implement it “when they have to” are already behind. Audit trails are mandatory: every prompt, every reasoning pattern, every attempt to access a document must be stored and accessible. Regulated industries – finance, healthcare – already know this. Everyone else will learn.
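One common way to make such an audit trail tamper-evident is a hash chain, where every entry stores the hash of the entry before it, so altering any stored record breaks verification from that point on. The sketch below shows that technique under stated assumptions: it is one possible approach rather than something the webinar specified, and the `AuditLog` class and event fields are hypothetical.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log in which each entry includes the hash of the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, event):
        """Store an event (prompt, reasoning step, document access...) and return its hash."""
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"ts": time.time(), "event": event, "prev": prev}
        digest = _digest(body)
        self.entries.append({**body, "hash": digest})
        return digest

    def verify(self):
        """Recompute the chain; return False if any entry was altered or reordered."""
        prev = GENESIS
        for e in self.entries:
            body = {"ts": e["ts"], "event": e["event"], "prev": e["prev"]}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

In production the entries would go to write-once storage; the chain adds a cheap way to prove to an auditor that nothing was edited after the fact.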

Build, Buy, or Platform?

There are three approaches companies choose between.

Build everything from scratch? Developers love it, but it doesn’t scale. Every team builds something different, tools don’t talk to each other, and technical debt compounds fast.

Buy individual tools? This can work for specific problems – Nvidia Guardrails for protection, specialized validation tooling – but integrating them cleanly in an enterprise context is rarely straightforward.

Platform-first? That’s what Levi9 recommends. Platforms from Microsoft, AWS, and Nvidia cover all six pillars described above – not always perfectly, since the platforms themselves are still maturing, but as a comprehensive starting point they’re far ahead of the alternatives. Forrester backs this up: three out of four AI projects built entirely from scratch, without a platform foundation, fail.

What's Next

The next Levi9 webinar goes deep into those platforms: what works, what doesn’t, and how teams already using them are navigating real-world projects. Because finding your way through the Wild West of AI tooling still requires someone who’s already crossed that terrain.
