Azure vs. AWS vs. Local GPU: Which RAG Architecture Fits Your Business?

In my 10 years working in the tech industry, I have seen waves of trends, but few have moved as fast as Generative AI. The numbers back this up: McKinsey reports that by 2025, 67% of telecom respondents expect to use GenAI for more than 30% of their daily tasks. 

 

But statistics are one thing; production is another. 

 

Today, my colleagues and I are seeing a massive rise in RAG (Retrieval-Augmented Generation) solutions. We are moving beyond simple chatbots to building complex systems that “talk” to your internal documentation. Based on the projects I’ve delivered for our customers, from retail to finance, here is a breakdown of the architectures we actually use, the problems we’ve solved, and my honest advice on when not to use GenAI. 

Why We Are Betting on RAG

When I talk to clients, I explain that a standard chatbot is like a brilliant improviser who sometimes makes things up. RAG changes that. It forces the bot to look at your specific data before answering. 

 

We have built RAG solutions for everything from internal “Q&A over documentation” tools to customer-facing retail bots where users ask about their specific order status. In my experience, this approach is the only way to guarantee accuracy, domain adaptability, and data privacy. 
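The core loop behind all of these solutions is the same: retrieve the most relevant snippets from your data, then force the model to answer from them. Here is a minimal sketch of that loop in plain Python; the bag-of-words "embedding" is a deliberate toy stand-in for the real embedding model a production system would use, and the documents are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real RAG pipeline would use an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    # Ground the model: it may only answer from the retrieved context.
    context = "\n".join(retrieve(question, docs))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Order #1042 shipped on Monday and arrives Thursday.",
    "Our return policy allows refunds within 30 days.",
]
print(build_prompt("When does order #1042 arrive?", docs))
```

The prompt that comes out the other end is what actually gets sent to the LLM, which is why the bot can no longer "improvise" an answer about your order.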

The Three Architectures We Use

Over the last year, I’ve overseen deployments across three different environments. Here is how they compare in the real world: 

1. The On-Premise / GPU Approach (Llama & Hugging Face)

We built a chatbot at Levi9 using purely internal infrastructure. This was for a customer who needed absolute control. 

 

Follow these principles: 

2. The Azure Cloud Approach

For one of our retail customers, we deployed a solution using the Microsoft stack. 
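A minimal sketch of what that call path looks like, assuming the `openai` SDK's `AzureOpenAI` client; the endpoint, key, and deployment name are placeholders for illustration, not values from the actual project.

```python
# Sketch of the Azure OpenAI path. Endpoint, key, and deployment name
# are placeholders -- substitute your own Azure resource values.

def build_messages(question: str, context: str) -> list[dict]:
    # System message carries the retrieved context; user message is the query.
    return [
        {"role": "system",
         "content": f"Answer only from this retail context:\n{context}"},
        {"role": "user", "content": question},
    ]

def ask_azure(question: str, context: str) -> str:
    from openai import AzureOpenAI  # pip install openai
    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        api_key="YOUR-KEY",                                       # placeholder
        api_version="2024-06-01",
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # your Azure *deployment* name, not the base model name
        messages=build_messages(question, context),
    )
    return resp.choices[0].message.content

msgs = build_messages("Where is my order?", "Order 77 ships tomorrow.")
print(msgs[0]["content"])
```

One detail that trips people up: on Azure, the `model` parameter is the name you gave your deployment in the portal, not the underlying model family.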

3. The AWS Approach (Bedrock & Guardrails)

I have also worked with Amazon Bedrock, and there is one feature here that I find particularly useful for corporate clients: Guardrails. 
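Attaching a guardrail is a matter of passing its identifier along with the request. Here is a sketch using boto3's `bedrock-runtime` Converse API; the guardrail ID, version, model ID, and region below are placeholders, not real resources.

```python
# Sketch of a Bedrock call with a Guardrail attached. All identifiers
# below are placeholders for illustration.

def guarded_request(question: str) -> dict:
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "guardrailConfig": {
            "guardrailIdentifier": "YOUR-GUARDRAIL-ID",  # placeholder
            "guardrailVersion": "1",
        },
    }

def ask_bedrock(question: str) -> str:
    import boto3  # pip install boto3
    client = boto3.client("bedrock-runtime", region_name="eu-west-1")
    resp = client.converse(**guarded_request(question))
    # If the guardrail intervenes, Bedrock substitutes its configured
    # blocked/masked response for the raw model output.
    return resp["output"]["message"]["content"][0]["text"]

print(guarded_request("What is our refund policy?")["guardrailConfig"])
```

This is why I like the feature for corporate clients: the policy (blocked topics, PII masking, denied words) lives in a managed resource you configure once, not in prompt text each team has to copy around.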

When I Don't Use GenAI

This might sound counter-intuitive coming from an AI proponent, but sometimes GenAI is the wrong tool. 

 

I am currently working on a project involving sensitive financial data. In this specific case, we made the hard decision to not use Generative AI. Why? Because we cannot risk hallucinations. When dealing with bank details and financial reporting, “mostly accurate” isn’t good enough. 

 

In these scenarios, I always pivot back to classical AI and basic statistics. We need models that are mathematically interpretable and verifiable. If you are in a high-risk industry, my advice is to prioritize interpretability over the “cool factor” of GenAI. 
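To show what "mathematically interpretable" means in practice, here is the kind of model I mean: simple linear regression has a closed-form fit, so every prediction traces back to two auditable numbers. The data points are toy values for illustration only.

```python
# A classical, fully interpretable model: 1-D least-squares regression.
# Every prediction is slope * x + intercept -- nothing to hallucinate,
# everything verifiable by hand.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)

# Toy data: e.g. month number vs. a reported figure.
slope, intercept = fit_line([1, 2, 3, 4], [10, 20, 30, 40])
print(slope, intercept)  # -> 10.0 0.0
```

Contrast this with an LLM: here you can hand an auditor the two coefficients and the formula, and they can verify every output independently. That is the trade I make in high-risk domains.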

Challenges & My Recommendations

If you are ready to build, here are a few hurdles I’ve faced and how I suggest you handle them: 

The Playground is Open: It’s Time to Experiment

If there is one thing my decade in tech has taught me, it’s that you cannot learn this technology from slides alone. We are in a unique moment where the barrier to entry is incredibly low. Whether you are an individual developer or a CTO, you have access to the same powerful tools we use—Amazon Bedrock, Azure OpenAI, and the endless library of models on Hugging Face. 

 

My advice? Don’t wait for the “perfect” use case. 

 

Go explore. Spin up a sandbox environment. Pull a model from Hugging Face and try to break it. See what happens when you feed it your own data. Test the limits of what these architectures can do. The landscape is shifting under our feet, and the only way to stay relevant is to be the one testing the ground. 

 

We are just scratching the surface of what RAG can do. I’ve shared my map of the territory—now it’s time for you to start walking the path. 

Published:
17 March 2026
