Azure vs. AWS vs. Local GPU: Which RAG Architecture Fits Your Business?

In my 10 years working in the tech industry, I have seen waves of trends, but few have moved as fast as Generative AI. The numbers back this up: McKinsey reports that by 2025, 67% of telecom respondents expect to use GenAI for more than 30% of their daily tasks. 

 

But statistics are one thing; production is another. 

 

Today, my colleagues and I are seeing a massive rise in RAG (Retrieval-Augmented Generation) solutions. We are moving beyond simple chatbots to building complex systems that “talk” to your internal documentation. Based on the projects I’ve delivered for our customers, from retail to finance, here is a breakdown of the architectures we actually use, the problems we’ve solved, and my honest advice on when not to use GenAI. 

Why We Are Betting on RAG

When I talk to clients, I explain that a standard chatbot is like a brilliant improviser who sometimes makes things up. RAG changes that. It forces the bot to look at your specific data before answering. 

 

We have built RAG solutions for everything from internal “Q&A over documentation” tools to customer-facing retail bots where users ask about their specific order status. In my experience, this approach is the only way to guarantee accuracy, domain adaptability, and data privacy. 
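The core loop behind all of these solutions is the same: retrieve the most relevant snippets from your data, then force the model to answer from them. Here is a minimal sketch of that loop in plain Python; the bag-of-words "embedding" is a deliberate toy stand-in for the real embedding model a production system would use, and the documents are made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # A real RAG pipeline would use an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the question, keep the top k.
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, docs: list[str]) -> str:
    # Ground the model: it may only answer from the retrieved context.
    context = "\n".join(retrieve(question, docs))
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Order #1042 shipped on Monday and arrives Thursday.",
    "Our return policy allows refunds within 30 days.",
]
print(build_prompt("When does order #1042 arrive?", docs))
```

The prompt that comes out the other end is what actually gets sent to the LLM, which is why the bot can no longer "improvise" an answer about your order.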

The Three Architectures We Use

Over the last year, I’ve overseen deployments across three different environments. Here is how they compare in the real world: 

1. The On-Premise / GPU Approach (Llama & Hugging Face)

We built a chatbot at Levi9 using purely internal infrastructure. This was for a customer who needed absolute control. 

 

Follow these principles: 

2. The Azure Cloud Approach

For one of our retail customers, we deployed a solution using the Microsoft stack. 
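A minimal sketch of what that call path looks like, assuming the `openai` SDK's `AzureOpenAI` client; the endpoint, key, and deployment name are placeholders for illustration, not values from the actual project.

```python
# Sketch of the Azure OpenAI path. Endpoint, key, and deployment name
# are placeholders -- substitute your own Azure resource values.

def build_messages(question: str, context: str) -> list[dict]:
    # System message carries the retrieved context; user message is the query.
    return [
        {"role": "system",
         "content": f"Answer only from this retail context:\n{context}"},
        {"role": "user", "content": question},
    ]

def ask_azure(question: str, context: str) -> str:
    from openai import AzureOpenAI  # pip install openai
    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
        api_key="YOUR-KEY",                                       # placeholder
        api_version="2024-06-01",
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # your Azure *deployment* name, not the base model name
        messages=build_messages(question, context),
    )
    return resp.choices[0].message.content

msgs = build_messages("Where is my order?", "Order 77 ships tomorrow.")
print(msgs[0]["content"])
```

One detail that trips people up: on Azure, the `model` parameter is the name you gave your deployment in the portal, not the underlying model family.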

3. The AWS Approach (Bedrock & Guardrails)

I have also worked with Amazon Bedrock, and there is one feature here that I find particularly useful for corporate clients: Guardrails. 
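Attaching a guardrail is a matter of passing its identifier along with the request. Here is a sketch using boto3's `bedrock-runtime` Converse API; the guardrail ID, version, model ID, and region below are placeholders, not real resources.

```python
# Sketch of a Bedrock call with a Guardrail attached. All identifiers
# below are placeholders for illustration.

def guarded_request(question: str) -> dict:
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "guardrailConfig": {
            "guardrailIdentifier": "YOUR-GUARDRAIL-ID",  # placeholder
            "guardrailVersion": "1",
        },
    }

def ask_bedrock(question: str) -> str:
    import boto3  # pip install boto3
    client = boto3.client("bedrock-runtime", region_name="eu-west-1")
    resp = client.converse(**guarded_request(question))
    # If the guardrail intervenes, Bedrock substitutes its configured
    # blocked/masked response for the raw model output.
    return resp["output"]["message"]["content"][0]["text"]

print(guarded_request("What is our refund policy?")["guardrailConfig"])
```

This is why I like the feature for corporate clients: the policy (blocked topics, PII masking, denied words) lives in a managed resource you configure once, not in prompt text each team has to copy around.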

When I Don't Use GenAI

This might sound counter-intuitive coming from an AI proponent, but sometimes GenAI is the wrong tool. 

 

I am currently working on a project involving sensitive financial data. In this specific case, we made the hard decision to not use Generative AI. Why? Because we cannot risk hallucinations. When dealing with bank details and financial reporting, “mostly accurate” isn’t good enough. 

 

In these scenarios, I always pivot back to classical AI and basic statistics. We need models that are mathematically interpretable and verifiable. If you are in a high-risk industry, my advice is to prioritize interpretability over the “cool factor” of GenAI. 
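To show what "mathematically interpretable" means in practice, here is the kind of model I mean: simple linear regression has a closed-form fit, so every prediction traces back to two auditable numbers. The data points are toy values for illustration only.

```python
# A classical, fully interpretable model: 1-D least-squares regression.
# Every prediction is slope * x + intercept -- nothing to hallucinate,
# everything verifiable by hand.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx  # (slope, intercept)

# Toy data: e.g. month number vs. a reported figure.
slope, intercept = fit_line([1, 2, 3, 4], [10, 20, 30, 40])
print(slope, intercept)  # -> 10.0 0.0
```

Contrast this with an LLM: here you can hand an auditor the two coefficients and the formula, and they can verify every output independently. That is the trade I make in high-risk domains.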

Challenges & My Recommendations

If you are ready to build, here are a few hurdles I’ve faced and how I suggest you handle them: 

The Playground is Open: It’s Time to Experiment

If there is one thing my decade in tech has taught me, it’s that you cannot learn this technology from slides alone. We are in a unique moment where the barrier to entry is incredibly low. Whether you are an individual developer or a CTO, you have access to the same powerful tools we use—Amazon Bedrock, Azure OpenAI, and the endless library of models on Hugging Face. 

 

My advice? Don’t wait for the “perfect” use case. 

 

Go explore. Spin up a sandbox environment. Pull a model from Hugging Face and try to break it. See what happens when you feed it your own data. Test the limits of what these architectures can do. The landscape is shifting under our feet, and the only way to stay relevant is to be the one testing the ground. 

 

We are just scratching the surface of what RAG can do. I’ve shared my map of the territory—now it’s time for you to start walking the path. 

Published:
17 March 2026
