Importance of system thinking over model thinking in RAG systems

From: aidotengineer

D. Kiela, CEO at Contextual AI and a pioneer of RAG at Facebook AI Research, emphasizes that focusing on the entire system rather than just the language model is crucial for successful AI deployment in enterprises, especially for RAG agents in production [00:00:16].

The Enterprise AI Paradox

McKinsey estimates that AI could add $4.4 trillion in value to the global economy, yet many enterprises are frustrated: only one in four businesses actually gets value from AI [00:00:54]. This gap between promised and realized value is the paradox.

Moravec’s Paradox in AI

Moravec’s Paradox in robotics observes that beating humans at chess proved easier than getting a robot to vacuum a house. A similar “context paradox” exists in Enterprise AI [00:01:40]. Large Language Models (LLMs) excel at tasks that seem complex to humans, such as generating code or solving mathematical problems [00:02:28]. However, they struggle to put information into the right context, a task humans perform effortlessly using intuition and expertise [00:02:41].

Resolving the “context paradox” is key to unlocking ROI with AI [00:03:15]. Current general-purpose assistants offer convenience, but true business transformation and “differentiated value” require much better handling of enterprise-specific context [00:03:34].

Systems, Not Just Models

A core lesson is that language models, while powerful, often constitute only 20% of the larger system in an Enterprise AI deployment [00:04:40]. Such deployments are typically built as Retrieval-Augmented Generation (RAG) systems, a method originally pioneered by Kiela and his team at Facebook AI Research [00:04:54].

A “mediocre language model” surrounded by an “amazing RAG pipeline” will outperform an “amazing language model” with a “terrible RAG pipeline” [00:05:23].

Therefore, the focus should be on designing the entire system that solves the problem, not just on the language model itself [00:05:35].
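To make the "system, not model" point concrete, here is a minimal sketch of a RAG pipeline in which the language model is just one pluggable step. It is illustrative only and not taken from the talk: the names `Document`, `retrieve`, `build_prompt`, and `answer` are hypothetical, the retriever is a toy word-overlap ranker, and `generate` stands in for whatever model call a real deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set:
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def retrieve(query: str, corpus: List[Document], k: int = 2) -> List[Document]:
    # Toy retriever (an assumption for illustration): rank by word overlap.
    q = tokenize(query)
    scored = [(len(q & tokenize(d.text)), d) for d in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep at most k documents, and only those sharing a word with the query.
    return [d for score, d in ranked[:k] if score > 0]


def build_prompt(query: str, docs: List[Document]) -> str:
    # Tag each passage with its id so the answer can be traced back to it.
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query: str, corpus: List[Document],
           generate: Callable[[str], str], k: int = 2) -> str:
    # The model is a single call at the end; everything before it is "system".
    docs = retrieve(query, corpus, k)
    return generate(build_prompt(query, docs))


corpus = [
    Document("policy", "Refunds are processed within 14 days of a return request."),
    Document("hr", "Employees accrue vacation days monthly."),
    Document("ops", "The warehouse ships orders every weekday."),
]
query = "How long do refunds take after a return?"

# An echo "model" makes the retrieved context visible without any real LLM.
print(answer(query, corpus, generate=lambda prompt: prompt))
```

Swapping `generate` for a stronger model changes one line, while retrieval quality determines what any model gets to see, which is the sense in which the pipeline can matter more than the model.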

Key Observations for Production RAG Systems

1. Specialization Over AGI

Enterprises should aim to specialize their AI solutions to capture and leverage their unique institutional knowledge and expertise [00:06:11]. While Artificial General Intelligence (AGI) has its uses, difficult domain-specific problems are solved more effectively through specialization [00:06:19].

2. Data as Your Moat

A company’s enduring asset is its data [00:06:58]. Making AI work effectively on noisy, real-world data at scale is very difficult, but doing so creates differentiated value and a competitive “moat” [00:07:26].

3. Design for Production from Day One

Building a pilot RAG system is relatively easy [00:07:56]. However, scaling it to millions of documents, thousands of users, and numerous use cases, while meeting enterprise security and compliance requirements, is far more challenging [00:08:26]. Success hinges on designing for production from the outset, not just for a pilot [00:08:59].

4. Speed Over Perfection

In production rollouts of RAG agents, speed is paramount [00:09:10]. Putting a barely functional solution in front of real users early yields feedback that drives iterative improvement, “hill climbing” toward a “good enough” level and avoiding the delays caused by striving for initial perfection [00:09:34].

5. Empower Engineers

Engineers should focus on delivering business value and differentiated solutions, rather than spending time on tedious tasks like optimizing chunking strategies or basic prompt engineering [00:10:36]. These foundational tasks can be abstracted away by state-of-the-art platforms for RAG agents [00:10:55].

6. Make AI Easy to Consume

A common issue is that GenAI running in production is barely used [00:11:15]. Success requires making solutions easy to consume and integrating them closely into existing enterprise workflows, ensuring actual adoption and usage [00:11:42].

7. Engineer for “Wow” Moments

Getting users to quickly experience a “spark” or “wow” moment with the AI system is crucial for sticky usage and evangelism [00:12:21]. An example from Qualcomm illustrated a customer engineer finding a seven-year-old hidden document that answered long-standing questions, profoundly impacting their work [00:12:45].

8. Focus on Inaccuracy (Observability)

While accuracy is a minimum requirement, dealing with the inevitable 5-10% inaccuracy is critical [00:13:41]. This involves robust observability, proper audit trails (especially in regulated industries), and attribution in RAG systems to show why an answer was generated [00:13:54]. Post-processing to check claims and ensure proper attributions is key [00:14:23].
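One way to picture the post-processing step is a claim-level attribution check that records which retrieved passage, if any, supports each sentence of an answer. The sketch below is a hedged illustration, not the method described in the talk: `attribute_claims` and `min_overlap` are hypothetical names, and plain token overlap is a deliberately crude stand-in for whatever support check a production system would use.

```python
import re
from typing import Dict, List


def tokens(text: str) -> set:
    # Lowercased alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def attribute_claims(answer: str, passages: Dict[str, str],
                     min_overlap: float = 0.5) -> List[dict]:
    """Return one audit record per sentence of the answer."""
    records = []
    # Treat each sentence as one claim (a simplifying assumption).
    for claim in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not claim:
            continue
        claim_tokens = tokens(claim)
        best_id, best_score = None, 0.0
        for pid, text in passages.items():
            # Fraction of the claim's tokens that appear in the passage.
            score = len(claim_tokens & tokens(text)) / max(len(claim_tokens), 1)
            if score > best_score:
                best_id, best_score = pid, score
        supported = best_score >= min_overlap
        records.append({
            "claim": claim,
            "source": best_id if supported else None,
            "score": round(best_score, 2),
            "supported": supported,
        })
    return records


passages = {
    "doc-1": "Refunds are processed within 14 days of a return request.",
    "doc-2": "Orders ship every weekday.",
}
answer_text = "Refunds are processed within 14 days. Shipping is free worldwide."

for record in attribute_claims(answer_text, passages):
    print(record)
```

In this toy example the first sentence is attributed to `doc-1` and the second is flagged as unsupported. Persisting such records alongside each answer is one simple form of audit trail, and the unsupported claims are the ones to surface to users or reviewers.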

9. Be Ambitious

Projects often fail not because they aim too high, but because they aim too low [00:14:54]. Aiming for basic Q&A on simple topics will not yield significant ROI [00:15:02]. Enterprises should be ambitious and target problems that, if solved, will deliver substantial business value and truly transform the organization [00:15:12].

By understanding the context paradox and applying these lessons—building better systems, specializing for expertise, and being ambitious—enterprises can turn challenges into opportunities in AI [00:16:04].
