From: aidotengineer
AI agents are gaining significant traction and are being integrated into the workforce as new digital employees, appearing in forms such as customer service, software security, and research agents [00:48:00]. At their core, agents are systems that can perceive, reason, and act on an underlying task [01:12:00]. This means they can process data, devise a plan based on a user query, and use tools, functions, or external systems to complete the task [01:20:00]. Crucially, effective AI agents must also learn from user feedback, adapt to user preferences, and continuously refine themselves for greater accuracy and usefulness [01:38:00].
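The perceive-reason-act loop described above can be sketched in a few lines. This is a minimal illustration, not any particular framework's API; every function and tool name here is hypothetical.

```python
# Minimal sketch of an agent's perceive-reason-act loop.
# All names are illustrative; a real agent would use an LLM for reasoning.

def perceive(query: str) -> dict:
    """Turn the raw user query into a structured observation."""
    return {"query": query, "intent": "lookup" if "?" in query else "action"}

def reason(observation: dict) -> str:
    """Pick a tool for the task (a stand-in for LLM planning)."""
    return "search" if observation["intent"] == "lookup" else "executor"

def act(tool_name: str, observation: dict, tools: dict) -> str:
    """Invoke the chosen tool, function, or external system."""
    return tools[tool_name](observation["query"])

tools = {
    "search": lambda q: f"result for {q}",
    "executor": lambda q: f"done: {q}",
}

obs = perceive("What is our PTO policy?")
answer = act(reason(obs), obs, tools)
```

The feedback-learning half of the loop (adapting to user preferences over time) is what the data flywheel sections below address.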
Challenges in Building and Scaling AI Agents
Building and scaling AI agents presents several difficulties [01:56:00]:
- Rapid Data Change: Enterprise data, including business intelligence, flows into systems constantly, leading to rapid data evolution [02:05:00].
- Evolving Preferences: User preferences and customer needs are not static and change over time [02:16:00].
- High Inference Costs: Deploying large language models (LLMs) to support use cases can lead to high inference costs, where increased usage directly correlates with increased expense [02:28:00].
These challenges highlight the need for a mechanism to keep agents relevant and cost-effective, which is where data flywheels become essential [02:35:00].
What are Data Flywheels?
A data flywheel is a continuous loop or cycle that encompasses data processing, curation, model customization, evaluation, and guardrailing for safer interactions [02:44:00]. It integrates state-of-the-art Retrieval-Augmented Generation (RAG) pipelines with enterprise data to deliver relevant and accurate responses [02:56:00].
As AI agents operate in production environments, the data flywheel cycle continuously curates ground truth data using inference data, business intelligence, and user feedback [03:07:00]. This process facilitates continuous experimentation and evaluation of both existing and newer models. The goal is to identify and surface efficient, smaller models that can provide comparable accuracy to larger LLMs but offer lower latency, faster inference, and a reduced total cost of ownership [03:20:00].
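One turn of this cycle can be sketched as a pipeline of stubs, with each stage standing in for the curation, customization, and evaluation services described later. All functions and values here are illustrative placeholders, not real service calls.

```python
# High-level sketch of one data flywheel turn. Each function is a stub
# standing in for a real curation / fine-tuning / evaluation service.

def curate(logs):
    """Keep only production interactions that carry user feedback,
    as candidates for ground truth."""
    return [x for x in logs if x.get("feedback") is not None]

def customize(model, dataset):
    """Stand-in for fine-tuning: record how much curated data was used."""
    return {**model, "tuned_on": len(dataset)}

def evaluate(model):
    """Stand-in for benchmarking the customized model (dummy scores)."""
    return 0.9 if model["tuned_on"] > 0 else 0.5

# Inference logs with explicit feedback signals (hypothetical data).
logs = [{"q": "reset vpn", "feedback": "good"},
        {"q": "pto days", "feedback": None}]

dataset = curate(logs)
model = customize({"name": "small-llm"}, dataset)
accuracy = evaluate(model)
```

In production, this loop runs continuously: each turn adds freshly curated ground truth and re-evaluates candidate models against the incumbent.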
NVIDIA’s Tools for Data Flywheels
NVIDIA offers Nemo microservices, an end-to-end platform designed to build powerful agentic and generative AI systems, as well as robust data flywheels around them [03:55:00]. These services are exposed as simple API endpoints, making them easy to use [04:59:00]. They can be run on-prem, in the cloud, in data centers, or even at the edge, with enterprise-grade stability and support [05:14:00].
Key components of Nemo microservices include:
- Nemo Curator: Helps curate high-quality training datasets, including multimodal data [04:13:00].
- Nemo Customizer: Facilitates fine-tuning and customizing underlying models using state-of-the-art techniques such as LoRA, p-tuning, and full SFT [04:21:00].
- Nemo Evaluator: Benchmarks models against academic and institutional standards, and can also leverage an LLM as a judge [04:34:00].
- Nemo Guardrails: Adds guardrails to interactions to ensure privacy, security, and safety [04:47:00].
- Nemo Retriever: Used to build state-of-the-art RAG pipelines [04:51:00].
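Since the talk describes these services as simple API endpoints, a customization request might carry a payload shaped roughly like the following. The field names and values here are purely illustrative assumptions, not the actual Nemo microservices schema.

```python
# Hypothetical payload for a fine-tuning job submitted to a customization
# endpoint. Field names are illustrative only, NOT the real Nemo API schema.
import json

job = {
    "base_model": "llama-3.1-8b-instruct",   # model to customize
    "technique": "lora",                     # LoRA, p-tuning, or full SFT
    "dataset": "nv-info-routing-v1",         # curated ground-truth set
    "hyperparameters": {"epochs": 3, "learning_rate": 1e-4},
}

payload = json.dumps(job)  # body that would be POSTed to the endpoint
```

Consult the official microservice documentation for the real request schema; the point here is only that each flywheel stage is driven by a small declarative job description.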
Sample Data Flywheel Architecture
A data flywheel architecture can be constructed by combining these Nemo microservices like Lego pieces [05:32:00]. For example, in a customer service agent scenario:
- An end-user interacts with the agent’s front end [05:43:00].
- The interaction is guardrailed for safety [05:50:00].
- A model, served as an NVIDIA NIM (NVIDIA Inference Microservice), powers the agent for optimized inference [05:55:00].
- A data flywheel loop is set up to constantly curate data, store it in a Nemo data store, and use Nemo Customizer and Evaluator [06:09:00].
- This triggers a continuous cycle of retraining and evaluation [06:18:00].
- Once a model meets target accuracy, an IT admin or AI engineer can promote it to power the agent’s use case [06:23:00].
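The final step of that loop, the promotion decision, can be sketched as a simple gate. The threshold and model names below are hypothetical, chosen to echo the scenario above.

```python
# Sketch of the promotion gate: a fine-tuned candidate replaces the serving
# model only after it clears the target accuracy. Values are illustrative.

TARGET_ACCURACY = 0.95

def maybe_promote(serving: dict, candidate: dict) -> dict:
    """Return the model that should power the agent after this cycle."""
    if candidate["accuracy"] >= TARGET_ACCURACY:
        return candidate   # an admin or AI engineer deploys this as the NIM
    return serving         # otherwise, keep the incumbent

serving = {"name": "llm-70b", "accuracy": 0.96}
candidate = {"name": "llm-8b-ft", "accuracy": 0.96}
now_serving = maybe_promote(serving, candidate)
```

In practice the gate would also check latency and cost budgets, not accuracy alone.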
Case Study: NVIDIA’s Internal NV-Info Agent
NVIDIA adopted a data flywheel for its internal NV-Info agent, an employee support agent that provides access to enterprise knowledge across various domains like HR, finance, IT, and product documentation [06:44:00].
The architecture for this agent involves:
- An employee submits a query to the agent, which is guardrailed for safety and secure interaction [07:28:00].
- A router agent, run by an LLM, orchestrates multiple underlying expert agents [07:37:00].
- Each expert agent specializes in a specific domain and is augmented with a RAG pipeline to fetch relevant information [07:47:00].
- A data flywheel loop is set up to decide which models power these agents, building on user feedback and production data inference logs [08:03:00].
- Ground truth data is continuously curated using subject matter experts and human-in-the-loop feedback [08:20:00].
- Nemo Customizer and Evaluator are used to constantly evaluate models and promote the most effective one as an NIM to power the router agent [08:27:00].
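The router pattern at the heart of this architecture can be sketched as a function mapping a query to one expert agent. The case study uses an LLM for this; the keyword matching below is only a stand-in, and the keyword lists are invented for illustration.

```python
# Minimal sketch of the router-agent pattern: map a query to one expert.
# In the real system an LLM does the routing; keyword matching here is a
# stand-in, and the keywords are illustrative.

EXPERTS = {
    "hr": ["pto", "leave", "benefits"],
    "it": ["laptop", "vpn", "password"],
    "finance": ["expense", "invoice", "payroll"],
}

def route(query: str) -> str:
    """Return the domain expert that should handle the query."""
    q = query.lower()
    for domain, keywords in EXPERTS.items():
        if any(k in q for k in keywords):
            return domain
    return "docs"  # fall back to the product-documentation agent

expert = route("How do I reset my VPN password?")
```

Each routed query then flows through that expert's RAG pipeline; the flywheel's job is to make this routing step as accurate as possible with the smallest viable model.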
Router Agent Problem Statement and Solution
The core problem for the router agent is to accurately route a user query to the correct expert agent using a fast and cost-effective LLM [09:27:00]. Initially, a 70B Llama variant achieved a 96% baseline accuracy in routing queries, while smaller variants (e.g., 8B) showed subpar accuracy of around 14% [09:55:00]. Larger models offer higher accuracy, but come with higher latency and inference costs [12:47:00].
To address this, the data flywheel approach was implemented:
- The 70B Llama variant was deployed, and user feedback (satisfactory/unsatisfactory responses) from NVIDIA employees was collected [11:02:00].
- Out of 1,224 data points, 495 were unsatisfactory [11:24:00].
- Nemo Evaluator, using an LLM as a judge, investigated these unsatisfactory responses, identifying that 140 were due to incorrect routing [11:44:00].
- Further manual analysis with subject matter experts confirmed 32 queries were truly due to incorrect routing [11:58:00].
- A ground truth dataset of 685 data points was created, split 60/40 for training/fine-tuning and testing/evaluation [12:09:00].
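The curation funnel above reduces to simple arithmetic. Under the stated 60/40 split, the 685-point ground-truth set works out to roughly 411 training and 274 test points (the exact partition is not given in the talk).

```python
# The case study's curation funnel, as arithmetic.

total_feedback = 1224      # feedback data points collected
unsatisfactory = 495       # flagged unsatisfactory by users
flagged_routing = 140      # attributed to routing via LLM-as-judge
confirmed_routing = 32     # confirmed as misroutes by SMEs

ground_truth = 685         # curated ground-truth dataset
train = round(ground_truth * 0.6)   # ~411 points for fine-tuning
test = ground_truth - train         # ~274 points for evaluation
```

Note how small the final confirmed-error set is relative to the raw feedback: most of the flywheel's value comes from this triage, not from data volume.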
Remarkably, with just 685 data points, fine-tuning smaller models yielded significant results [12:27:00]. While the 70B variant offered 96% accuracy at 26 seconds of latency, the 8B variant initially achieved only 14% accuracy, but at much lower latency [12:36:00]. After fine-tuning, the 8B model matched the 70B variant's accuracy [13:04:00]. Even the 1B variant achieved 94% accuracy, only two percentage points below the 70B [13:22:00].
Deploying a smaller model like the 1B variant could lead to 98% savings in inference costs, a 70x model size reduction, and 70x lower latency [13:42:00]. This demonstrates the power of data flywheels in achieving high accuracy with smaller, more efficient models, continuously learning from production logs and knowledge [13:59:00].
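The headline ratios above follow directly from the reported figures, as a quick back-of-envelope check confirms:

```python
# Back-of-envelope check of the 70B-vs-1B comparison from the case study.

baseline = {"params_b": 70, "accuracy": 0.96}   # 70B Llama variant
small    = {"params_b": 1,  "accuracy": 0.94}   # fine-tuned 1B variant

size_reduction = baseline["params_b"] / small["params_b"]   # 70x smaller
accuracy_gap = baseline["accuracy"] - small["accuracy"]     # 2 points
```

The reported 98% inference-cost saving is a deployment-level figure from the talk; it depends on serving infrastructure and is not derivable from model size alone.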
Framework for Building Effective Data Flywheels
Building effective data flywheels involves a four-step framework:
1. Monitor User Feedback
   - Implement intuitive ways to collect user feedback signals, both explicit and implicit [14:48:00].
   - Monitor for model drift or inaccuracies in the agentic system [15:08:00].
2. Analyze and Attribute Errors
   - Spend time analyzing and attributing errors or model drift to understand why the agent is behaving in a certain way [15:12:00].
   - Classify errors, attribute failures, and create ground truth datasets [15:23:00].
3. Plan
   - Identify different models and generate synthetic datasets for experimentation [15:34:00].
   - Fine-tune models and optimize resources and costs [15:38:00].
4. Execute
   - Trigger the data flywheel cycle [15:46:00].
   - Set up a regular cadence for tracking accuracy, latency, and performance [15:53:00].
   - Manage the end-to-end GenAI Ops pipeline [16:05:00].
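The four steps above compose into one scheduled flywheel turn. Every function below is a stand-in to show the shape of the pipeline; real systems would plug in actual monitoring, triage, training, and evaluation tooling.

```python
# The four-step framework sketched as one flywheel turn.
# All functions are illustrative stand-ins.

def monitor(logs):
    """Step 1: surface negative explicit/implicit feedback signals."""
    return [x for x in logs if x["signal"] == "negative"]

def analyze(failures):
    """Step 2: attribute each failure and build ground-truth labels."""
    return [{"query": f["query"], "label": "misroute"} for f in failures]

def plan(ground_truth):
    """Step 3: choose candidate models to fine-tune on the new data."""
    return ["llm-1b", "llm-8b"] if ground_truth else []

def execute(candidates):
    """Step 4: trigger fine-tune + evaluation jobs on a regular cadence."""
    return {m: "queued" for m in candidates}

logs = [{"query": "reset vpn", "signal": "negative"},
        {"query": "pto days", "signal": "positive"}]

jobs = execute(plan(analyze(monitor(logs))))
```

Running this on a fixed cadence, with accuracy and latency tracked each turn, is what turns a one-off fine-tune into a flywheel.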