From: aidotengineer
The year 2025 is considered by many to be the “AI agent moment” and a “perfect storm for AI agents,” given rapid advancements in reasoning models, test-time compute, engineering optimizations, decreasing inference costs, and massive infrastructure investments [00:03:38]. Despite this excitement, however, AI agents are “not really working just yet” [00:04:49].
An AI agent is defined as a fully autonomous system where Large Language Models (LLMs) direct their own actions [00:05:12]. The primary obstacle to their effective operation is the accumulation of “tiny cumulative errors” [00:06:45].
Types of Cumulative Errors
Common errors observed in AI agents include:
- Decision Error: The agent chooses the wrong fact, such as booking a flight to “San Francisco, Peru” instead of “San Francisco, California” [00:07:10].
- Implementation Error: Incorrect access or integration, such as a CAPTCHA that disrupts the flow or a lockout from a critical database, prevents the agent from functioning [00:07:26].
- Heuristic Error: The agent applies the wrong criteria, failing to account for factors like rush-hour traffic or origin location when booking a flight [00:07:44].
- Taste Error: The agent fails to incorporate personal preferences, such as booking a disliked aircraft type, even when those preferences were not explicitly stated in the initial prompt [00:08:03].
- Perfection Paradox: While AI performs “magical” tasks, users become frustrated when an agent is slow or inconsistent, even if it eventually gets the task right [00:08:22]. This unreliability often produces an underwhelming experience that falls short of human expectations [00:08:38].
These errors compound significantly in complex, multi-agent systems involving multi-step tasks [00:09:21]. For example, an agent that is 99% accurate per step drops to about 60% end-to-end accuracy after 50 consecutive steps, while a 95%-accurate agent falls below 10%: a gap of roughly 50 percentage points from only a four-point difference in per-step accuracy [00:09:03].
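The compounding above follows directly from multiplying per-step success probabilities; a minimal sketch of the arithmetic:

```python
# Per-step accuracy compounds multiplicatively across consecutive steps:
# the whole task succeeds only if every individual step succeeds.
def end_to_end_accuracy(step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` consecutive steps succeed."""
    return step_accuracy ** steps

print(f"99% per step over 50 steps: {end_to_end_accuracy(0.99, 50):.1%}")  # ~60.5%
print(f"95% per step over 50 steps: {end_to_end_accuracy(0.95, 50):.1%}")  # ~7.7%
```

A small per-step difference (99% vs. 95%) becomes an enormous end-to-end difference, which is why long-horizon agent tasks are so sensitive to reliability.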
Strategies to Mitigate Challenges in Developing AI Agents
Five key strategies are crucial for building and optimizing reliable AI agents:
1. Data Curation
Ensure the AI agent has the necessary information, managing messy, unstructured, and siloed data [00:10:09]. This includes proprietary data, data generated by the agent itself, and data used for quality control [00:10:32]. Designing an “agent data flywheel” from day one allows the product to improve automatically and at scale every time a user interacts with it [00:10:49].
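One way to picture an agent data flywheel is a loop that logs every interaction, labels it with user feedback, and curates the positively rated examples for future training and quality control. The sketch below is illustrative only; `InteractionLog`, `Flywheel`, and `curate` are hypothetical names, not a real API:

```python
# Illustrative sketch of an agent data flywheel: log each interaction,
# attach user feedback, and curate the good examples for reuse.
from dataclasses import dataclass, field

@dataclass
class InteractionLog:
    prompt: str
    response: str
    user_feedback: int  # e.g. +1 thumbs-up, -1 thumbs-down

@dataclass
class Flywheel:
    logs: list = field(default_factory=list)

    def record(self, prompt: str, response: str, feedback: int) -> None:
        self.logs.append(InteractionLog(prompt, response, feedback))

    def curate(self) -> list:
        """Keep only positively rated interactions as future training data."""
        return [log for log in self.logs if log.user_feedback > 0]

fw = Flywheel()
fw.record("Book SFO flight", "Booked UA flight to San Francisco, CA", +1)
fw.record("Book SFO flight", "Booked flight to San Francisco, Peru", -1)
print(len(fw.curate()))  # 1 curated example survives
```

The point of designing this from day one is that each user interaction automatically enlarges the curated dataset, so the product improves at scale without manual data collection.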
2. Evaluations (Evals)
Establish methods to collect and measure a model’s responses and determine the correct answer [00:11:22]. While straightforward in verifiable domains like math or science, evaluating AI agents for non-verifiable systems (e.g., subjective preferences) requires collecting human preferences and building truly personal evaluations [00:11:47].
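For non-verifiable domains, one common approach (assumed here, not prescribed by the talk) is to collect pairwise human preferences between candidate responses and score each candidate by its win rate:

```python
# Sketch of a preference-based eval for non-verifiable outputs: humans pick
# the better of two responses, and each response is scored by win rate.
from collections import defaultdict

def win_rates(preferences: list[tuple[str, str]]) -> dict[str, float]:
    """preferences: list of (winner_id, loser_id) pairs from human raters."""
    wins: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return {rid: wins[rid] / total[rid] for rid in total}

prefs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
print(win_rates(prefs))  # A wins every comparison; C wins none
```

Scoring by preference rather than by a single ground-truth answer is what allows evals to capture subjective, personal criteria like taste.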
3. Scaffolding Systems
Design systems to prevent one error from cascading throughout the entire agentic system or production infrastructure [00:12:45]. This involves building complex compound systems and, at times, reintroducing a human into the loop [00:13:06]. Future improvements include self-healing agents that correct their own paths or pause execution when unsure [00:13:18].
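A minimal sketch of this kind of scaffolding, assuming hypothetical `run_step` and `validate` callables standing in for real agent calls: each step is validated and retried, and the system pauses for a human instead of letting a bad result propagate downstream:

```python
# Sketch of scaffolding that contains errors: validate each step, retry a
# bounded number of times, and escalate to a human instead of cascading.

class NeedsHuman(Exception):
    """Raised to pause execution and hand control to a human."""

def guarded_step(run_step, validate, max_retries: int = 2):
    for _ in range(max_retries + 1):
        result = run_step()
        if validate(result):
            return result
    raise NeedsHuman(f"step failed validation after {max_retries + 1} tries")

# Usage: a flaky step that succeeds on its second attempt.
attempts = iter(["San Francisco, Peru", "San Francisco, California"])
result = guarded_step(lambda: next(attempts),
                      lambda r: r.endswith("California"))
print(result)  # San Francisco, California
```

The `NeedsHuman` escape hatch is the "reintroduce a human into the loop" piece; a self-healing agent would replace it with a re-planning step.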
4. User Experience (UX)
Focus on reimagining product experiences and deeply understanding user workflows to promote human-machine collaboration [00:14:02]. Since many AI apps use the same foundation models, exceptional UX and product quality are critical differentiators [00:14:45]. This involves asking clarifying questions, predicting user next steps, and seamlessly integrating with legacy systems [00:14:13]. Industries with proprietary data and deep user workflow knowledge, such as robotics, hardware, defense, manufacturing, and life sciences, offer significant opportunities [00:14:55].
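The "asking clarifying questions" pattern above can be sketched as a pre-flight check: before acting, the agent tests the request against known ambiguities and asks instead of guessing. The ambiguity table here is purely illustrative:

```python
# Sketch of the clarifying-question UX pattern: detect ambiguous requests
# and ask the user rather than silently picking an interpretation.
AMBIGUOUS_PLACES = {
    "san francisco": ["San Francisco, California", "San Francisco, Peru"],
}

def plan_or_clarify(request: str) -> tuple[str, str]:
    """Return ('clarify', question) for ambiguous requests, else ('act', request)."""
    for place, options in AMBIGUOUS_PLACES.items():
        if place in request.lower():
            return ("clarify", f"Did you mean {' or '.join(options)}?")
    return ("act", request)

print(plan_or_clarify("Book a flight to San Francisco"))
```

A single clarifying turn here would have prevented the "San Francisco, Peru" decision error described earlier.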
5. Multimodality
Move beyond traditional chatbot interfaces and build agents that leverage new modalities to create a 10x personalized user experience [00:15:22]. This means adding “eyes, ears, nose, a voice,” and even the sense of touch through robotics, to make AI more human and embodied [00:15:43]. Incorporating “memories” can also make AI truly personal and deeply understanding of the user [00:16:07].
While the “perfect storm” for AI agents has arrived, their full potential has not yet been realized: cumulative errors produce wrong answers, ignore user preferences, and fall short of human expectations [00:16:51]. Addressing these design challenges through data curation, evaluations, scaffolding, superior UX, and multimodality will be key to creating visionary products that set new standards for human-AI interaction [00:17:10].