From: aidotengineer

The year 2025 is considered by many to be the “AI agent moment” and a “perfect storm for AI agents,” given rapid advancements in reasoning models, test-time compute, engineering optimizations, decreasing inference costs, and massive infrastructure investments [00:03:38]. However, despite this excitement, AI agents are “not really working just yet” [00:04:49].

An AI agent is defined as a fully autonomous system where Large Language Models (LLMs) direct their own actions [00:05:12]. The primary obstacle to their effective operation is the accumulation of “tiny cumulative errors” [00:06:45].

Types of Cumulative Errors

Common errors observed in AI agents include:

  • Decision Error: The agent chooses the wrong fact, such as booking a flight to “San Francisco, Peru” instead of “San Francisco, California” [00:07:10].
  • Implementation Error: This involves incorrect access or integration, like encountering a CAPTCHA that disrupts the flow or being locked out of a critical database, preventing the agent from functioning [00:07:26].
  • Heuristic Error: The agent uses the wrong criteria, failing to account for factors like rush hour traffic or origin location when booking a flight [00:07:44].
  • Taste Error: The agent fails to incorporate personal preferences, such as booking a disliked aircraft type, even if not explicitly stated in the initial prompt [00:08:03].
  • Perfection Paradox: While AI performs “magical” tasks, human users become frustrated when an agent is slow or inconsistent, even if it eventually gets the task right [00:08:22]. This unreliability often leads to an underwhelming experience that doesn’t meet human expectations [00:08:38].

These errors compound significantly in complex, multi-agent systems involving multi-step tasks [00:09:21]. For example, an agent with 99% per-step accuracy drops to about 60% end-to-end accuracy after 50 consecutive steps, while a 95%-accurate agent falls to less than 10% in the same scenario, a gap of roughly 50 percentage points from only a small difference per step [00:09:03].
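The compounding figures above follow directly from multiplying per-step success probabilities, assuming each step succeeds independently. A minimal sketch:

```python
# End-to-end success probability of a multi-step agent, assuming each
# step succeeds independently with the same per-step accuracy.
def end_to_end_accuracy(per_step_accuracy: float, steps: int) -> float:
    return per_step_accuracy ** steps

for p in (0.99, 0.95):
    print(f"{p:.0%} per step over 50 steps -> {end_to_end_accuracy(p, 50):.1%}")
# 0.99 ** 50 is roughly 0.60, while 0.95 ** 50 is under 0.08.
```

The independence assumption is a simplification, but it captures why small per-step error rates become dominant over long task chains.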

Strategies to Mitigate Challenges in Developing AI Agents

Five key strategies are crucial for building and optimizing reliable AI agents:

1. Data Curation

Ensure the AI agent has the necessary information, managing messy, unstructured, and siloed data [00:10:09]. This includes proprietary data, data generated by the agent itself, and data used for quality control [00:10:32]. Designing an “agent data flywheel” from day one allows the product to improve automatically and at scale every time a user interacts with it [00:10:49].
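One way to picture a data flywheel is a log that captures every interaction together with the user's reaction, so the accumulated records can later be curated into training and evaluation sets. The sketch below is illustrative; all class and field names are assumptions, not part of the talk:

```python
import json
import time
from dataclasses import dataclass, asdict

# Hypothetical sketch of an agent data flywheel: each user interaction is
# recorded with its outcome, and only useful signals are exported for
# later curation. Names and feedback labels are illustrative.
@dataclass
class InteractionRecord:
    prompt: str
    agent_response: str
    user_feedback: str  # e.g. "accepted", "edited", "rejected"
    timestamp: float

class FlywheelLog:
    def __init__(self) -> None:
        self.records: list[InteractionRecord] = []

    def record(self, prompt: str, response: str, feedback: str) -> None:
        self.records.append(
            InteractionRecord(prompt, response, feedback, time.time())
        )

    def export_jsonl(self) -> str:
        # Curate: keep only interactions worth learning from.
        return "\n".join(
            json.dumps(asdict(r))
            for r in self.records
            if r.user_feedback != "rejected"
        )
```

The point of the design is that the product improves as a side effect of normal use: every interaction feeds the log, and curation happens downstream.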

2. Evaluations (Evals)

Establish methods to collect and measure a model’s responses and determine the correct answer [00:11:22]. While straightforward in verifiable domains like math or science, evaluating AI agents for non-verifiable systems (e.g., subjective preferences) requires collecting human preferences and building truly personal evaluations [00:11:47].
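The two evaluation regimes described above can be sketched side by side: exact-match scoring for verifiable domains, and win rates over collected human preferences for non-verifiable ones. Both functions are illustrative assumptions, not an API from the talk:

```python
def exact_match_score(predictions: list[str], references: list[str]) -> float:
    """Verifiable domains (math, science): fraction of answers that
    exactly match a known-correct reference."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)

def preference_win_rate(pairwise_votes: list[str]) -> float:
    """Non-verifiable domains: pairwise_votes is a list of 'a' or 'b'
    human choices between agent A's and agent B's outputs; returns
    agent A's win rate."""
    return pairwise_votes.count("a") / len(pairwise_votes)
```

Exact match has an unambiguous ground truth; the preference-based metric only exists once human judgments have been collected, which is why building personal evaluations takes deliberate effort.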

3. Scaffolding Systems

Design systems to prevent one error from cascading throughout the entire agentic system or production infrastructure [00:12:45]. This involves building complex compound systems and, at times, reintroducing a human into the loop [00:13:06]. Future improvements include self-healing agents that correct their own paths or pause execution when unsure [00:13:18].
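A common containment pattern consistent with the ideas above is to wrap each step in retries and escalate to a human when the agent remains unsure, rather than letting a low-confidence result propagate. The confidence threshold and `step_fn` contract below are assumptions for illustration:

```python
class NeedsHumanReview(Exception):
    """Raised when the agent pauses execution and escalates to a human."""

def run_with_scaffolding(step_fn, step_input, max_retries=2, min_confidence=0.8):
    """step_fn(step_input) is assumed to return (result, confidence)."""
    for _ in range(max_retries + 1):
        result, confidence = step_fn(step_input)
        if confidence >= min_confidence:
            return result  # confident result: let the pipeline continue
    # Low confidence persisted across retries: pause instead of
    # cascading a likely-wrong result into downstream steps.
    raise NeedsHumanReview(f"step stayed below confidence {min_confidence}")
```

A self-healing agent would replace the raise with an automatic correction step; the key property either way is that one bad step halts or repairs locally instead of contaminating the whole system.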

4. User Experience (UX)

Focus on reimagining product experiences and deeply understanding user workflows to promote human-machine collaboration [00:14:02]. Since many AI apps use the same foundation models, exceptional UX and product quality are critical differentiators [00:14:45]. This involves asking clarifying questions, predicting user next steps, and seamlessly integrating with legacy systems [00:14:13]. Industries with proprietary data and deep user workflow knowledge, such as robotics, hardware, defense, manufacturing, and life sciences, offer significant opportunities [00:14:55].

5. Multimodality

Move beyond traditional chatbot interfaces and build agents that leverage new modalities to create a 10x personalized user experience [00:15:22]. This means adding “eyes, ears, nose, a voice,” and even the sense of touch through robotics, to make AI more human and embodied [00:15:43]. Incorporating “memories” can also make AI truly personal and deeply understanding of the user [00:16:07].

While we are at the “perfect storm” for AI agents, their full potential has not yet been realized, because cumulative errors produce wrong answers, violate user preferences, and fall short of human expectations [00:16:51]. Addressing these design challenges through data curation, evaluations, scaffolding, superior UX, and multimodality will be key to creating visionary products that set new standards for human-AI interaction [00:17:10].