From: aidotengineer
The year 2025 is described as being off to “an even wilder start” for AI, following exponential progress over the prior two and a half years, particularly the last 18 months [02:20:00] [02:48:00]. Progress is becoming more aggressive, more impressive, and more widespread, extending beyond OpenAI and Anthropic to models from xAI, Mistral, and DeepSeek [02:22:00]. Models are increasingly performant and compute-efficient [02:37:00].
Notable developments in 2025 include:
- The announcement of the $500 billion Stargate project involving the US government, OpenAI, SoftBank, and Oracle [02:55:00].
- OpenAI’s o3 model exceeding human-level performance on the ARC-AGI benchmark [03:03:00].
- DeepSeek’s R1 model launch, impacting Nvidia shares and reaching number one in the app store [03:10:00].
- The AI Action Summit in Paris, where Macron launched a new French AI investment initiative [03:26:00].
The AI Agent Moment
2025 is considered the “AI agent moment” and a “perfect storm for AI agents” [03:38:00] [03:41:00]. This environment is fostered by several factors:
- Reasoning Models: Advanced models such as OpenAI’s o1 and o3, DeepSeek’s R1, and xAI’s latest Grok reasoning model are outperforming humans on a growing set of tasks and demonstrating new capabilities [03:47:00].
- Test-Time Compute: Applying more compute at inference time rather than at training time is boosting model performance (see the sketch after this list) [04:04:00].
- Engineering and Hardware Optimizations: Significant engineering feats and hardware efficiency are evident, exemplified by the DeepSeek model [04:12:00].
- Cost Reductions: Inference and hardware are becoming cheaper [04:24:00].
- Closing Gap: The performance gap between open-source and closed-source models (e.g., DeepSeek and Llama) is narrowing [04:26:00].
- Infrastructure Investment: Billions are being invested in data centers and compute, including the US Stargate project, France’s initiative, and Japan’s investments with SoftBank and Nvidia [04:34:00].
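To make the test-time compute point concrete, below is a minimal sketch (not from the talk) of one common technique, self-consistency sampling: ask the same question several times at non-zero temperature and take the majority answer, trading extra inference compute for reliability. The `ask` callable is a hypothetical stand-in for any LLM API call.

```python
from collections import Counter
from typing import Callable

def self_consistency_answer(ask: Callable[[str], str], prompt: str, samples: int = 16) -> str:
    """Spend extra inference compute: sample several answers and return the majority vote."""
    answers = [ask(prompt) for _ in range(samples)]
    return Counter(answers).most_common(1)[0][0]

# Usage: pass any function that queries an LLM with temperature > 0.
# answer = self_consistency_answer(my_llm_call, "What is 17 * 24?")
```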
This groundwork sets the stage for the autonomous “agents at work” vision [04:48:00].
Current State of AI Agents
Despite the excitement, AI agents are not yet fully operational [04:59:00]. An AI agent is defined as a “fully autonomous system where LLMs direct their own actions” [05:12:00]. A flight-booking example illustrates the current limitations: OpenAI’s Operator struggled with complex user preferences and contextual information [05:27:00].
Cumulative Errors
Unlike hallucinations, which are widely discussed, the problem lies in “tiny cumulative errors that add up” [06:54:00]. These errors compound in complex multi-agent systems and multi-step tasks [09:21:00]. Types of errors include:
- Decision Error: Choosing the wrong fact, such as booking a flight to “San Francisco, Peru” instead of “San Francisco, California” [07:10:00].
- Implementation Error: Wrong access or integration, like encountering a CAPTCHA or being locked out of a critical database [07:23:00].
- Heuristic Error: Using the wrong criteria, such as not accounting for New York City traffic conditions or the user’s starting location when booking a flight [07:41:00].
- Taste Error: Failing to account for personal preferences, like booking a flight on a specific airplane model a user dislikes, even if not explicitly stated [08:03:00].
- Perfection Paradox: User frustration arises when AI, despite its seemingly magical capabilities, is inconsistent or unreliable, falling short of human expectations [08:22:00]. Even a highly accurate agent (e.g., 99% per step) sees its end-to-end accuracy collapse over many sequential steps (e.g., to roughly 60% over 50 steps), as sketched below [08:53:00].
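The arithmetic behind that drop is easy to verify. Assuming each step succeeds independently with the same per-step accuracy (a simplifying assumption; real errors can correlate), end-to-end reliability decays exponentially with the number of steps:

```python
def end_to_end_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Probability an agent completes every step of a sequential task,
    assuming independent, equally reliable steps."""
    return per_step_accuracy ** steps

print(end_to_end_accuracy(0.99, 50))   # ~0.605 -- the talk's "99% becomes ~60% over 50 steps"
print(end_to_end_accuracy(0.999, 50))  # ~0.951 -- why small per-step gains matter
```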
Strategies for Building Better AI Agents
To address these challenges and make the right decisions consistently and reliably, several best practices and strategies are emerging for building AI agents:
1. Data Curation
Data is paramount for AI agents [10:04:00]. It is messy, unstructured, and siloed, spanning web/text, design, image, video, audio, sensor, and even agent-produced data [10:11:00]. Key aspects include:
- Proprietary Data: Curating proprietary data is crucial for quality control in the model workflow [10:32:00].
- Agent Data Flywheel: Designing systems where every user interaction automatically improves the product, in real time and at scale [10:49:00]. This includes curating user preferences and recycling past content to adapt to them (see the sketch below) [11:01:00].
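As an illustration of such a flywheel (an assumed design, not the speaker’s actual system), the sketch below logs every interaction as a preference signal and recycles those signals into context for the user’s next request; the file path and record schema are hypothetical.

```python
import json
from pathlib import Path

PREFS = Path("user_prefs.jsonl")  # hypothetical per-user preference log

def record_interaction(user_id: str, request: str, response: str, accepted: bool) -> None:
    """Log every interaction as a preference signal (the 'collect' half of the flywheel)."""
    record = {"user": user_id, "request": request, "response": response, "accepted": accepted}
    with PREFS.open("a") as f:
        f.write(json.dumps(record) + "\n")

def load_preferences(user_id: str) -> list[dict]:
    """Recycle past signals into context for the next request (the 'adapt' half)."""
    if not PREFS.exists():
        return []
    records = [json.loads(line) for line in PREFS.open()]
    return [r for r in records if r["user"] == user_id]
```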
2. Evaluations (Evals)
Measuring a model’s responses and determining whether it chose the correct answer is critical [11:19:00]. This is straightforward in verifiable domains (e.g., math, science) but challenging for non-verifiable systems [11:33:00].
- Collecting Human Preferences: Build evaluations that collect signals capturing human preferences and personal needs [12:25:00]. Sometimes the best evaluation is personal “vibes” based on individual needs rather than numbers or leaderboards [12:33:00]; a minimal preference-collection sketch follows.
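One simple way to collect such signals (an illustrative sketch, not a framework named in the talk) is a pairwise preference log: show a user two candidate responses, record which one they picked, and track win rates over time.

```python
from collections import defaultdict

class PreferenceEval:
    """Pairwise human-preference eval for tasks with no verifiable ground truth."""

    def __init__(self) -> None:
        self.wins = defaultdict(int)
        self.comparisons = defaultdict(int)

    def record(self, model_a: str, model_b: str, preferred: str) -> None:
        """Store one human judgment between two candidate responses."""
        assert preferred in (model_a, model_b)
        self.comparisons[model_a] += 1
        self.comparisons[model_b] += 1
        self.wins[preferred] += 1

    def win_rate(self, model: str) -> float:
        """Fraction of comparisons involving `model` that it won."""
        total = self.comparisons[model]
        return self.wins[model] / total if total else 0.0

# Usage: evals = PreferenceEval(); evals.record("model_a", "model_b", preferred="model_a")
```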
3. Scaffolding Systems
These systems prevent a single error from causing a cascading effect throughout the organization or agentic system [12:45:00].
- Mitigation through Compound Systems: Building complex compound systems mitigates cascading errors [13:06:00].
- Human-in-the-Loop: Sometimes, bringing a human back into the loop is necessary [13:12:00].
- Self-Healing Agents: Stronger agents can adapt their scaffolding, recognize their own errors, attempt to correct course, or pause execution when unsure [13:16:00]. Checkpoints can be added for verification, such as re-checking traffic conditions (see the sketch below) [13:33:00].
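A minimal sketch of this scaffolding pattern combines a verification checkpoint, a few self-correction retries, and a human-in-the-loop fallback; the `execute`, `verify`, and `ask_human` callables are assumed placeholders for whatever the agent system actually provides.

```python
from typing import Any, Callable

def run_step(
    step: Any,
    execute: Callable[[Any], Any],
    verify: Callable[[Any, Any], bool],
    ask_human: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Any:
    """Run one agent step behind a checkpoint so a single error cannot cascade."""
    for _ in range(max_attempts):
        result = execute(step)
        if verify(step, result):   # checkpoint, e.g. re-verify traffic conditions
            return result          # verified: safe to continue the plan
    return ask_human(step)         # still failing: bring a human back into the loop
```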
4. User Experience (UX)
UX is crucial for making AI agents better co-pilots [13:44:00]. Foundation models are rapidly depreciating assets [13:53:00], meaning UX truly differentiates AI apps [13:59:00]. Companies that reimagine product experiences, deeply understand user workflows, and promote “beautiful elegant human machine collaboration” will succeed [14:02:00].
- Contextual Understanding: Asking clarifying questions to fully understand user intent [14:13:00].
- Predictive UX: Understanding the user’s psyche to predict their next step [14:20:00].
- Seamless Integration: Integrating seamlessly with legacy systems to create real ROI [14:27:00].
Lux Capital is particularly interested in new frontier AI companies with proprietary data sources and a deep understanding of specific user workflows, such as those in robotics, hardware, defense, manufacturing, and the life sciences [14:51:00].
5. Multimodality
Moving beyond the chatbot interface, building multimodally can create a 10x more personalized user experience [15:22:00]. This means making AI more human by giving it “eyes and ears, a nose, a voice” [15:43:00].
- Sensory Integration: Voice has improved significantly, and efforts are underway in smell (e.g., Osmo) and touch to give AI a more human feel and to support embodiment in robotics [15:49:00].
- Memories: Memory allows AI to become truly personal and know a user on a deeper level [16:07:00].
- Visionary Products: Reframing what “perfection” means to humans by creating products so new and visionary that they exceed expectations even when inconsistent [16:18:00]. An example is Tlop, which reimagines the visual canvas by implementing AI through brush strokes and seamlessly combining multiple AI models [16:28:00].
In summary, while 2025 presents a perfect storm for AI agents, they are not yet fully realized: tiny cumulative errors produce wrong answers, mismatched preferences, flawed criteria, and unmet human expectations [16:51:00]. Overcoming these challenges requires data curation, effective evaluations, scaffolding systems, and a strong focus on user experience and multimodality to create innovative product experiences [17:09:00].