From: aidotengineer

AI agents, particularly personal ones, hold significant potential for augmenting daily life, but their effective and trustworthy development faces several challenges. The speaker, a co-founder of PyTorch, emphasizes the importance of local and private AI agents due to concerns about control, privacy, and decentralization [00:02:50].

Defining an AI Agent

An AI agent is characterized by its ability to act in the world and possess agency [00:03:05]. If a system can only gather context or information but cannot take actions, it is not considered an agent [00:03:12].

The Critical Role of Context

A primary challenge in building effective AI agents is ensuring they have sufficient context. An intelligent agent without the right context is considered “useless” or “as good as a bag of rocks” [00:03:21].

Examples of Context Limitations

  • Partial Information: An agent might fail to accurately answer a personal query, like whether a prescription was renewed, if it lacks access to all relevant communication channels (e.g., iMessage vs. Gmail) [00:03:32].
  • Incomplete Financial Data: Similarly, an agent might misinform about finances if it only accesses one bank account but not other platforms like Venmo [00:04:04].

Without comprehensive context, a personal agent becomes unreliable and “irritating to use,” as users cannot predict when it will be useful or provide accurate information [00:04:17]. For an agent to be truly useful, it must achieve a high level of reliability and predictability [00:04:36].
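The prescription example above can be sketched in code: an agent that searches only the channels it has access to will silently miss evidence that lives elsewhere. This is a minimal, hypothetical illustration; the `Message` type and channel names are invented for the sketch, not part of any real agent API.

```python
# Sketch: a personal agent is only as reliable as the union of its context
# sources. Channel names ("gmail", "imessage") are illustrative.
from dataclasses import dataclass

@dataclass
class Message:
    source: str   # e.g. "imessage", "gmail"
    text: str

def gather_context(sources: dict[str, list[Message]],
                   query_terms: list[str]) -> list[Message]:
    """Search every connected channel; a missing channel silently drops evidence."""
    hits = []
    for channel, messages in sources.items():
        for msg in messages:
            if any(term.lower() in msg.text.lower() for term in query_terms):
                hits.append(msg)
    return hits

# With only Gmail connected, the agent misses the pharmacy's iMessage and
# wrongly concludes the prescription was never renewed.
partial = {"gmail": [Message("gmail", "Your flight is confirmed")]}
full = {**partial,
        "imessage": [Message("imessage", "Rx renewed, ready for pickup")]}

assert gather_context(partial, ["renewed"]) == []     # no evidence: wrong answer
assert len(gather_context(full, ["renewed"])) == 1    # correct with full context
```

The point of the sketch is that the failure is invisible from the agent's side: the partial query returns an empty, confident-looking result rather than an error, which is exactly what makes partial context “irritating to use.”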

Practical Considerations for Local Agents

Achieving comprehensive context for a personal agent ideally means it should “see everything you see and listen to everything you hear” [00:05:21]. However, this presents technical challenges:

  • Wearable Devices: Current wearable technology lacks sufficient battery life to continuously provide context to an AI [00:05:31].
  • Smartphones: Running agents persistently in the background on phones is limited by ecosystem restrictions, particularly on platforms like Apple [00:06:01].
  • Feasible Local Device: A Mac Mini, sitting at home connected to the internet and running tasks asynchronously, is suggested as a practical host for personal AI agents: it has no battery constraints and can access various services, including the Android ecosystem [00:06:30], [00:07:01].

Arguments for Local and Private Agents

The speaker strongly advocates for local and private personal agents over cloud-based services, for reasons tied to the risks of running agents in a cloud environment:

  1. Trust and Predictability: Unlike simple digital services like email, which have predictable behaviors (“email in, reply out”), AI agents possess a powerful and unpredictable action space [00:08:01], [00:08:18].

    • Loss of Control: Users become uncomfortable when an AI agent can take significant actions (e.g., auto-replying to a boss with inappropriate content) without explicit user control [00:08:27].
    • Monetization Incentives: Cloud service providers might monetize by influencing agent actions (e.g., directing shopping queries to partners offering kickbacks), eroding user trust and control over personal decisions [00:09:09].
  2. Decentralization: Relying on one ecosystem for a personal agent can lead to “walled gardens” and lack of interoperability, which is problematic for an agent that needs to interact with various aspects of a user’s daily life [00:09:51].

  3. Privacy and “Thought Crimes”: AI agents, especially those augmenting users intimately, may be exposed to highly personal thoughts or queries that users would not vocalize publicly [00:11:02].

    • Logging and Safety Checks: Cloud providers, even with enterprise-grade contracts, are legally mandated to perform logging and safety checks, potentially exposing sensitive or “thought crime” data [00:11:23].
    • Risk of Persecution: The speaker expresses a desire to avoid scenarios where personal thoughts or queries could lead to prosecution or persecution, making local agents a safer choice [00:11:47].

Technical and Research Challenges

Even with the conviction for local agents, significant technical challenges in AI agent development remain:

  • Local Model Inference:

    • Running models locally, even with projects like ollama and sglang (built on PyTorch), is currently “slow and limited” compared to cloud services [00:12:35], [00:13:13].
    • While smaller, distilled models run faster, the latest unquantized models remain very slow locally [00:13:29]. This is expected to improve over time [00:13:48].
  • Multimodal Models:

    • Open multimodal models are “good but not great,” especially for computer use, and even closed models frequently “break” [00:14:20], [00:14:25].
    • They are not adept at understanding specific visual tastes in shopping, often relying on text matching rather than true visual identification [00:14:52], [00:15:23].
  • Catastrophic Action Classifiers: A major gap exists in developing robust classifiers to identify and prevent “catastrophic actions” [00:15:37].

    • Many agent actions are harmless and reversible, but some, like purchasing a Tesla instead of Tide Pods, are disastrous [00:15:48].
    • More research is needed to enable agents to identify such actions before taking them and notify users instead [00:16:25]. Trust in agents, whether personal or cloud-based, hinges on improving this capability [00:16:40].
  • Voice Mode: Open-source voice mode for local agents is “barely there,” despite being a crucial feature for natural interaction [00:16:52], [00:16:56].
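To make the local-inference point concrete, here is a minimal sketch of calling a locally running ollama server (one of the projects named above) over its HTTP API. It assumes ollama is installed and serving at its default address; the model name "llama3.2" is illustrative and would need to be pulled first.

```python
# Minimal sketch of local inference against an ollama server.
# Assumes `ollama serve` is running at its default port (11434);
# the model name is an example, not a recommendation.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False requests one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # blocks; local models are slow
        return json.loads(resp.read())["response"]

# generate("llama3.2", "Summarize my unread email.")  # needs a running server
```

Even this simple synchronous call illustrates the latency trade-off: the request blocks on local hardware, which is why the talk describes local inference as slow but improving.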

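The catastrophic-action gap described above can be sketched as a guard that runs before every agent action: classify the proposed action, execute the harmless ones, and pause for user confirmation on the rest. The rule set here (a dollar threshold and a verb list) is a toy placeholder for the learned classifiers the talk says are still missing.

```python
# Toy sketch of a catastrophic-action guard. The heuristics are placeholders;
# the research gap is building classifiers that do this reliably.
from dataclasses import dataclass

IRREVERSIBLE_VERBS = {"purchase", "transfer", "delete", "send"}
DOLLAR_THRESHOLD = 100.0  # arbitrary illustrative cutoff

@dataclass
class Action:
    verb: str
    target: str
    amount: float = 0.0

def is_catastrophic(action: Action) -> bool:
    # Hard-to-reverse verb combined with high stakes -> require confirmation
    return action.verb in IRREVERSIBLE_VERBS and action.amount >= DOLLAR_THRESHOLD

def execute(action: Action) -> str:
    if is_catastrophic(action):
        return f"PAUSED: ask user before '{action.verb} {action.target}'"
    return f"DONE: {action.verb} {action.target}"

assert execute(Action("purchase", "Tide Pods", 12.99)).startswith("DONE")
assert execute(Action("purchase", "Tesla Model 3", 42000.0)).startswith("PAUSED")
```

The Tide Pods vs. Tesla pair from the talk maps directly onto the two branches: same verb, vastly different stakes, so the classifier must look beyond the action type alone.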
Optimism for the Future

Despite these challenges, the speaker remains optimistic, particularly about open models. Open models are seen as “compounding in intelligence faster than closed models” due to coordinated improvement across a broader community [00:17:11]. This phenomenon, observed with projects like Linux, suggests that once open source reaches a critical mass, it can achieve unprecedented success [00:18:03].

The speaker highlights ongoing efforts, including PyTorch’s work on enabling local agents by addressing technical challenges [00:19:27], and the release of open reasoning data to bridge the reasoning gap between open and closed models [00:19:14].