From: aidotengineer
Developing personal AI agents presents multifaceted challenges, ranging from ensuring their reliable operation and access to comprehensive personal context, to navigating privacy concerns and technical limitations for local deployment [02:56:00].
Defining an AI Agent
An AI agent is characterized by its ability to act in the world and possess “agency” [03:05:01]. Unlike an aggregator, which only gathers and processes context, an agent can take actions on a user’s behalf [03:01:01]. For example, swyx’s AI News, an AI news aggregator, is not considered an agent because it cannot act in the world [02:00:00]. Robots, by contrast, are embodied agents that do act in the world [02:09:00].
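To make the structural distinction concrete, here is a minimal sketch (not from the talk; the class and method names are illustrative assumptions): an aggregator only gathers and summarizes context, while an agent additionally exposes a way to act.

```python
# Illustrative sketch only: an aggregator reads and summarizes,
# an agent can also take actions on the user's behalf.
from typing import Protocol


class Aggregator(Protocol):
    def gather_context(self) -> str:
        """Collect and summarize information (e.g., an AI news digest)."""
        ...


class Agent(Aggregator, Protocol):
    def act(self, intent: str) -> None:
        """Take an action in the world on the user's behalf (send, buy, book)."""
        ...
```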
The Critical Role of Context
A highly intelligent agent without the right context is largely useless [03:21:00].
- Misinformation due to limited access: An agent asked whether a prescription has been renewed may give a wrong answer if it only has access to Gmail, WhatsApp, and the calendar and therefore misses the pharmacy’s text message on iMessage [03:32:00]. Similarly, an agent that sees only one bank account might misreport the user’s finances if money arrived via Venmo, which it cannot see [04:08:00].
- User Frustration: A personal agent lacking sufficient context becomes irritating to use: users will question its accuracy and feel compelled to verify its outputs [04:17:00].
- Reliability: To be truly useful, an agent must reach a level of reliability and predictability at which the user trusts its answers to be correct [04:36:00].
The ideal scenario for providing context would be for the AI to “see everything you see and listen to everything you hear” [05:17:00].
Why Personal Agents Should Be Local and Private
There are several compelling reasons to keep personal AI agents local rather than relying on cloud services [07:11:00]:
1. Control and Predictability
Unlike simple digital services like cloud email, where the mental model is straightforward (email in, reply out), AI agents can take powerful and unpredictable actions [08:01:00].
- Unforeseen Actions: If an email service were to auto-reply on your behalf, you would be uncomfortable due to the potential for catastrophic actions, such as sending a nasty reply to your boss [08:31:00].
- Monetization Conflicts: Cloud services must monetize, raising concerns that an agent might prioritize purchases from partners offering kickbacks rather than acting purely in the user’s best interest [09:09:00].
- Intimacy: Personal agents are so intimately woven into a user’s life that control over their behavior is paramount [09:28:00].
2. Decentralization
Current digital ecosystems are often “walled gardens” that resist interoperability [09:53:00]. Relying on one ecosystem for a personal agent that performs diverse actions across your daily life is risky and undesirable [10:01:00]. Decentralized, local agents could become the norm [10:32:00].
3. Protection from “Thought Crimes”
An intimate personal agent might be asked questions that users would never voice publicly [11:02:00]. There’s a risk that private thoughts or queries, even if deemed harmless, could be logged or used against an individual by cloud providers, especially given legally mandated logging and safety checks in enterprise-grade contracts [11:16:00]. The speaker expresses a desire to avoid being prosecuted or persecuted for “thought crimes” [11:50:00].
Practical Approaches for Local Personal Agents
- Wearable Devices: While ideal for constant context, current wearable devices lack sufficient battery life [05:31:00].
- Smartphones: Running agents on phones in the background is hindered by ecosystem restrictions (e.g., Apple’s limitations on asynchronous processes) [05:53:00].
- Mac Mini: A feasible solution is a Mac Mini at home. It can run agents asynchronously, has no battery-life constraints, and can log into various services, including those in more open ecosystems such as Android [06:27:00].
Technical Challenges in Developing AI Agents
Even if the decision is made to go local, technical hurdles remain:
1. Local Model Inference
- Speed and Limitations: As of today, local model inference is slow and limited compared to cloud services, even on powerful machines [13:13:00]. While smaller models (e.g., 20-billion-parameter or distilled models) run faster, the latest large, unquantized models are very slow to run locally [13:35:00].
- Infrastructure: Projects like vLLM and SGLang, built on PyTorch, are helping to make local model inference practical [12:28:00]. This part of the challenge is expected to resolve itself over time [13:49:00].
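As a concrete illustration of the local inference stack the talk references, here is a minimal sketch using vLLM’s offline API. It assumes vLLM is installed and that the chosen model fits on the local hardware; the model name below is only an example of a smaller open checkpoint, not one the talk endorses.

```python
# Minimal sketch: local inference with vLLM's offline API.
# Assumes vLLM is installed (pip install vllm) and the model fits in local memory;
# the model name is an illustrative example of a smaller open checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")  # smaller models run faster locally
params = SamplingParams(temperature=0.2, max_tokens=256)

prompts = ["Summarize the unread messages that mention my prescription renewal."]
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.outputs[0].text)
```

Quantized or distilled checkpoints trade some quality for the latency that makes this tolerable on consumer hardware, which is the gap the speaker expects to close over time.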
2. Research and Product Gaps
- Multimodal Models: Open multimodal models are currently good but not great [14:20:00].
- Computer Use: Even advanced closed models struggle with computer use and frequently break [14:28:00].
- Visual Understanding and Taste: Models often provide generic shopping recommendations [14:42:00]. They struggle to match specific, fine-grained tastes and often rely on text matching rather than accurate visual identification (e.g., mistaking a red velvet sofa with oak legs for a green velvet sofa without oak legs) [14:58:00].
- Catastrophic Action Classifiers: There’s a significant lack of robust catastrophic action classifiers [15:34:00].
- Definition: Catastrophic actions are irreversible or highly damaging actions (e.g., buying a Tesla car instead of Tide Pods) [16:05:00].
- Need for Research: More research is needed so that agents can recognize a catastrophic action before taking it and notify the user instead of proceeding [16:25:00]; a minimal sketch of such a gate follows this list. This capability is crucial for user trust, whether the agent is personal or cloud-hosted [16:40:00].
- Voice Mode: Open-source voice mode for agents is “barely there” [16:52:00]. Voice interaction is a desirable, convenient interface for a personal agent [16:56:00].
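On the catastrophic-action gap flagged above: the talk only argues that such classifiers are needed, so the following is a purely hypothetical sketch of where a gate could sit in an agent’s execution loop. Hand-written rules stand in for a learned classifier, and every name here is an assumption.

```python
# Hypothetical sketch of a catastrophic-action gate; the talk only notes that
# such classifiers are missing, so all names and rules here are illustrative.
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    description: str        # e.g. "Buy Tide Pods"
    reversible: bool        # can the action be undone after the fact?
    cost_usd: float = 0.0   # rough monetary impact


def is_catastrophic(action: ProposedAction, cost_threshold: float = 100.0) -> bool:
    # Naive stand-in for a learned classifier: flag anything irreversible
    # or above a spending threshold.
    return (not action.reversible) or action.cost_usd > cost_threshold


def execute_with_gate(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    confirm_with_user: Callable[[ProposedAction], bool],
) -> None:
    # Catastrophic actions pause for explicit user confirmation instead of running.
    if is_catastrophic(action) and not confirm_with_user(action):
        return
    execute(action)


if __name__ == "__main__":
    run = lambda a: print(f"Executing: {a.description}")
    decline = lambda a: False  # pretend the user declines when asked

    # Buying detergent goes through; buying a car waits for the user.
    execute_with_gate(ProposedAction("Buy Tide Pods", reversible=True, cost_usd=12.99), run, decline)
    execute_with_gate(ProposedAction("Buy a Tesla", reversible=False, cost_usd=45000.0), run, decline)
```

The point is the control flow: anything the classifier flags as irreversible or costly pauses for explicit user confirmation instead of executing.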
Optimism for Open Models
Despite these current challenges, the speaker remains bullish about the future of personal AI agents due to the rapid advancement of open models [17:06:00]. Open models are seen as compounding intelligence faster than closed models because of coordinated, widespread community contributions (e.g., Llama, Mistral, Guac, DeepSeek) [17:11:00]. As with Linux and other open-source projects, once such initiatives achieve a critical mass of coordination, they tend to win in unprecedented ways [18:03:00].
PyTorch, co-created by the speaker and funded by Meta, is actively working on enabling local agents by addressing these technical challenges [00:41:00] [19:27:00].