From: aidotengineer
The speaker, a co-founder of PyTorch and an employee at Meta, focuses on the topic of personal local AI agents [00:41:00] [01:12:00]. Their interest stems from observing the personal productivity benefits of AI tools, such as Swyx’s AI news aggregator [01:20:00]. Additionally, their work in robotics, where robots act as agents, contributes to their understanding of AI agents [02:09:00]. The core premise is that personal agents, due to their significant agency in taking actions and extensive access to personal context, are best kept local and private [02:38:00].
Defining an AI Agent
An AI agent is characterized by its ability to act in the world; it possesses “agency” [03:05:05]. An aggregator, such as Swyx’s AI news, is not considered an agent because it gathers context but cannot take actions [03:01:01].
The Critical Role of Context
A highly intelligent agent without the correct context is essentially useless [03:23:05]. For example, a personal agent might inaccurately report on prescription renewals if it lacks access to text messages, or provide incorrect financial information if it doesn’t have access to all bank accounts [03:32:00] [04:04:00]. Without sufficient context, a personal agent becomes irritating and unreliable, making its utility questionable [04:17:00].
Why Personal Agents Should Be Local and Private
The speaker argues against running personal AI agents in the cloud through large tech companies, highlighting several critical reasons for keeping them local and private [07:11:00]:
Reasons for Local and Private Agents
- Control and Trust: Unlike simple digital services like email, which have predictable mental models (e.g., “email in, reply out”) [08:01:00], AI agents possess a powerful and potentially unpredictable action space. If an agent can auto-reply on your behalf, you might lose trust if you don’t understand its behavior or worry about worst-case actions (e.g., replying inappropriately to your boss) [08:21:00] [08:55:00].
- Monetization Risks: Cloud services might monetize personal data or actions. For instance, an agent handling shopping queries could be influenced to only recommend products from companies offering kickbacks, compromising user control and interests [09:09:00].
- Decentralization: Current digital ecosystems often create “walled gardens” that limit interoperability [09:53:00]. Relying on one ecosystem for a personal agent that performs diverse actions could lead to dependence and restricted functionality [10:10:00].
- Privacy and “Thought Crimes”: A personal agent intimately augmenting your life might be privy to thoughts or queries you would never voice publicly [10:48:00]. Using a cloud provider carries the risk of mandated logging and safety checks, potentially exposing private “thought crimes” and leading to unintended consequences [11:32:00] [11:42:00].
Practical Considerations for Running Personal Private Agents
Giving an agent all the necessary personal context (e.g., seeing everything you see and hearing everything you hear) would require wearables, but current wearable devices lack sufficient battery life [05:17:00] [05:31:00]. Running an agent in the background on a phone is also limited by ecosystem restrictions, particularly Apple's [05:43:00] [06:01:00].
A currently feasible approach is to use a device like a Mac Mini at home [06:23:00]. It has no battery constraints, can stay connected to the internet, can be logged into all of your services, and can reach the Android ecosystem because Android is open [06:36:00] [06:46:00].
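For illustration only (this setup is not detailed in the talk), here is a minimal sketch of an always-on agent process that could run on such a home machine: it watches a hypothetical local drop folder for new context and hands anything new to a local model. The folder name and the handle stub are assumptions.

```python
import pathlib
import time

# Sketch of an always-on loop for a home machine (e.g., a Mac Mini).
# The inbox folder is a hypothetical stand-in for real context connectors
# (mail, messages, bank exports, etc.).
INBOX = pathlib.Path.home() / "agent_inbox"
seen: set[str] = set()

def handle(text: str) -> None:
    # A real agent would pass this to a locally hosted model and decide
    # whether any action is warranted; here we just acknowledge it.
    print(f"queued {len(text)} chars of new context for the local model")

def main() -> None:
    INBOX.mkdir(exist_ok=True)
    while True:                                   # the machine is always on and plugged in,
        for path in sorted(INBOX.glob("*.txt")):  # so simple polling is acceptable
            if path.name not in seen:
                seen.add(path.name)
                handle(path.read_text())
        time.sleep(60)

if __name__ == "__main__":
    main()
```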
Challenges in Creating Personal AI Agents
While local and private agents are ideal, there are significant challenges in their development.
Technical Challenges
- Local Model Inference: Running local models, a key component of AI agents, is currently slow and limited compared to cloud services [13:13:00] [13:16:00]. Although this is changing rapidly, the latest unquantized models remain very slow [13:40:00]. Open-source projects like ollama and llm (possibly llama.cpp or a similar project such as sglang) are making progress in this area [13:28:00]; a minimal usage sketch follows this list.
- Open Multimodal Models: While improving, open multimodal models are not yet great, particularly in:
- Computer Use: Even advanced closed models struggle with the vision-driven tasks involved in computer use and frequently break [14:26:00].
- Specificity and Taste: Models often provide generic recommendations and struggle to understand specific, nuanced user preferences for things like shopping, relying more on text matching than visual identification [14:52:00] [15:23:00].
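As a concrete (if minimal) sketch of the local-inference piece, the snippet below queries a model served by ollama over its default HTTP endpoint on port 11434. It assumes an ollama server is running and a model such as llama3 has already been pulled; the model name and prompt are placeholders.

```python
import json
import urllib.request

# Minimal sketch: ask a locally served model a question via ollama's HTTP API.
# Assumes `ollama serve` is running and a model (here "llama3") has been pulled.
def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Summarize today's new personal context in two sentences."))
```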
Research and Product Challenges
- Catastrophic Action Classifiers: A major gap is the lack of robust systems for identifying and preventing "catastrophic actions", meaning irreversible or highly damaging actions an agent might take (e.g., buying a car instead of laundry detergent) [15:37:00] [16:05:00] [16:10:00]. More research is needed so that agents can flag such actions and notify the user before proceeding [16:28:00]; a minimal gating sketch follows this list. This gap also directly affects how AI agents can be evaluated.
- Open-Source Voice Mode: Robust open-source voice-mode capabilities, which are crucial for a seamless personal-agent experience, are still at an early stage [16:52:00].
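To make the catastrophic-action idea concrete, here is a minimal sketch (not the speaker's design) of gating an agent's proposed actions behind a classifier and an explicit user confirmation. The Action schema and the rule-based is_potentially_catastrophic check are illustrative placeholders for what would realistically be a learned classifier.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # e.g., "purchase", "send_email", "delete_file"
    description: str
    amount_usd: float = 0.0

def is_potentially_catastrophic(action: Action) -> bool:
    # Placeholder heuristic: flag irreversible action types, and purchases
    # only above a spending threshold. A real system would use a trained classifier.
    if action.kind == "purchase":
        return action.amount_usd > 100
    return action.kind in {"delete_file", "send_email"}

def execute(action: Action) -> None:
    print(f"executing: {action.description}")

def gated_execute(action: Action) -> None:
    """Pause and ask the user before any action flagged as catastrophic."""
    if is_potentially_catastrophic(action):
        answer = input(f"Agent wants to: {action.description}. Proceed? [y/N] ")
        if answer.strip().lower() != "y":
            print("action cancelled")
            return
    execute(action)

if __name__ == "__main__":
    gated_execute(Action("purchase", "buy laundry detergent", amount_usd=12.99))
    gated_execute(Action("purchase", "buy a car", amount_usd=28000.00))
```

In this sketch the small purchase goes through automatically, while the large one pauses for confirmation, which is the behavior the talk argues agents need before taking irreversible steps.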
Optimism for Open Models
Despite the challenges in creating personal AI agents, the speaker is bullish on their future because:
- Compounding Intelligence: Open models are advancing in intelligence faster than closed models because of distributed resources and coordinated efforts across the community [17:11:00] [17:28:00].
- Open Source Advantage: Similar to Linux and other major open-source projects, once a critical mass of coordination is achieved, open source tends to achieve unprecedented success and outpace proprietary solutions [18:05:00]. This suggests that open models will eventually become superior to closed models in terms of investment efficiency [18:22:00].
Related Initiatives
- Open Reasoning Data: Ross Taylor’s work on Galactica, an open science model, led to the release of open reasoning data, which aims to bridge the reasoning gap between open and closed models [19:14:00] [19:21:00].
- PyTorch: PyTorch is actively working on enabling local agents and addressing the technical challenges involved [19:27:00]. The team is also hiring AI and systems engineers [19:37:00].
- LlamaCon: An event, LlamaCon, is scheduled for April 29th, promising new developments related to Llama [19:55:00].