From: aidotengineer

Introduction

The discussion on AI agents begins with an exploration of why personal, local, and private agents are crucial for enhancing individual productivity and control [01:12:00]. The speaker, a co-founder of PyTorch who works at Meta, emphasizes the benefits of such agents, drawing on his personal experience and his work in robotics [00:41:00].

The initial inspiration for exploring AI agents came from the speaker’s use of Swyx’s AI News, which saved him significant time by aggregating AI news and demonstrated the potential for AI to augment daily life [01:23:00]. His work in robotics, where robots are inherently agents acting in the world, further solidified his interest in understanding AI agents more deeply [02:09:00].

Defining an AI Agent

An agent is defined as something that possesses “agency,” meaning it can actively take actions in the world [03:05:07]. If a system can only gather context and process information but cannot act, it is not considered an agent [03:16:03].

The Importance of Context

A highly intelligent agent without the right context is largely useless [03:21:09]. For a personal agent to be effective, it needs access to a broad range of personal information [02:46:00].

Examples of Context Failure:

  • An agent unable to confirm a prescription renewal because it lacked access to iMessage, only having Gmail, WhatsApp, and calendar access [03:32:00].
  • An agent giving incorrect financial information because it only had access to one bank account, missing a Venmo deposit [04:08:00].

When a personal agent lacks full context, it becomes irritating and unreliable, making it effectively useless because users cannot trust its information or actions [04:17:00].
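
To make this failure mode concrete, here is a minimal sketch (not from the talk) of a context-coverage check: the agent collects whatever sources it can reach and surfaces the gaps instead of answering confidently from partial context. The source names and fetch callables are hypothetical placeholders.

```python
# Minimal sketch of a context-coverage check for a personal agent.
# Source names and the `fetch` callables are hypothetical placeholders,
# not any specific API described in the talk.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ContextSource:
    name: str                              # e.g. "gmail", "imessage", "venmo"
    fetch: Optional[Callable[[str], str]]  # None if the agent has no access


def gather_context(query: str, sources: list[ContextSource]) -> tuple[str, list[str]]:
    """Collect whatever context is reachable and report what is missing."""
    chunks, missing = [], []
    for src in sources:
        if src.fetch is None:
            missing.append(src.name)
        else:
            chunks.append(f"[{src.name}] {src.fetch(query)}")
    return "\n".join(chunks), missing


sources = [
    ContextSource("gmail", lambda q: "no prescription emails found"),
    ContextSource("calendar", lambda q: "no pharmacy appointments"),
    ContextSource("imessage", None),  # the renewal confirmation lives here
]

context, missing = gather_context("did my prescription get renewed?", sources)
if missing:
    # Surface the gap instead of answering confidently with partial context.
    print(f"Answer may be wrong; no access to: {', '.join(missing)}")
```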

Why Personal, Local, and Private Agents?

The key takeaway is that personal agents, due to their deep access to a user’s life context, are best kept local and private [02:46:00].

Challenges in Providing Comprehensive Context

  • Wearables: While ideal for providing “see everything you see and listen to everything you hear” context, current wearable technology lacks sufficient battery life [05:17:00].
  • Phones: Running agents in the background on phones is restricted by ecosystem limitations, such as Apple’s restrictions on asynchronous background execution [05:53:00].
  • Proposed Solution: A Mac Mini placed at home, connected to the internet, can run agents asynchronously without battery life concerns. It can log into all services and access open ecosystems like Android [06:23:00].
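
As an illustration of the proposed setup, below is a minimal sketch of an always-on agent loop that could run on a home machine such as a Mac Mini. The `check_inboxes` and `run_agent_step` functions are hypothetical stand-ins for real connectors and a local model call; the talk does not prescribe any particular design.

```python
# Minimal sketch of an always-on agent loop for a home machine (e.g. a Mac Mini).
# `check_inboxes` and `run_agent_step` are hypothetical stand-ins for real
# service connectors and a locally hosted model; this is not from the talk.
import asyncio


async def check_inboxes() -> list[str]:
    # Placeholder: poll email, messages, and bank feeds the agent is logged into.
    return []


async def run_agent_step(event: str) -> None:
    # Placeholder: hand the event to a local model and act on the result.
    print(f"handling: {event}")


async def main(poll_seconds: int = 300) -> None:
    while True:                       # runs indefinitely; no battery constraint at home
        for event in await check_inboxes():
            await run_agent_step(event)
        await asyncio.sleep(poll_seconds)


if __name__ == "__main__":
    asyncio.run(main())
```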

Arguments Against Cloud-Based Agents

  1. Control and Predictability:

    • Trust in digital services (like cloud email) stems from a simple, predictable mental model (e.g., email in, reply out) [07:56:00].
    • As an agent’s “action space” becomes more powerful and unpredictable (e.g., auto-replying to your boss, making purchases based on kickbacks), users become uncomfortable trusting services they don’t fully control [08:51:00].
    • Personal life is intimate, and users desire control over how an agent acts on their behalf [09:28:00].
  2. Decentralization:

    • Current digital ecosystems are often “walled gardens” that resist interoperability [09:51:00].
    • Relying on one ecosystem for a personal agent that performs diverse actions across daily life could lead to undesirable lock-in [10:01:00]. Local agents promote decentralization and interoperability.
  3. Confidential AI and its impact (Thought Crimes):

    • Users might ask a personal agent things they would never say out loud [11:00:00].
    • Cloud providers, even with enterprise-grade contracts, are often legally mandated to perform logging and safety checks [11:32:00].
    • This poses a risk of being “prosecuted or persecuted for thought crimes,” making local agents preferable for the most personal augmentation [11:50:00].

Challenges in Building Effective AI Agents

Technical Challenges

  • Local Model Inference Speed: While projects like vLLM and SGLang (built on PyTorch) enable running local models, inference is currently slow and limited compared to cloud services [12:28:00]. However, this is rapidly improving; distilled models already run fast, but the latest, unquantized models remain slow [13:28:00]. This is expected to resolve itself over time [13:49:00].
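
For reference, a minimal sketch of local, offline inference with vLLM is shown below; SGLang offers a comparable serving path. The model name is an illustrative assumption rather than one named in the talk, and a distilled or quantized variant would run considerably faster, per the point above.

```python
# Minimal sketch of local inference with vLLM. The model name is an
# illustrative assumption (it requires a local GPU and Hugging Face access),
# not a choice made in the talk.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarize today's unread emails in three bullets."], params)
print(outputs[0].outputs[0].text)
```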

Research and Product Challenges

  • Open Multimodal Models: Current open multimodal models are “good but not great,” particularly for computer use, where they often break [14:20:00].
  • Visual Understanding for Shopping: Models struggle with specific visual identification in shopping queries, often relying on text matching rather than precise visual recognition [14:52:00].
  • Catastrophic Action Classifiers: A significant gap exists in agents’ ability to identify “catastrophic actions,” meaning irreversible or highly damaging actions [15:34:00]. While many actions are harmless or reversible, critical actions like purchasing a Tesla instead of Tide Pods require robust identification and user notification to build trust [16:05:00]. More research is needed in this area to evaluate AI agents and assistants reliably; a minimal sketch of such a gate follows this list.
  • Voice Mode: Open-source voice mode barely exists, yet it is essential for natural interaction with personal, local agents [16:52:00].
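
As referenced above, here is a minimal sketch of a catastrophic-action gate: actions judged irreversible or high-cost require explicit user confirmation before execution. The rules, thresholds, and names are illustrative assumptions, not a method described in the talk.

```python
# Minimal sketch of a "catastrophic action" gate: irreversible or high-cost
# actions require explicit user confirmation. The rules and thresholds are
# illustrative assumptions, not a method described in the talk.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str          # e.g. "purchase", "send_email", "delete_file"
    description: str
    cost_usd: float = 0.0
    reversible: bool = True


def is_catastrophic(action: Action, cost_threshold: float = 100.0) -> bool:
    # Flag anything irreversible or above an arbitrary cost threshold.
    return (not action.reversible) or action.cost_usd >= cost_threshold


def execute(action: Action, confirm: Callable[[Action], bool]) -> None:
    if is_catastrophic(action) and not confirm(action):
        print(f"blocked: {action.description}")
        return
    print(f"executing: {action.description}")


# Buying Tide Pods sails through; buying a Tesla is flagged for the user.
execute(Action("purchase", "buy Tide Pods", cost_usd=15.0), confirm=lambda a: True)
execute(Action("purchase", "buy a Tesla", cost_usd=45_000.0, reversible=False),
        confirm=lambda a: False)
```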

Optimism for Open Models

Despite these challenges, the speaker is strongly bullish on the future of open models for personal agents [17:06:00].

  • Compounding Intelligence: Open models are compounding intelligence faster than closed models because many independent entities contribute to their improvement in a coordinated manner [17:11:00]. This has been demonstrated by the rapid advancements seen with models like LLaMA, Mistral, Qwen, and DeepSeek [17:37:00].
  • Open Source Advantage: Similar to Linux and other projects, open source, once it achieves a critical coordinated mass, tends to win in unprecedented ways [18:03:00]. This suggests open models will become superior to closed models in terms of performance per dollar of investment [18:22:00].

Future Outlook

PyTorch is actively working on enabling local agents and addressing their technical challenges [19:27:00]. Additionally, efforts are underway to plug the reasoning gap between open and closed models through open reasoning data [19:14:00]. The community is also looking forward to events like LLaMA Con on April 29th, where new developments in LLaMA will be shared [19:55:00].