From: aidotengineer

This article discusses the advantages and challenges of open source and local AI models, particularly in the context of personal agents, as presented by a co-founder of PyTorch at Meta [00:00:37].

The Case for Personal Local Agents

The speaker’s interest in personal local agents stemmed from observing how AI news aggregators significantly improved personal productivity [00:01:18]. This led to a deeper exploration of AI agents [00:01:29].

What is an Agent?

An agent is defined as something that can take action in the world, possessing “agency” [00:03:05]. An AI that can only receive context and process information, but cannot act, is not considered an agent [00:03:16].
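The distinction above can be made concrete in code. The sketch below is illustrative only (the `Tool` and `Agent` names are assumptions, not an API from the talk): an object that can only receive input has no agency, while one with registered tools can act in the world.

```python
# Minimal sketch of the talk's definition: a model that only processes
# context is not an agent; an agent can also take actions ("agency").
# Tool, Agent, and the "echo" action are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    run: Callable[[str], str]  # an action the agent can take in the world

@dataclass
class Agent:
    tools: dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def act(self, tool_name: str, arg: str) -> str:
        # With no registered tools, this object can only observe, never
        # act -- by the talk's definition, it is not an agent at all.
        if tool_name not in self.tools:
            raise ValueError(f"no such action: {tool_name}")
        return self.tools[tool_name].run(arg)

agent = Agent()
agent.register(Tool("echo", lambda s: f"did: {s}"))
print(agent.act("echo", "renew prescription"))
```

The key design point is that "agency" lives entirely in the set of actions the agent is allowed to invoke, which is also where the later discussion of catastrophic actions applies.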

The Importance of Context for Agents

A highly intelligent agent without the right context is deemed "as good as a bag of rocks" or "useless" [00:03:23]. For example, a personal agent might lie about a prescription renewal because it lacked access to all relevant communication channels (e.g., iMessage vs. Gmail) [00:03:32]. An agent without the full context of a user's life will mostly be irritating and unreliable, eroding the user's trust [00:04:17].

Ideally, a personal AI should see everything the user sees and hear everything the user hears to gain sufficient context [00:05:21].

Why Local and Private?

For personal agents, keeping them local and private is crucial due to the immense amount of personal life context they would access [00:02:46].

  • Control and Predictability: Unlike simple digital services like cloud email, which have a predictable "in, reply, out" mental model [00:08:01], an AI agent's action space is powerful and potentially unpredictable [00:08:55]. Users lose comfort and control when a service can auto-reply or make purchasing decisions on their behalf [00:08:31]. There's also a concern that commercial providers of agent services could prioritize kickbacks over the user's interest when handling shopping queries [00:09:17].
  • Decentralization: Current ecosystems often create “walled gardens” that limit interoperability [00:09:55]. Relying on one ecosystem for an agent that takes diverse actions across daily life might be problematic [00:10:04].
  • Privacy of “Thought Crimes”: An intimate personal agent might be asked questions that a user would never say out loud [00:11:07]. There’s a risk of legal or social ramifications if such personal interactions are logged or exposed by a third-party provider, even with enterprise-grade contracts [00:11:36].

Feasible Local Device

While wearable AI devices and phone-based agents face battery limitations or ecosystem restrictions (such as Apple's limits on asynchronous background processes) [00:05:31], a Mac Mini in the home is suggested as a feasible device for running asynchronous personal agents [00:06:30]. It avoids battery issues, can log into all of the user's services, and can access Android ecosystems [00:06:38].
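The "always-on box in the home" idea can be sketched as a simple asynchronous polling loop. The source names and the loop structure below are illustrative assumptions, not details from the talk; the point is only that a plugged-in machine can run such a loop continuously, where a phone or wearable could not.

```python
# Hedged sketch of an always-on asynchronous agent cycle on a home
# machine. Source names are illustrative, not from the talk.
import asyncio

async def check_source(name: str) -> list[str]:
    # Stand-in for polling email, messages, calendars, etc.
    await asyncio.sleep(0)  # placeholder for real I/O
    return [f"{name}: nothing new"]

async def agent_cycle(sources: list[str]) -> list[str]:
    # A plugged-in home machine can run this continuously, without the
    # battery or background-process limits of phones and wearables.
    results = await asyncio.gather(*(check_source(s) for s in sources))
    return [item for batch in results for item in batch]

updates = asyncio.run(agent_cycle(["email", "messages"]))
print(updates)
```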

Challenges for Open Source and Local Models

Despite the benefits, there are technical and research challenges for local, open source models:

  • Local Model Inference Speed: As of today, local model inference is slower and more limited than cloud services, even on beefy machines [00:13:13]. While a 20-billion-parameter or distilled model can run fast locally, the latest full, unquantized models are "super duper damn slow" [00:13:38]. However, this is seen as a rapidly improving area that will likely "fix itself" [00:13:49]. PyTorch is actively working on enabling local agents and addressing these technical challenges [00:19:27].
  • Multimodal and Omni Model Quality: Open multimodal models are good but not yet great, especially for computer use [00:14:20]. Even closed models struggle with tasks like visually identifying specific items based on detailed user tastes, often falling back on text matching [00:15:00].
  • Lack of Catastrophic Action Classifiers: A significant gap is the inability of agents to reliably identify “catastrophic actions” before taking them [00:15:37]. While many actions are harmless or reversible (e.g., visiting the wrong Wikipedia link), some, like an agent purchasing a car instead of Tide Pods, are disastrous [00:16:05]. More research is needed in AI safety and model interpretability, specifically on how agents can better identify and notify users about such high-impact actions [00:16:28].
  • Voice Mode: Open source voice models are described as “barely there,” limiting the desired interaction method for personal agents [00:16:52].
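The catastrophic-action gap described above can be illustrated with a toy guard. This is a hedged sketch, not the classifier the talk calls for: the `ProposedAction` fields, the reversibility/cost criteria, and the threshold are all illustrative assumptions. A real classifier would need the model-interpretability research the speaker describes.

```python
# Toy sketch of a "catastrophic action" guard: score each proposed action
# on reversibility and cost, and require user confirmation above a
# threshold. Fields, criteria, and threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    reversible: bool
    estimated_cost_usd: float

def is_catastrophic(action: ProposedAction, cost_threshold: float = 100.0) -> bool:
    # Irreversible *and* expensive actions (buying a car instead of Tide
    # Pods) should be surfaced to the user; visiting the wrong Wikipedia
    # link is harmless and reversible.
    return (not action.reversible) and action.estimated_cost_usd > cost_threshold

def execute(action: ProposedAction) -> str:
    if is_catastrophic(action):
        return f"NEEDS CONFIRMATION: {action.description}"
    return f"executed: {action.description}"

print(execute(ProposedAction("open Wikipedia link", reversible=True, estimated_cost_usd=0)))
print(execute(ProposedAction("buy a car", reversible=False, estimated_cost_usd=30_000)))
```

Even this crude gate shows the shape of the problem: the hard part is not the confirmation step but reliably estimating reversibility and impact for arbitrary actions, which is where the talk says more research is needed.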

Bullish on Open Models

Despite the challenges, the speaker is very optimistic about open source models:

  • Faster Intelligence Compounding: Open models are seen to be compounding intelligence faster than closed models [00:17:11]. While companies like OpenAI and Anthropic improve their own models, open models benefit from coordinated improvement across a broader community [00:17:23].
  • Power of Open Source Coordination: Historically, open source projects, once they achieve a critical mass of coordination, begin to win in unprecedented ways (e.g., Linux) [00:18:03]. Examples like Llama, Mistral, and DeepSeek are cited as evidence that open models are increasingly competing with and potentially surpassing closed models [00:17:37].
  • Cost-Effectiveness: Open models are expected to become better than closed models “per dollar of investment” [00:18:26].

Relevant Projects