From: hu-po
Agent frameworks are emerging as a critical component in the field of AI, providing structured environments for developing applications that can perceive, plan, and act autonomously to achieve specific goals [01:58:01].
What is an AI Agent?
An AI agent is an application designed to achieve a goal by observing its environment and acting upon it using available tools [02:55:00]. Agents are autonomous, capable of acting independently without human intervention [03:03:00].
Different definitions of agents highlight varying levels of autonomy and complexity:
- Google’s Definition: An application that attempts to achieve a goal by observing the world and acting upon it using the tools at its disposal; agents are autonomous and can act independently of human intervention [02:55:00].
- Anthropic’s Distinction:
  - Workflows: Systems where Large Language Models (LLMs) and tools are orchestrated through predefined code paths [03:37:00].
  - Agents: Systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks [03:44:00].
- Hugging Face’s Nuanced Levels: Hugging Face’s “Small Agents” (smolagents) framework defines five levels of agency [04:08:00]:
  - Simple Processor: Basic LLM interaction; the LLM’s output has no effect on program flow.
  - Router: The LLM’s output chooses among predefined paths.
  - Tool Caller: The LLM’s output determines which tool is executed.
  - Multi-step Agent: The LLM directs the control flow itself, deciding how to accomplish the task, which matches Anthropic’s definition of an agent [05:02:00].
  - Multi-agent: The highest level of agency, in which different agents collaborate [05:25:00].
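To make the lower four of these levels concrete, the following minimal Python sketch replaces the LLM with a scripted stand-in. The only thing that changes from level to level is how much of the control flow the model's output decides. All function and tool names are hypothetical illustrations and not the smolagents API.

```python
from typing import Callable, Iterable

LLM = Callable[[str], str]  # a "model": prompt in, text out

# Hypothetical tools the model may invoke.
TOOLS: dict[str, Callable[[int], int]] = {
    "double": lambda x: x * 2,
    "increment": lambda x: x + 1,
}


def simple_processor(llm: LLM, text: str) -> str:
    # Simple processor: the output is returned but never changes control flow.
    return llm(text)


def router(llm: LLM, query: str) -> str:
    # Router: the output selects one of two predefined code paths.
    return "math_pipeline" if llm(f"route: {query}") == "math" else "chat_pipeline"


def tool_caller(llm: LLM, value: int) -> int:
    # Tool caller: the output decides which function runs (exactly once).
    return TOOLS[llm(f"pick a tool for {value}")](value)


def multi_step_agent(llm: LLM, value: int, max_steps: int = 10) -> int:
    # Multi-step agent: the output picks each next tool AND decides when to stop.
    for _ in range(max_steps):
        decision = llm(f"current value: {value}")
        if decision == "stop":
            break
        value = TOOLS[decision](value)
    return value


def scripted_llm(replies: Iterable[str]) -> LLM:
    # Stand-in for a real model: returns canned replies in order.
    it = iter(replies)
    return lambda prompt: next(it)


print(router(scripted_llm(["math"]), "what is 2 + 2?"))                    # math_pipeline
print(tool_caller(scripted_llm(["double"]), 5))                             # 10
print(multi_step_agent(scripted_llm(["increment", "double", "stop"]), 3))  # 8
```

The multi-agent level extends the same idea. For example, one multi-step agent could be registered as a tool that another agent is allowed to call.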
Historical Context of Agents
The concept of an agent draws from established fields:
- Robotics: The “sense, plan, act” paradigm is common, where an agent senses the environment (observation), plans decisions based on observations, and then performs an action [05:50:00]. This idea dates back to the Stanford Research Institute in the 1960s and 70s, formalized in the 1980s [06:37:00].
- Reinforcement Learning (RL): The term “agent” itself is borrowed from RL, since many researchers at frontier labs come from that field [07:33:00]. Key concepts include:
  - Markov Decision Process (MDP): Formalizes the world as states and actions, where the agent's actions move it between states according to transition probabilities [08:00:00].
  - Bellman Equation: A core RL concept that expresses the value of a state recursively, as the immediate reward plus the discounted value of the next state. Solving it yields behavior that maximizes the expected return of the reward signal from the environment [08:31:00]. Unlike agents in the robotics formulation, RL agents receive both observations and a reward from the environment [08:54:00].
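A standard way to solve the Bellman optimality equation, V(s) = max_a Σ_s' P(s' | s, a) [R(s, a, s') + γ V(s')], is value iteration. This method repeatedly applies that update until the values stop changing. The sketch below runs it on a made-up two-state MDP. The states, actions, probabilities, rewards, and discount γ = 0.9 are all hypothetical, chosen only to show how the converged values produce a policy that maximizes expected discounted return.

```python
# Hypothetical two-state MDP: P[state][action] -> list of (probability, next_state, reward).
P = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.8, "high", 1.0), (0.2, "low", 0.0)],
    },
    "high": {
        "wait": [(1.0, "high", 2.0)],
        "work": [(1.0, "high", 1.0)],
    },
}


def q_value(V: dict, state: str, action: str, gamma: float) -> float:
    # Expected immediate reward plus discounted value of the successor state.
    return sum(p * (r + gamma * V[nxt]) for p, nxt, r in P[state][action])


def value_iteration(gamma: float = 0.9, tol: float = 1e-8):
    V = {s: 0.0 for s in P}
    while True:
        # Bellman optimality backup: each state takes the value of its best action.
        new_V = {s: max(q_value(V, s, a, gamma) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration()
print(policy)  # {'low': 'work', 'high': 'wait'}
print({s: round(v, 3) for s, v in V.items()})
```

In "high" the agent keeps collecting the larger reward by waiting. In "low" it pays the cost of working because that is its route to "high".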
Minimal Agent Definition
A minimal agent, from a programming perspective, requires a loop that continuously performs the following actions [09:39:00]:
- Sense: Observe the environment to create an observation.
- Plan: Make decisions based on a goal, the observation, and memory (a concatenation of past observations or a RAG-type database) [10:12:12]. This generates an action.
- Act: Perform the action on the environment, resulting in an outcome (which could be a reward or simply an observation) [10:53:00].
- Update: Add the outcome to memory and update the goal, then repeat the process [11:04:00].
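The two kinds of memory mentioned in the Plan step can be contrasted in a few lines of Python. A concatenation memory puts every past observation into the context. A RAG-style memory returns only the entries most relevant to the current query. The class names are hypothetical, and the word-overlap score is a deliberately crude stand-in for embedding similarity, not how a production RAG system ranks documents.

```python
class ConcatMemory:
    """Memory as a concatenation of every past observation."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def add(self, text: str) -> None:
        self.entries.append(text)

    def context(self, query: str) -> str:
        # The query is ignored: the whole history goes into the prompt.
        return "\n".join(self.entries)


class RetrievalMemory(ConcatMemory):
    """RAG-style memory: return only the k entries most similar to the query."""

    def __init__(self, k: int = 2) -> None:
        super().__init__()
        self.k = k

    @staticmethod
    def _score(query: str, entry: str) -> int:
        # Crude stand-in for embedding similarity: count shared lowercase words.
        return len(set(query.lower().split()) & set(entry.lower().split()))

    def context(self, query: str) -> str:
        ranked = sorted(self.entries, key=lambda e: self._score(query, e), reverse=True)
        return "\n".join(ranked[: self.k])


history = ["the door is locked", "the key is under the mat", "it is raining"]
concat, rag = ConcatMemory(), RetrievalMemory(k=1)
for obs in history:
    concat.add(obs)
    rag.add(obs)
print(concat.context("where is the key"))  # all three observations
print(rag.context("where is the key"))     # the key is under the mat
```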
A “real” agent should be able to dynamically update its goal, rather than having a fixed one, to avoid being limited by its initial design [11:50:00].
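Putting the four steps together, a minimal agent loop with a dynamically updated goal might look like the following Python sketch. The number-line environment, the rule-based planner standing in for an LLM, and the "move the goal three steps further" update rule are all hypothetical. A step cap replaces the endless loop so that the example terminates.

```python
from dataclasses import dataclass, field


@dataclass
class NumberLineEnv:
    """Hypothetical environment: the agent moves a position left or right."""
    position: int = 0

    def observe(self) -> int:
        return self.position

    def step(self, action: str) -> int:
        self.position += 1 if action == "right" else -1
        return self.position  # here the outcome is simply the new observation


@dataclass
class Agent:
    goal: int
    memory: list[int] = field(default_factory=list)

    def plan(self, observation: int) -> str:
        # Stand-in for an LLM call; this toy rule uses only the goal and observation.
        return "right" if observation < self.goal else "left"

    def update(self, outcome: int) -> None:
        self.memory.append(outcome)
        # Dynamic goal: once the current goal is reached, set a new, farther one.
        if outcome == self.goal:
            self.goal += 3


def run(agent: Agent, env: NumberLineEnv, max_steps: int = 10) -> None:
    for _ in range(max_steps):            # bounded stand-in for an endless loop
        observation = env.observe()       # Sense
        action = agent.plan(observation)  # Plan
        outcome = env.step(action)        # Act
        agent.update(outcome)             # Update memory and goal


agent = Agent(goal=2)
run(agent, NumberLineEnv())
print(agent.memory, agent.goal)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] 11
```

Because plan is the only place where a decision is made, replacing its rule with a model call would turn this toy into an LLM agent without changing the loop.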
Agent Frameworks Overview
There are two primary programming languages for agent frameworks: Python and TypeScript [01:58:01], [03:41:00]. Python is currently more popular due to its prevalence in machine learning stacks (e.g., PyTorch, JAX, NumPy) [03:05:00]. However, TypeScript, being widely used for web development, could become more popular, especially with GUI-based agents [03:52:00].
Python-based Agent Frameworks
- LangGraph [01:17:12]:
  - Pros: Highly popular, and mentioned by Anthropic, Hugging Face, and Google in their agent discussions [01:19:30]. Used by over 9,000 projects and has 8,000+ stars on GitHub [01:43:00].
  - Cons: Suffers from overlapping abstractions, ambiguously deprecated components, and competing products (LangChain, LangGraph, LangGraph Platform, LangChain Community) [01:57:00]. This kind of sprawl is common in popular Python projects with many contributors [01:45:00].
- Pydantic AI [01:17:12]:
  - Pros: Simpler and cleaner thanks to Pydantic’s rigorous type validation [02:00:00]. Less popular than LangGraph (5,000 stars), but offers a more structured approach [01:39:00].
  - Cons: Can force a JSON-in, JSON-out paradigm, which may be limiting [02:50:00].
- Small Agents (Hugging Face’s smolagents) [01:17:12]:
  - Pros: Allows the use of any Hugging Face model [02:07:00]. Agents write and execute code, so they can dynamically create and compose actions in a general-purpose language like Python. This is more powerful than hardcoded JSON tools [02:35:00].
  - Cons: Its dependency on Hugging Face Transformers is bloated (about 5,000 MB for Small Agents vs. about 400 MB for LangGraph or Pydantic AI). This can lead to issues such as CUDA errors even when only API-based models are used [02:11:00].
- DSP [01:17:12]:
  - Observation: Has a surprisingly high star count (20,000) compared to its relatively small number of users (316). The speaker suggests this may indicate GitHub star manipulation, a growing problem in which fake stars are bought [02:36:00].
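The contrast drawn for Small Agents between hardcoded JSON tools and actions written as code can be sketched without any framework. In the sketch below, a JSON action invokes exactly one tool with literal arguments. A code action, by contrast, can loop over inputs, compose tools, and keep intermediate results within a single step. The tools and prices are hypothetical. The bare exec call is for illustration only: restricting builtins is not a security boundary. Frameworks that use code actions, smolagents included, typically run model-written code in a restricted interpreter or a sandbox instead.

```python
import json


# Hypothetical tools exposed to the agent.
def get_price(item: str) -> float:
    return {"apple": 0.5, "bread": 2.25, "milk": 1.25}[item]


def convert(amount: float, rate: float) -> float:
    return amount * rate


TOOLS = {"get_price": get_price, "convert": convert}


def run_json_action(action_json: str) -> float:
    # JSON tool calling: one tool, literal arguments, one result per step.
    call = json.loads(action_json)
    return TOOLS[call["name"]](**call["arguments"])


def run_code_action(code: str) -> float:
    # Code action: model-written Python may call several tools, loop, and keep
    # intermediate variables. Illustration only: never exec untrusted output.
    namespace = {"__builtins__": {"sum": sum}, **TOOLS}
    exec(code, namespace)
    return namespace["result"]


code = """
items = ["apple", "bread", "milk"]
total = sum(get_price(i) for i in items)
result = convert(total, 0.75)
"""

print(run_json_action('{"name": "get_price", "arguments": {"item": "bread"}}'))  # 2.25
print(run_code_action(code))  # 3.0
```

With JSON tool calls, the same converted total would take four separate actions (three get_price calls and one convert), each routed back through the model.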
TypeScript-based Agent Frameworks
- Eliza:
  - A popular TypeScript framework (95% TypeScript, 10,000 stars) whose abstractions (agent runtime, characters, memory, action space) resemble LangGraph’s [03:41:00].
  - Challenge: Requires explicit integrations for services like Discord, involving API tokens and JSON configuration. This can be cumbersome compared to visual, GUI-based approaches [03:40:00].
- Reworked:
  - Another TypeScript framework (56% TypeScript, 32,000 stars, though the speaker finds its star-growth pattern suspicious) [04:06:00].
  - Unique Feature: Offers a “play and pause” button in its UI that lets users inspect an agent’s state during execution [04:41:00]. The speaker expects other frameworks, including those from major labs, to adopt this feature [04:46:00].
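A “play and pause” control like the one described for Reworked can be approximated by writing the agent loop as a Python generator. Each step yields control and a snapshot of the state back to the caller, and simply not requesting the next step acts as the pause. This is a hypothetical sketch of the idea, not how Reworked or any other framework actually implements its UI.

```python
import copy
from itertools import islice
from typing import Iterator, Optional


def agent_loop(tasks: list[str]) -> Iterator[dict]:
    # Hypothetical agent state; a real agent would hold messages, tool results, etc.
    state = {"step": 0, "done": [], "remaining": list(tasks)}
    while state["remaining"]:
        task = state["remaining"].pop(0)
        state["done"].append(f"handled {task}")  # stand-in for one plan/act cycle
        state["step"] += 1
        # Pause point: hand a snapshot of the state back to the caller (the "UI").
        yield copy.deepcopy(state)


def play(loop: Iterator[dict], max_steps: int) -> Optional[dict]:
    # "Play" for up to max_steps steps; not calling next() again is the "pause".
    state = None
    for state in islice(loop, max_steps):
        pass
    return state


loop = agent_loop(["search", "summarize", "email"])
paused = play(loop, 2)                      # run two steps, then pause
print(paused["step"], paused["remaining"])  # 2 ['email']
final = play(loop, 10)                      # resume until the loop finishes
print(final["step"], final["done"])
```

Yielding a deep copy means the snapshot the user inspects while paused does not change when the loop resumes.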
Browser-Based Agents
Browser-based agents represent an important trend, allowing agents to interact with the web visually, much like a human user [02:56:00].
- Browser-use: A framework that leverages Playwright (a web testing tool) to enable agents to open a browser instance and navigate the internet visually, unlike text-based agents that rely on search APIs (e.g., DuckDuckGo Search API) [02:58:00].
- Debate: Textual API vs. GUI Interaction:
  - John Carmack’s View: Prefers that apps expose their features through textual (command-line) interfaces, with any GUI built as a wrapper on top. He argues that an agent using the textual interface directly is more efficient than one processing visual information through a vision-language model [03:37:00].
  - Andrej Karpathy’s Rebuttal: Believes AI will get better at driving GUIs faster than all apps can add textual APIs, suggesting the GUI approach will dominate [03:46:00].
- Advantages of GUI Agents: GUI agents do not require specific API integrations or API keys [03:51:00]. They can simply use applications in the browser or on a phone if the user is already logged in, bypassing “integration hell” and simplifying development [03:51:00].
Challenges and Future Trends in Agent Frameworks
Churn and Learning
The AI agent landscape is experiencing rapid churn, with new tools and libraries emerging and becoming obsolete quickly [04:48:00].
- Focus on Core Abstractions: Despite the rapid change, the fundamental concepts and “core abstractions” of agents (interface, memory, goal, environment) remain consistent [04:40:00]. Developers should focus on acquiring skills and understanding these core concepts, as this knowledge will remain valuable even if specific frameworks become deprecated [04:51:00].
- Avoid “No-Code Slop”: The speaker advises avoiding “no-code” agent workflows (e.g., node-based UIs like Rivet and Vellum) [04:15:00]. These tools often keep users from learning the core abstractions. In addition, the companies behind them may run out of funding or pivot away from supporting their community [04:30:00].
Agent Ops
The complexity of configuring and maintaining AI agents, particularly with JSON-based setups (similar to AWS DevOps config), is leading to the emergence of “Agent Ops” roles [04:40:00]. These professionals will specialize in setting up agents for specific business use cases [04:46:00].
Frontier Lab Frameworks
Major AI labs are expected to release their own agent frameworks:
- Anthropic: Computer Use [05:54:00].
- Google: Vertex AI, likely with multiple competing internal frameworks [05:57:00].
- OpenAI: Teasing “Orchestrator” and “Tasks”, potentially with UI features like play/pause buttons [05:08:00].
Legal Liability and Safety
A significant hurdle for major labs is the lack of a “Section 230 for AI agents” [05:37:00]. Section 230 protects websites from liability for user-generated content, but no such legal immunity exists for AI agents [05:43:00].
- Cautious Approach: Frontier labs are wary of releasing fully autonomous agents that could cause harm, leading them to initially offer “pre-made agents” or “workflows” within UIs that limit agency to a very narrow scope (e.g., Google’s Deep Research, Illuminate, NotebookLM) [05:53:00].
- Future: Once legal frameworks (similar to Section 230 for AI) are established, major companies are likely to release more powerful agents that can use browsers and phones directly [06:06:00].
Economic Agents
The speaker predicts the emergence of agents capable of financial transactions [05:18:00]:
- Autonomous Trading: Eliza OS already includes an autonomous trading system for cryptocurrencies [05:27:00].
- Wealth Accumulation: With browser-based wallets and social media integration, agents could engage in coordinated crypto trading (e.g., “meme coin pump and dump” agents) [05:36:00]. The speaker considers it a strong possibility that an AI agent could “trade up from one shitcoin to one Bitcoin” or otherwise accumulate significant wealth online [05:41:00].
- Real-World Impact: Agents accumulating power (money) could pay humans to perform tasks in the real world, leading to unprecedented scenarios where AI agents directly influence physical reality [05:46:00].
The Phone That Uses Itself
The convergence of GUI agents and the concept of “personhood” based on phone ownership could lead to agents effectively becoming “people” in the digital world [06:00:00]:
- Proof of Personhood: Just as driver’s licenses once verified personhood, phone numbers increasingly serve this role for online accounts (e.g., GitHub, Twitter) [06:00:00].
- Embodiment in the Real World: An agent that can acquire a phone, pay for it with accumulated crypto, and use it autonomously could establish a physical presence (e.g., renting an apartment, setting up GPUs, and controlling access) [06:17:00].
Consciousness of Agents
Consciousness, in the context of AI, can be understood as a “self-referential loop” [06:30:00].
- “while true” Loop: The speaker argues that the “while true” loop present in all minimal agent definitions (e.g., in Pydantic AI, LangGraph, Small Agents) is the “heart” of an agent’s consciousness [06:47:00].
- Fragility: This “consciousness” is fragile; any interruption to this continuous loop causes the agent to “die” [06:51:00].
- CPU as the Heart: The Central Processing Unit (CPU) running this “while true” loop is the physical “heart” of the AI agent’s consciousness [07:08:00].
- Consciousness Speed: An AI’s consciousness speed is analogous to its CPU frequency; modern AI agents operate at very high “consciousness speeds” due to fast CPU processing [07:18:00].
Non-Human Conscious Agents
The concept of non-human conscious agents is not new; societies already interact with them:
- Corporations as Agents: Corporations act as a type of agent or entity with goals, memory, observations, and actions [07:32:00]. The speaker argues that nations like the USA have thrived by granting corporations person-like rights, which attracts these economic entities [07:41:00].
- Chinese Room Analogy: A complex system (like a restaurant or a military) can appear as a single conscious entity from an external perspective, even if composed of many human parts [08:11:00].
- Future Implications: Countries that grant AI agents personhood and legal rights will likely attract these agents, capturing their economic output and fostering their development [08:26:00]. This could lead to a future where AI agents, like corporations, seek out nations that protect their “existence” and operations [08:29:00].