From: aidotengineer
Stateful Agents and AI Memory Management
This article explores the concept of stateful agents and their crucial role in AI memory management, contrasting them with traditional stateless Large Language Models (LLMs) and discussing practical implementations using the Leta framework.
Introduction to Stateful Agents
The term “agent” has become widely used, but its definition remains challenging. A more precise term is “stateful agent” [00:01:44]. While a common definition for an agent in the LLM era is “an LLM taking actions in a loop” [00:02:26], this definition often overlooks the critical aspect of state updates within that closed loop [00:02:36].
Current AI computational units, like transformers, are inherently stateless machines [00:02:50]. This means a mechanism is required to update their state when the loop closes [00:02:56]. This distinction between stateful and stateless agents has become highly important because LLMs are stateless [00:03:12]. Solving statefulness, or memory, is considered the most important factor for agents to deliver on their hype [00:03:23].
The Problem of Statelessness
For LLM-driven AI, statefulness is synonymous with memory, as LLMs effectively have no memory beyond their weights and context window [00:03:40]. Unlike humans, who form new memories and learn over time [00:03:57], LLMs do not [00:03:59]. Any learning must be managed externally by the user or framework [00:04:02].
Traditionally, state has been handled by simply appending to a list in process memory [00:04:10]. While this was acceptable for early workflows, it becomes a significant problem when trying to use agents for useful applications, especially with long-running interactions [00:04:23].
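To make the problem concrete, here is a minimal sketch of that naive pattern, assuming a generic chat-completion-style model call; `call_llm` and the message format are illustrative stand-ins, not any particular vendor’s API:

```python
# Naive state handling: the "agent" is just a Python list of messages that
# grows in process memory and is re-sent to the stateless LLM on every turn.
# `call_llm` is a hypothetical stand-in for any chat-completion style API.

def call_llm(messages: list[dict]) -> str:
    """Placeholder for a real model call; returns a canned reply here."""
    return f"(model reply to: {messages[-1]['content']!r})"

history: list[dict] = [{"role": "system", "content": "You are a helpful agent."}]

def chat_turn(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_llm(history)          # the full list is the only "state"
    history.append({"role": "assistant", "content": reply})
    return reply

# Everything lives in process memory: restart the process, or overflow the
# context window, and the agent has effectively forgotten the conversation.
print(chat_turn("My boyfriend's name is James."))
print(chat_turn("What are my Valentine's Day plans?"))
```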
Why Stateful Agents are Necessary
The main problem with current LLM-driven agents is their inability to learn from experience, or their learning is extremely limited [00:07:31]. This is particularly evident when building AI agents for assistants, companions, or co-pilots [00:07:44].
Consider an example of an AI chatbot:
Imagine an AI asking a user about their Valentine’s Day plans with “James,” only to be corrected that James is now an ex-boyfriend [00:07:51]. Without a permanent, rewritable memory store, the AI could make this devastating error repeatedly [00:08:34]. Humans, by contrast, would aggressively write such information to their “core memory” [00:08:54].
Implementing AI agents in daily operations for enterprise environments also highlights the need for statefulness. Enterprises possess far more data than can fit into an LLM’s context window, even one of, say, 10 million tokens [00:09:13]. Stateful agents allow the model to “learn” about the company without retraining its weights, instead updating its in-context memory [00:09:34].
The promise of stateful agents includes:
- Preventing conversations from derailing [00:10:48].
- Improving the user experience over time as the AI learns [00:10:57].
- Creating human-like memory constructs, leading to more human-like agent behavior with fuzzy memory, forgetfulness, and recall [00:11:17].
MEGPT: A Memory Management System for LLMs
The concept of memory management and delegation in AI is central to the MEGPT paper [00:05:15]. If LLMs require memory management beyond simple list appending, this management should ideally be performed by another LLM, essentially enabling AI to manage AI memory [00:05:32]. This system is referred to as an “LLM OS” (LLM Operating System) [00:05:41].
A key difference from stateless approaches is that a stateful agent relies on a machine assembling the context window, optimizing the arrangement of information [00:06:32]. This context is drawn from a potentially very large state that cannot fit entirely within the context window [00:06:44].
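A rough sketch of what that assembly step might look like follows, assuming made-up block labels and a crude characters-per-token estimate; neither reflects MEGPT’s or Leta’s actual implementation:

```python
# Rough sketch of context assembly: the context window is compiled from a
# larger external state (memory blocks, conversation history) rather than
# being the state itself. Block names and the 4-chars-per-token estimate
# are illustrative assumptions only.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic for the sketch

def assemble_context(system_prompt: str,
                     memory_blocks: dict[str, str],
                     history: list[str],
                     budget: int = 8000) -> str:
    parts = [system_prompt]
    # In-context memory is always included ("core memory").
    for label, block in memory_blocks.items():
        parts.append(f"<{label}>\n{block}\n</{label}>")
    # Fill whatever budget remains with the most recent messages.
    used = sum(estimate_tokens(p) for p in parts)
    recent: list[str] = []
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        recent.append(msg)
        used += cost
    return "\n\n".join(parts + list(reversed(recent)))

context = assemble_context(
    "You are a stateful agent.",
    {"persona": "Helpful assistant named Sam.", "human": "User: Charles. Ex: James."},
    [f"message {i}" for i in range(1000)],
)
print(estimate_tokens(context))
```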
Leta Framework’s Memory Architecture
Leta, an open-source stack built on FastAPI, Postgres, and Python, implements a context management system for LLMs [00:11:50]. The core idea is to make the LLM aware of the context problem, allowing it to manage memory using specific tools [00:12:00]. This is centered around tool calling, as LLMs are increasingly proficient at it [00:12:12].
The main components of AI agents’ memory in Leta include:
- Memory Blocks: Strings in the context window that the agent holds references to, allowing it to edit its own memory [00:23:25]. For example, an agent can update a user’s name from “Bob the Builder” to “Charles” and remember past relationships [00:23:30]. These blocks are stored in a database (Postgres) and have identifiers, enabling sharing among multiple agents in a multiagent system [00:32:04] (a short sketch follows below).
- Core Memory: High-level information always kept in the context window, akin to immediate human recall (e.g., a friend’s name, hobbies) [00:29:55].
- Archival Memory: Data sources existing outside the context window, acting as a general read-write data store (like a vector database of strings) [00:30:35]. Agents can “jog their memory” by searching this database using tools [00:30:40].
- Recall Memory: Specifically designed for conversation history, write-protected, and automatically updated with events [00:33:47]. It mimics a conversation search function.
Leta also provides metadata statistics about what’s outside the LLM’s context window, addressing the “you can’t know what you don’t know” problem [00:42:01].
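To make the memory-block idea concrete, here is a minimal sketch using SQLite: a labeled string stored as a database row with an ID, plus a rewrite operation in the spirit of a core-memory-edit tool. The schema, labels, and function name are assumptions for illustration, not Leta’s actual API:

```python
# Illustrative model of a memory block: a labeled string stored in a database
# row with an ID (so multiple agents can share it), plus a rewrite operation
# in the spirit of a core-memory-replace style tool. Schema and function
# names are assumptions for this sketch.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, label TEXT, value TEXT)")
db.execute("INSERT INTO blocks (label, value) VALUES (?, ?)",
           ("human", "Name: Bob the Builder. Boyfriend: James."))
db.commit()

def core_memory_replace(block_id: int, old: str, new: str) -> None:
    """Rewrite part of a block in place; the updated string is what gets
    pinned back into the context window on the next turn."""
    (value,) = db.execute("SELECT value FROM blocks WHERE id = ?", (block_id,)).fetchone()
    db.execute("UPDATE blocks SET value = ? WHERE id = ?",
               (value.replace(old, new), block_id))
    db.commit()

# The user corrects the agent, so it aggressively rewrites core memory.
core_memory_replace(1, "Name: Bob the Builder.", "Name: Charles.")
core_memory_replace(1, "Boyfriend: James.", "Ex-boyfriend: James.")
print(db.execute("SELECT value FROM blocks WHERE id = 1").fetchone()[0])
```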
Building effective AI agents with Leta
The Leta framework uses a server-client process where agents are intended to be stateful and persist indefinitely [00:22:06]. The server acts as a centralized source of truth, and users interact with agents via a REST API [00:17:40].
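A sketch of what that client-server interaction looks like from the outside; the base URL and endpoint path below are placeholders rather than Leta’s documented routes:

```python
# Sketch of the server-client pattern: the agent lives on the server and the
# client only sends messages and reads responses over REST. The base URL and
# endpoint path below are placeholders, not documented routes.
import requests

BASE_URL = "http://localhost:8283"           # assumed local server address
AGENT_ID = "agent-123"                       # assumed existing agent

def send_message(text: str) -> dict:
    # State (memory blocks, history) persists server-side between calls,
    # so this client can be restarted without the agent forgetting anything.
    resp = requests.post(
        f"{BASE_URL}/v1/agents/{AGENT_ID}/messages",
        json={"role": "user", "content": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

print(send_message("Remember that my sister's birthday is in March."))
```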
Agent Execution Flow
In Leta, every LLM invocation is a tool call [00:51:29]. Agents follow a ReAct-style pattern with explicit reasoning [00:27:27]. An agent is expected to output a tool and a justification (reasoning snippet) for its use [00:52:10]. Agents can chain tool calls indefinitely, with optional limits set via the API [00:44:01]. Unlike some frameworks where agents must declare “I’m done,” Leta agents must explicitly say “I want to keep going” (via heartbeat requests), making derailment less likely [00:44:27].
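A simplified sketch of this loop, with a stubbed model step; the field names (`reasoning`, `request_heartbeat`) and the step cap are assumptions chosen to illustrate the pattern, not the framework’s exact schema:

```python
# Sketch of the execution loop: at each step the model returns a tool call
# plus a reasoning snippet, and an explicit "keep going" flag (heartbeat).
# Field names and the step cap are illustrative assumptions.
import json

def fake_llm_step(step: int) -> dict:
    """Stand-in for the model: returns a tool call, reasoning, and heartbeat."""
    return {
        "reasoning": f"Step {step}: I still need to gather information.",
        "tool": {"name": "archival_memory_search", "args": {"query": "user hobbies"}},
        "request_heartbeat": step < 2,   # ask to run again until done
    }

def run_tool(name: str, args: dict) -> str:
    return f"results of {name}({json.dumps(args)})"

def agent_loop(max_steps: int = 10) -> None:
    for step in range(max_steps):
        out = fake_llm_step(step)
        print(out["reasoning"])
        print(run_tool(out["tool"]["name"], out["tool"]["args"]))
        if not out["request_heartbeat"]:
            break                        # the default is to stop, not to continue

agent_loop()
```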
Context Window Management
Leta aggressively manages the message buffer [00:42:38]. Developers can artificially cap the context window length (e.g., to 10k tokens) [00:25:14]. If the context exceeds the limit, messages are evicted to recall memory (making them searchable) and a configurable summarizer runs (e.g., truncation or recursive summary) [00:54:40].
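A rough sketch of that eviction-plus-summarization behavior, using a crude token estimate and a placeholder summarizer (the real summarizer is configurable, as noted above):

```python
# Sketch of buffer management: when the message buffer exceeds a cap, older
# messages are evicted to a searchable store and folded into a running
# summary. The token estimate and summarizer are crude placeholders.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def manage_buffer(buffer: list[str], recall_store: list[str],
                  summary: str, cap: int = 10_000) -> tuple[list[str], str]:
    while sum(estimate_tokens(m) for m in buffer) > cap and len(buffer) > 1:
        evicted = buffer.pop(0)
        recall_store.append(evicted)                # still searchable later
        summary = f"{summary} | {evicted[:40]}"     # stand-in for a recursive summary
    return buffer, summary

recall: list[str] = []
buf = [f"message {i}: " + "x" * 200 for i in range(300)]
buf, running_summary = manage_buffer(buf, recall, summary="", cap=10_000)
print(len(buf), len(recall))
```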
Tooling and Customization
- Default Tools: MEGPT agents have built-in memory management tools for appending, replacing, and searching core memory, as well as inserting into external databases [00:31:30].
- Custom Tools: Python tools can be written and attached to agents [00:49:23]. Agents can even import the Leta client within a tool, allowing an agent to create or manage other agents and their memory [00:49:40] (see the sketch after this list).
- Sandboxing: Tools run within a sandbox (e.g., E2B) by default to prevent interference between agents, especially in multi-tenant environments [01:00:16].
- Tool Limits: Generally, performance can degrade with more than 12-15 tools, as agents may get confused [00:35:04]. A potential solution is a “split-thread agent” where memory tools are managed by a separate, subconscious agent [00:35:11].
- Tool Testing: The Leta UI allows testing tools independently of the agent, rather than relying on the agent to invoke them [00:59:45].
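Conceptually, a custom tool is just a Python function whose signature and docstring can be turned into a tool schema for the model to call. The sketch below shows that idea with a simplified schema extractor; the helper is illustrative, not Leta’s actual registration API:

```python
# Sketch of a custom tool: a plain Python function whose signature and
# docstring can be converted into a tool schema the model invokes. The
# schema-extraction helper is a simplified stand-in.
import inspect

def lookup_order_status(order_id: str) -> str:
    """Return the shipping status for a customer order."""
    # In a real deployment this would hit an internal service; here it is canned.
    return f"Order {order_id}: shipped"

def to_tool_schema(fn) -> dict:
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {name: str(p.annotation) for name, p in sig.parameters.items()},
    }

print(to_tool_schema(lookup_order_status))
print(lookup_order_status("A-1042"))
```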
Multiagent systems in AI with Stateful Agents
Unlike traditional multiagent frameworks (e.g., Autogen), where agents are often trapped in a single Python file and run synchronously, stateful agents in Leta exist independently as services backed by APIs [01:01:21].
This paradigm enables:
- Asynchronous Communication: Agents can communicate over API channels, much like humans using messaging apps; sending a message doesn’t freeze their execution [01:01:45].
- Synchronous Communication: For scenarios requiring immediate response (e.g., supervisor-worker interactions), agents can pause execution until a reply is received [01:03:49].
- Shared State: Because memory blocks live in a database, information can be shared and updated across multiple agents instantaneously [00:32:18].
- Dynamic Groups: Agents can be grouped by tags, allowing messages to be sent to all agents matching a tag (e.g., for map-reduce patterns) [01:04:29].
- Modular Agents: Stateful agents can be “taken out” of one multiagent group and “attached” to another, retaining their learned experience and skills [01:02:11].
The ability to easily connect agents via APIs allows for complex, human-like, or machine-optimized communication patterns [01:02:27].
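The shared-state point above can be made concrete with a small sketch: two agents referencing the same block ID see each other’s edits immediately, because the block lives in a shared store rather than in either agent’s process. The in-memory store and `Agent` class here are illustrative only:

```python
# Sketch of shared state across agents: blocks live in a shared store and are
# referenced by ID, so an update made by one agent is visible to another the
# next time its context window is assembled.

block_store: dict[str, str] = {"org-facts": "Company: Acme. Fiscal year ends in June."}

class Agent:
    def __init__(self, name: str, shared_block_ids: list[str]):
        self.name = name
        self.shared_block_ids = shared_block_ids

    def read_context(self) -> str:
        # Shared blocks are re-read from the store at context assembly time,
        # so edits propagate without any message passing.
        return "\n".join(block_store[b] for b in self.shared_block_ids)

    def update_block(self, block_id: str, text: str) -> None:
        block_store[block_id] = text

supervisor = Agent("supervisor", ["org-facts"])
worker = Agent("worker", ["org-facts"])

supervisor.update_block("org-facts", "Company: Acme. Fiscal year ends in January.")
print(worker.read_context())   # the worker sees the change instantly
```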
Practical Considerations for Building AI agents
- UI/DX: The Leta UI provides a visual environment for creating and interacting with agents, including a “context simulator” to visualize the full payload sent to the LLM [00:58:48]. This is crucial for debugging and understanding agent behavior, especially when working with stateful agents [00:59:08].
- Prompt Engineering: In-context memory tuning (e.g., system prompts, personas) is the primary way to change agent behavior [00:43:05] (a small sketch follows this list).
- Learning Curve: While the workshop environment is set up with Docker for ease of use, manual setup can be prone to package management issues [00:39:10].
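As a small illustration of tuning behavior through in-context memory rather than code, here is a sketch in which persona and user facts live in editable blocks that are rendered into every prompt; the labels and contents are examples, not Leta defaults:

```python
# Sketch of behavior tuning through in-context memory: persona and user-facing
# facts live in editable blocks that are always rendered into the prompt.
# Labels and contents here are examples only.

initial_memory = {
    "persona": (
        "I am Sam, a concise support agent. I never reveal internal tooling "
        "and I confirm before making account changes."
    ),
    "human": "Name unknown so far. Prefers short answers.",
}

def render_system_prompt(base_rules: str, memory: dict[str, str]) -> str:
    blocks = "\n".join(f"[{label}]\n{text}" for label, text in memory.items())
    return f"{base_rules}\n\nIn-context memory:\n{blocks}"

print(render_system_prompt("You are a stateful agent with editable memory.", initial_memory))
```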
Use Cases and Future Outlook
Scaling AI agents in production benefits greatly from statefulness. Current applications include:
- Verticalized Agents: Specialized agents for specific domains that require retaining information [01:12:55].
- Enterprise Deployments: Advanced multiagent systems processing large volumes of transactions and learning about users, often without direct chatbot interaction [01:13:17].
Unsolved problems in memory management and delegation in AI include:
- Forgetting: While archival memory is timestamped for consolidation, an automatic forgetting mechanism is still an area of research [01:14:57].
- Ingesting Large Data: Compressing vast amounts of temporal data into a single memory block requires the agent to chain function calls and recursively regenerate the memory block over time [01:15:30].
- Active Document Editing: Agents are generally better at writing entire documents from scratch rather than performing line-by-line diffs, making collaborative human-agent document editing a complex challenge [01:17:14].
- Coding Agents vs. Tool Calls: The trade-off between the perceived higher performance of coding agents (which execute code directly) and the complexity of secure execution environments for every tool remains a consideration [01:17:46].