From: aidotengineer

Memory management is a critical aspect of artificial intelligence, particularly with the widespread adoption of large language models (LLMs). The terms “agent memory” and “LLM OS” (LLM Operating System) are often used to describe systems that enable AI agents to retain and utilize information over time, much like humans do [01:39:54].

Stateless vs. Stateful Agents

Traditionally, an AI agent could be defined as an LLM taking actions in a loop [02:26:00]. However, this definition overlooks a crucial component: the updating of the agent’s state within that closed loop [02:38:00]. Modern LLMs, built on the transformer architecture, are inherently stateless machines [02:52:00]: after each interaction, an external mechanism is required to update their state [02:58:00].
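The distinction can be sketched in a few lines. In this minimal illustration, `call_llm` is a hypothetical stand-in for any stateless chat-completion API; the point is that all state must be compiled, passed in, and persisted by the caller on every turn:

```python
def call_llm(messages):
    """Hypothetical stateless LLM call: the model retains nothing between calls."""
    return {"role": "assistant", "content": f"(reply to {len(messages)} messages)"}

def stateful_agent_step(state, user_input):
    """One closed-loop step: compile context, call the LLM, update the state."""
    state["messages"].append({"role": "user", "content": user_input})
    reply = call_llm(state["messages"])   # the model itself holds no memory
    state["messages"].append(reply)       # persisting state is the caller's job
    return state, reply

state = {"messages": []}
state, reply = stateful_agent_step(state, "hello")
state, reply = stateful_agent_step(state, "remember me?")
```

The "agent" here is not the LLM alone but the loop plus the state-update step wrapped around it.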

Prior to the LLM era, the distinction between stateful and stateless agents was less emphasized [03:08:00]. With the current reliance on stateless LLMs for AI, it has become a critical differentiation [03:12:00]. Memory, or statefulness, is considered paramount for agents to deliver on their potential [03:23:00].

Limitations of Current AI Implementation

Current methods for handling state in LLM-driven AI often involve simply appending to a list held in process memory [04:10:00]. While this sufficed for early workflows and experimental use cases, it becomes a significant problem when agents are used for practical, useful applications [04:22:00].

A major challenge is that current agents cannot effectively learn from experience, or their learning is extremely limited [07:31:00]. This becomes particularly evident in applications like chatbots, assistants, companions, and co-pilots [07:44:00]. For example, an AI might repeatedly ask about an ex-partner if it lacks a robust memory management system to update relationship status [08:15:00]. Such errors are “devastating” for consumer applications [08:48:00].

For power users of LLMs, context compilation is often a manual process [06:51:00]. Conversations that go on for too long can “derail,” forcing users to re-describe everything to the LLM, a “painful experience” [07:04:00].

The MemGPT Paper and the LLM OS

The concept of a “memory management system for LLMs,” or “LLM OS,” proposes that if LLMs are to continuously improve, their memory management should be handled by another LLM rather than by a human [05:32:00]. The MemGPT paper explored this idea, proposing that AI should perform memory management for AI [05:35:00].

Stateful AI Agents with Letta

The Letta framework is designed around stateful AI agents that persist indefinitely, using a client-server model in which the server acts as the centralized source of truth [02:08:08]. Letta aims to automate the compilation of context for LLMs [07:09:00].

Key Concepts in Letta’s Memory Management:

  • Memory Blocks: The fundamental units of memory in a Letta agent are “memory blocks,” which are essentially strings with references [02:27:00]. These blocks are stored in a database (e.g., Postgres) [03:04:00].
  • Three Tiers of Memory:
    • Core Memory: High-level, frequently accessed information that is always in the context window [02:56:00]. This is analogous to immediately recalling a friend’s name and hobbies upon seeing them [03:00:00]. Agents can edit their own core memory, rewriting information based on new experiences [02:49:00].
    • Archival Memory: A general read/write data store outside the context window, similar to a vector database [03:25:00]. Agents can “jog their memory” by searching this database [03:40:00]. This is useful for large documents or structured data [03:06:00].
    • Recall Memory: Specifically designed for prior messages and conversation history, also residing outside the context window [03:37:00]. It’s automatically written to upon events and provides a conversation search function [03:52:00].
  • Context Window Management: Letta allows artificial capping of the context window length (e.g., 4K tokens) [05:14:00]. If the limit is reached, messages are evicted to recall memory, and a configurable summarizer runs to keep the context within limits [05:41:00]. This prevents excessive payload sizes, which lead to slow responses and high costs [05:25:00].
  • Tool Calling: All LLM invocations in Letta are tool calls, even for simple messages [05:29:00]. Agents are required to output a tool call and a justification (reasoning snippet) [05:10:00]. Letta executes tools on the server side, and tools are sandboxed by default, with support for MicroVMs [02:24:00].
    • Tool Chaining: Agents can chain tool calls indefinitely, allowing for complex multi-step operations [04:45:00]. This is managed by “heartbeat requests,” where the agent explicitly requests to continue execution [04:40:00].
    • Custom Tools: Developers can write custom tools in Python, allowing agents to perform arbitrary actions, including creating other agents or managing their memory [04:52:00].
  • Context Simulator: A feature in the UI allows developers to visualize the full payload being sent to the LLM, including system instructions, tool descriptions, external summary metadata, and messages [05:48:00]. This helps with debugging and understanding context compilation.
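The eviction-and-summarize behavior described above can be sketched as follows. This is a hedged illustration of the general technique, not Letta’s actual implementation: the cap is counted in messages rather than tokens, and `summarize` is a stand-in for a real LLM-backed summarizer:

```python
def summarize(messages):
    """Stand-in summarizer; a real system would call an LLM here."""
    return f"[summary of {len(messages)} earlier messages]"

def compile_context(messages, recall_memory, max_messages=4):
    """Evict the oldest messages into recall memory once the cap is exceeded,
    and prepend a running summary so the context stays within limits."""
    if len(messages) > max_messages:
        evicted, messages = messages[:-max_messages], messages[-max_messages:]
        recall_memory.extend(evicted)   # evicted messages remain searchable
        summary = summarize(recall_memory)
        return [{"role": "system", "content": summary}] + messages, recall_memory
    return messages, recall_memory

msgs = [{"role": "user", "content": f"m{i}"} for i in range(6)]
recall = []
ctx, recall = compile_context(msgs, recall)
# ctx now holds one summary line plus the 4 most recent messages;
# the 2 oldest messages have moved to recall memory.
```

The same pattern generalizes to token-based caps by measuring each message with a tokenizer instead of counting list entries.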

Benefits of Stateful Agents

Stateful agents offer several advantages:

  • Learning from Experience: They can form new memories and learn over time, addressing a fundamental deficiency of stateless LLMs [03:57:00].
  • Human-like Interaction: The behavior of stateful agents becomes more human-like, mimicking fuzzy memory and recall, leading to better user experiences [01:17:17].
  • Improved User Experience: True statefulness means conversations shouldn’t derail; the AI experience should continuously improve as the agent learns more about the user [10:50:00].
  • Enterprise Data Handling: For enterprises with vast amounts of data exceeding typical context window limits, stateful agents allow models to “learn” about the company by incrementally storing information in in-context memory, akin to a post-training phase [09:13:00]. This helps avoid common pitfalls in AI strategy.
  • Persistent and Shareable Memory: Memory blocks live in a database and can be shared among multiple agents within an organization, allowing for collaborative knowledge [03:12:00].
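Block sharing falls out of the fact that blocks are stored once and referenced, not copied. The sketch below illustrates the idea with plain Python objects standing in for database rows; the class names are illustrative, not Letta’s API:

```python
class MemoryBlock:
    """A labeled string of memory; in practice this would be a database row."""
    def __init__(self, label, value):
        self.label, self.value = label, value

class Agent:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks              # references to shared block objects

    def rewrite_block(self, label, new_value):
        """Agents can edit their own memory; edits to a shared block are
        immediately visible to every agent attached to it."""
        for block in self.blocks:
            if block.label == label:
                block.value = new_value

org_block = MemoryBlock("organization", "ACME: 50 employees")
support_bot = Agent("support-bot", [org_block])
sales_bot = Agent("sales-bot", [org_block])
support_bot.rewrite_block("organization", "ACME: 75 employees")
# sales_bot now sees the updated value through the shared reference.
```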

Multi-Agent Systems with Stateful Agents

Traditional multi-agent frameworks often involve agents trapped within a single file, running synchronously and losing their state upon termination [01:02:20]. This contrasts with how humans operate in a multi-agent setting, asynchronously and statefully [01:01:45].

With stateful agents running on servers and accessible via APIs, multi-agent systems become a matter of message passing [01:02:24]. Agents can be “wired” to each other over APIs, similar to how humans communicate using tools like Slack [01:02:38]. This allows for:

  • Asynchronous Messaging: Agents can send messages and immediately return to their own tasks, receiving receipts, analogous to human iMessage interactions [01:03:32].
  • Synchronous Messaging: For critical interactions (e.g., contacting a supervisor), an agent’s execution can be frozen until a reply is received [01:03:49].
  • Group Messaging: Agents can be grouped by tags, allowing a message to be sent to all agents matching specific criteria, useful for supervisor-worker patterns or parallelized tasks [01:04:29].
  • Flexible System Design: Stateful agents enable dynamic planning for complex tasks and allow for the creation of specialized agents that can be integrated into different multi-agent groups [01:02:11].
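The asynchronous and synchronous modes above can be sketched with two send methods. A plain method call stands in here for the API hop between agent servers; the class and method names are illustrative, not Letta’s actual API:

```python
import queue

class StatefulAgent:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()        # persists between interactions

    def handle(self, msg):
        """Process an incoming message and produce a reply."""
        return {"from": self.name, "content": f"ack: {msg['content']}"}

    def send_async(self, other, content):
        """Fire-and-forget: drop the message in the peer's inbox, get a
        delivery receipt, and immediately resume the sender's own task."""
        other.inbox.put({"from": self.name, "content": content})
        return "delivered"

    def send_sync(self, other, content):
        """Blocking: the sender's execution is frozen until the peer replies."""
        return other.handle({"from": self.name, "content": content})

worker = StatefulAgent("worker")
supervisor = StatefulAgent("supervisor")
receipt = worker.send_async(supervisor, "task finished")
reply = worker.send_sync(supervisor, "need approval")
```

Group messaging by tag reduces to iterating `send_async` over every agent whose tag set matches the criteria.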

Practical Implementation

Letta provides a Docker image for easy setup [00:21:00]. The framework is built on FastAPI, Postgres, and Python logic [01:17:32] and exposes a robust API for interacting with agents [01:17:40]. Developers can use the Python SDK or TypeScript SDK, both programmatically generated from the REST API [01:14:14].

For local development, Docker is recommended due to frequent schema changes that affect database migrations [01:14:32]. The system supports external Postgres databases [01:14:38]. Tools are executed within a sandbox, such as E2B, ensuring secure and isolated environments, especially important for multi-tenant cloud deployments [01:19:02]. While there is a latency trade-off with cold starts, this approach provides a robust solution [01:19:13].

The architecture leverages graph data structures by letting agents start as fully connected graphs, with restrictions (“tool rules”) applied to enforce specific behaviors [04:24:00]. This contrasts with frameworks that start from predefined decision trees.
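The "fully connected by default, constrained by rules" idea can be sketched as follows. This is a hedged illustration of the general technique; the tool names and rule format are made up for the example, not Letta’s actual schema:

```python
def allowed_next_tools(tools, current, rules):
    """By default any tool may follow any other (a fully connected graph);
    each rule prunes the outgoing edges of one tool to a whitelist."""
    successors = set(tools)
    for rule in rules:
        if rule["after"] == current:
            successors &= set(rule["allow"])  # keep only whitelisted successors
    return successors

tools = ["search_archival", "core_memory_replace", "send_message"]
rules = [{"after": "search_archival", "allow": ["send_message"]}]

after_search = allowed_next_tools(tools, "search_archival", rules)
# only "send_message" may follow "search_archival"; tools without a rule
# keep their fully connected default.
```

Starting fully connected and subtracting edges keeps the agent maximally flexible while still allowing decision-tree-like behavior to be enforced where needed.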

Letta also integrates with external tool providers like Composio [01:00:42].

The development environment allows for both programmatic interaction via notebooks and a low-code UI builder, which offers a visual representation of agent behavior and memory management [01:18:00]. This visual tool is an evolution of the traditional playground, aiming to become the standard development experience for stateful agents [01:12:51].