This article provides an overview of the Letta Framework and the concepts behind MemGPT, focusing on the development of stateful agents.
Introduction to Stateful Agents and Memory
The core idea of stateful agents addresses a significant challenge in modern AI: the inherent statelessness of large language models (LLMs). While traditional agents (from the era before LLMs) were generally understood to be stateful, the current wave of AI relies on transformers, which are fundamentally stateless machines [02:50:06]. This means that when an LLM operates in a closed loop (like an agent taking actions), there must be an external mechanism to update its state [02:56:11].
The speaker suggests “stateful agents” as a more precise term for what “agent” meant before the LLM era [02:17:09]. A common, though incomplete, definition of an agent today is an LLM taking actions in a loop [02:26:08]. However, this definition overlooks the crucial requirement that the agent’s state be updated within that closed loop [02:38:09].
Why Stateful Agents Matter
The primary reason for pursuing stateful agents is to enable learning from experience, a capability that current LLM-driven agents lack or have in a very limited way [07:31:09]. This is especially critical for applications like assistants, companions, and co-pilots [07:44:07]. Without state, agents can make obvious mistakes, such as forgetting crucial information like a user’s relationship status, which a human would never do [08:18:04]. Such errors are devastating for consumer applications [08:48:07].
Statefulness, or memory, is perhaps the most important problem to solve for agents to deliver on their hype [03:22:04]. In the context of LLMs, state is synonymous with memory, since an LLM’s only memory is its weights and its context window [03:39:27]. Humans are stateful, forming new memories and learning over time; LLMs do not [03:57:07]. Any learning must therefore be done by the user or the framework on the model’s behalf [04:02:16]. Traditionally, this meant simply appending messages to a growing list, which breaks down for complex, useful agent applications [04:10:04].
For enterprise settings, stateful agents are vital because companies possess far more data than can fit into an LLM’s context window (e.g., 10 million tokens) [09:13:00]. Stateful agents enable a “training phase” where the model learns about the company after deployment, not by changing its weights, but by leveraging in-context memory [09:34:00].
A common user experience, particularly with ChatGPT, involves conversations derailing when they become too long, requiring the user to “mentally context compile” and re-describe everything [10:24:02]. The promise of stateful agents is to eliminate this derailment, making the AI experience consistently improve over time as the AI learns more about the user [10:50:09]. By creating human-like memory constructs, agents can exhibit more human-like behavior, including fuzzy memory and recall [11:14:04].
MemGPT: A Memory Management System for LLMs
The concept of MemGPT originated from a paper focused on a memory management system for LLMs [05:13:04]. The core idea is that if LLMs are stateless and require memory management, and if LLMs are continuously improving, then an AI should manage memory for the AI itself [05:32:00].
Context Management in MemGPT
MemGPT proposes an “LLM OS” (LLM Operating System) approach, which contrasts with typical LLM usage:
- Traditional (Stateless): A loosely defined context window, not tied to a database, held in process memory, with new information appended over time [05:55:07]. This often necessitates tracing software and observability to understand the “black box of tokens” shoved into the LLM [06:21:05].
- MemGPT (Stateful): Recognizes that there is an optimal way to arrange an LLM’s context window, with context drawn from a potentially very large state (much larger than the context window itself) [06:29:08]. This system aims to automate the “context compilation” that power users of ChatGPT or Claude currently perform manually [06:51:01], as sketched below.
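To make “context compilation” concrete, here is a minimal, framework-agnostic sketch (not MemGPT’s actual implementation): the always-in-context pieces are pinned first, and the remaining token budget is filled with the most recent messages from a much larger external state. The crude word-count tokenizer and the block markup are illustrative assumptions.

```python
# Minimal, framework-agnostic sketch of "context compilation":
# assemble a bounded context window from a much larger external state.
# Token counting is crudely approximated by whitespace word count.

def approx_tokens(text: str) -> int:
    return len(text.split())

def compile_context(system_prompt: str,
                    memory_blocks: dict[str, str],
                    message_history: list[str],
                    budget: int = 2048) -> str:
    # 1. The system prompt and core memory blocks are always included.
    parts = [system_prompt]
    for label, value in memory_blocks.items():
        parts.append(f"<{label}>\n{value}\n</{label}>")
    used = sum(approx_tokens(p) for p in parts)

    # 2. Fill the remaining budget with the most recent messages;
    #    older messages stay in external storage (recall memory).
    recent: list[str] = []
    for message in reversed(message_history):
        cost = approx_tokens(message)
        if used + cost > budget:
            break
        recent.append(message)
        used += cost
    return "\n\n".join(parts + list(reversed(recent)))

context = compile_context(
    system_prompt="You are a helpful, persistent agent.",
    memory_blocks={"human": "Name: Sarah. Hobby: climbing.",
                   "persona": "Friendly assistant named Sam."},
    message_history=["user: hi", "assistant: hello!", "user: any plans?"],
)
print(context)
```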
The Letta Framework
Letta is an open-source framework designed to implement stateful agents and memory management, building on the ideas of MemGPT [05:00:05] [13:48:02]. It utilizes a client-server architecture (FastAPI, Postgres, and Python logic) to ensure agents persist indefinitely [17:30:08] [22:06:05].
Core Components and Architecture
- Client-Server Model: Agents are created on the server and are persistent. The client interacts with the agent via a REST API, sending individual messages rather than the entire conversation history [22:06:05] [23:09:05].
- Memory Blocks: The main units of memory in a Letta agent are memory blocks, which are essentially strings with references attached [23:24:00]. These blocks are stored in a database (Postgres) and can be edited by the agent itself [23:44:05] [32:02:00] (see the agent-creation sketch after this list).
- Core Memory: High-level, top-level information that is always in the agent’s context window, mimicking immediate human recall (e.g., a friend’s name, hobbies) [29:55:07]. Core memory can be updated (append/replace) [31:32:08]. It has configurable limits, and if a limit is hit, the agent might receive an error suggesting eviction to archival memory [48:18:04].
- Archival Memory: A general read/write data store that exists outside the context window, effectively a vector database of strings [33:10:09]. The agent can “jog its memory” by searching this database using a tool [30:37:07]. This is analogous to “agentic RAG” (Retrieval Augmented Generation) [30:52:09]. It is not automatically written to; data must be actively inserted [34:01:03].
- Recall Memory: Specifically designed for conversation history; it is write-protected and automatically updated every time an event occurs [33:47:04]. It is exposed to the agent as a conversation search function [34:15:08].
- Shared Memory Blocks: Memory blocks can be shared among multiple agents, allowing them to share information and automatically update when one agent writes to a shared block [32:12:00].
- System Prompt and Tools: The agent’s behavior is influenced by its system prompt and the tools it has access to [29:49:03]. Letta agents are forced to follow a ReAct-style pattern (reasoning, action, observation) [27:27:03].
- Tool Calling: Every LLM invocation in Letta is a tool call [51:29:00]. Agents must explicitly call tools, even to send messages [51:39:00]. This enables agents to run frequently in the background without always needing to output a message to the user [51:44:00].
- Tool Execution: Tools are executed on the server side and are sandboxed by default, supporting environments like E2B [29:17:00] [01:00:16].
- Tool Chaining: Agents can chain multiple tool calls together, and they continue looping until they explicitly decide to stop (“I want to keep going” vs. “I’m done”) [43:45:00] [44:39:00].
- Custom Tools: Developers can write custom Python tools that can even import the Letta client, enabling agents to create or manage other agents and their memory [49:25:00] (see the custom-tool sketch after this list).
- Context Window Management: Letta automatically manages the context window by capping its length [25:11:00]. If the context exceeds the limit, messages are evicted into recall memory and a summarizer runs (configurable as truncation or recursive summary) [54:40:00]. This ensures agents never send payloads over a set limit, saving cost and time [58:00:00].
- Metadata Statistics: To address the “you can’t know what you don’t know” problem, Letta can provide agents with metadata statistics about information outside their context window (e.g., number of previous messages, total archival memories) [41:54:00].
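As a concrete illustration of the client-server model and memory blocks, here is a minimal sketch assuming a Letta server on localhost:8283 and the letta_client Python SDK; the exact method names, model handles, and block schema may differ across versions.

```python
# Hedged sketch: create a persistent agent with memory blocks and send it
# one message. The SDK surface (letta_client, model handles, block schema)
# is assumed and may differ across Letta versions.
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")  # local Letta server

# Memory blocks are labeled strings that live in Postgres and stay
# pinned in the agent's context window (core memory).
agent = client.agents.create(
    memory_blocks=[
        {"label": "human", "value": "Name: Sarah. Likes climbing."},
        {"label": "persona", "value": "I am Sam, a persistent assistant."},
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)

# Only the new message is sent; the server already holds the agent's
# full state, so no conversation history travels with the call.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "I just moved to Berlin."}],
)
for message in response.messages:
    print(message)
```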
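Custom tools follow a simple pattern: a plain Python function whose type hints and docstring are parsed into the schema the LLM sees. The sketch below is illustrative; the search backend is hypothetical, and the commented registration calls are assumptions that vary by SDK version.

```python
# A custom Letta-style tool: a plain Python function whose signature and
# docstring describe the tool schema presented to the LLM.
def search_company_wiki(query: str) -> str:
    """Search the internal company wiki and return the top result.

    Args:
        query: Free-text search query.

    Returns:
        The most relevant wiki snippet, or a not-found message.
    """
    # Hypothetical backend; replace with a real search client.
    results = {"pto policy": "PTO: 25 days per year, accrued monthly."}
    return results.get(query.lower(), "No matching wiki page found.")

# Registration (method names vary by SDK version; illustrative only):
# tool = client.tools.upsert_from_function(func=search_company_wiki)
# client.agents.tools.attach(agent_id=agent.id, tool_id=tool.id)
```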
Multi-Agent Systems in Letta
Letta’s approach to multi-agent systems differs from frameworks like AutoGen by emphasizing stateful, independently existing agents [01:02:00].
- Asynchronous and Persistent Agents: Unlike agents trapped in a Python file, Letta agents run asynchronously and persist indefinitely on servers [01:01:29]. This allows agents to be “taken out” of one group and attached to another, carrying their experience and memories [01:02:11].
- API-driven Message Passing: Multi-agent communication is facilitated through message passing via APIs, similar to humans communicating over Slack [01:02:24].
- Multi-Agent Tools: Letta provides built-in tools for multi-agent communication (see the sketch after this list), including:
  - Asynchronous Messaging: An agent sends a message and immediately receives a receipt, without pausing its own execution, akin to iMessage [01:03:09].
  - Synchronous Messaging: An agent sends a message and waits for a reply, useful when a supervisor or critical response is needed [01:03:50].
  - Group Messaging: Agents can send messages to all agents matching specific tags, enabling supervisor-worker or map-reduce patterns [01:04:10].
- Dynamic Tool Management: Agents can have their communication tools dynamically added or removed, effectively controlling their ability to interact with other agents [01:11:30].
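A hedged sketch of the tag-based supervisor-worker pattern follows; the tags parameter and the built-in group-messaging tool it relies on are assumptions that may differ across Letta versions.

```python
# Hedged sketch: tag-based supervisor/worker setup. The tags parameter
# and the built-in multi-agent tool names are assumptions that may
# differ across Letta versions.
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Workers share a tag so a supervisor can address them as a group.
workers = [
    client.agents.create(
        name=f"worker-{i}",
        tags=["worker"],
        memory_blocks=[{"label": "persona", "value": "Diligent worker."}],
        model="openai/gpt-4o-mini",
        embedding="openai/text-embedding-3-small",
    )
    for i in range(3)
]

supervisor = client.agents.create(
    name="supervisor",
    memory_blocks=[{"label": "persona", "value": "Coordinates the workers."}],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
)

# With a group-messaging tool attached (assumed name), the supervisor can
# fan a task out to every agent tagged "worker", map-reduce style; each
# worker runs on the server and persists independently of this script.
```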
Practical Implementation and Workflow
Letta provides a workshop experience to demonstrate its capabilities:
- Setup: Requires Docker to run the server, an open-source stack (FastAPI, Postgres, Python) exposing a robust API [21:00:00] [17:27:00]. The client interacts via the Python SDK or REST API [17:48:00] (see the connection sketch after this list).
- Jupyter Notebooks: Used to lay out basic ideas behind the context management system, focusing on an LLM being aware of its context window and managing memory with tools [11:42:00].
- Web UI (ADE): An interactive application (app.letta.com) that serves as a “new iteration of the playground” for stateful agents [01:12:45]. It allows faster iteration in a low-code environment compared to the SDKs [56:07:00].
- Context Simulator: A feature within the UI that allows developers to visualize the full payload being sent to the LLM, including system instructions, tool descriptions, external summary metadata, and messages [58:48:00]. This helps in understanding and debugging what the agent “sees” in its context window [59:08:00].
- Tool Testing: The UI enables testing tools separate from the agent, which is challenging in a notebook environment [59:45:00].
- Prompt Engineering: Tuning in-context memory is the primary way to change agent behavior [43:04:00].
- Learning Without Finetuning: Rather than updating model weights, the framework aims for agents to “learn” about companies by processing data into their in-context memory, a form of continual learning after training [09:34:00].
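A minimal connection sketch, assuming the server is launched via Docker and exposes its default port; the image name, port, and client methods may vary by release.

```python
# Hedged sketch: connect to a local Letta server. Assumed launch command
# (image name and port may differ by release):
#
#   docker run -p 8283:8283 letta/letta:latest
#
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Agents live server-side (FastAPI + Postgres), so they survive client
# restarts; listing them shows everything persisted on this server.
for agent in client.agents.list():
    print(agent.id, agent.name)
```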
Challenges and Considerations
- Tool Overload: Agents can become confused if too many tools are added; more than roughly 12-15 tools can degrade performance [35:04:00] [53:02:00]. A solution is a dedicated “shadow agent” or “subconscious” that handles memory tools, keeping the main agent’s toolset lean (see the sketch after this list) [35:09:00].
- “Doesn’t Know What It Doesn’t Know”: If information is only in archival memory, the agent might not know to search for it without explicit prompting or pre-defined tool rules [42:01:00] [46:40:00].
- External Database Integration: Currently, Letta primarily supports Postgres, since schema changes require migration scripts; SQLite is technically supported for local use [01:14:14].
- Forgetting: While archival memory is tagged with timestamps, there’s no inherent “forgetting” mechanism; consolidation needs to be actively managed by the agent [01:14:57].
- Secure Execution Environment: Running tools in a sandbox (like E2B) is crucial for security, especially in multi-tenant environments, but can introduce latency [01:18:56].
- Active Document Editing: LLMs are often better at rewriting entire documents than making precise line-by-line diffs, posing a challenge for collaborative or iterative document agents [01:17:14].
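On the tool-overload point above, a hedged sketch of trimming an agent’s toolset: the default memory-tool names and the agents.tools.list/detach methods are assumptions that may differ by SDK version.

```python
# Hedged sketch: keep a primary agent's toolset lean by detaching memory
# tools (delegating them to a separate "shadow" agent). Tool names and
# the agents.tools.list/detach methods are assumed per SDK version.
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

MEMORY_TOOLS = {"archival_memory_insert", "archival_memory_search"}

def slim_down(agent_id: str) -> None:
    """Detach memory tools so the main agent stays under ~12-15 tools."""
    for tool in client.agents.tools.list(agent_id=agent_id):
        if tool.name in MEMORY_TOOLS:
            client.agents.tools.detach(agent_id=agent_id, tool_id=tool.id)
```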
Use Cases
Letta is designed for deploying stateful, LLM-based services [01:13:40].
- Verticalized Agents: Companies building specialized agents for specific domains require memory and state [01:12:55].
- Enterprise Deployments: Advanced multi-agent systems can run stable workflows, processing transactions and learning about users without direct messages (i.e., not chatbots) [01:19:16].