From: aidotengineer
This article explores the concept of AI agents, focusing on the critical need for “stateful agents” and how API-based communication enables effective multi-agent systems. The content is derived from a workshop that introduced foundational ideas for building effective agents, particularly those with robust memory capabilities.
The Challenge of Stateless LLMs in AI Agents
While a common definition for an AI agent is “an LLM that’s taking actions in a loop” [00:02:26], this definition misses a crucial component: the agent must be updated within that closed loop [00:02:38]. The fundamental unit of compute in the current wave of AI, the transformer, is inherently stateless [00:02:50], so a mechanism is required for updating the agent’s state when the loop closes [00:02:56]. This distinction between stateless and stateful agents matters precisely because Large Language Models (LLMs) are stateless [00:03:12].
Why Stateful Agents?
Statefulness is effectively synonymous with memory in the context of LLMs [00:03:38]. LLMs have no memory beyond their weights and their context window [00:03:43]. Unlike humans, who form new memories and learn over time [00:03:56], LLMs do not; any learning must be handled by the user or by the agent framework [00:04:02].
Traditionally, state handling in AI agents has amounted to appending to a list [00:04:10]. That is insufficient for agents that go beyond chatbots to perform useful, complex tasks [00:04:20]. The inability to learn from experience is a major limitation of current agents [00:07:31]. For example, an agent that fails to update its memory after being told about a user’s breakup can produce “devastating error[s]” in consumer applications [00:08:15].
Stateful agents promise:
- Learning from experience: Essential for assistants, companions, and copilots [00:07:44].
- Handling large enterprise data: Enterprises often have more data than can fit into an LLM’s context window [00:09:14]. Stateful agents enable learning from this data, effectively like a post-training phase that updates in-context memory [00:09:39].
- Enhanced user experience: Eliminating conversational derailment where users have to repeatedly re-describe context to the LLM [00:10:47]. The experience should get progressively better as the AI learns more about the user [00:10:57].
- Human-like behavior: Building effective agents with human-like memory constructs (fuzzy memory, forgetfulness, recall) leads to more natural agent behavior [01:11:17].
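The contrast between append-only state and editable memory can be made concrete with a toy sketch (plain Python, not any framework’s API). The breakup example above illustrates why: an append-only history keeps the stale fact and the correction side by side, while an editable memory block resolves the contradiction in storage.

```python
class AppendOnlyHistory:
    """Stateless-style handling: facts are only ever appended."""
    def __init__(self):
        self.messages = []

    def observe(self, message: str):
        self.messages.append(message)

    def context(self) -> str:
        return "\n".join(self.messages)


class EditableMemoryBlock:
    """Stateful-style handling: a labeled block the agent can rewrite."""
    def __init__(self):
        self.facts = {}

    def replace(self, key: str, value: str):
        # Contradicted facts are overwritten rather than accumulated.
        self.facts[key] = value

    def context(self) -> str:
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))


history = AppendOnlyHistory()
history.observe("partner: James")
history.observe("update: user broke up with James")
# Both the stale fact and the correction sit in context; the model must
# reconcile them again on every turn.

memory = EditableMemoryBlock()
memory.replace("partner", "James")
memory.replace("partner", "none (recent breakup)")
# The contradiction is resolved in storage, not re-litigated in context.
```

The second approach is what “learning from experience” cashes out to at the storage layer: the representation the model sees is kept current, instead of forcing the model to re-derive the current state from a growing transcript.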
The LLM OS: A Memory Management System for LLMs
To address the statelessness of LLMs, a “memory management system for LLMs”, an LLM operating system, is proposed, based on the MemGPT paper [00:05:22]. The core idea: if LLMs need memory management, and they are becoming increasingly capable, then that management should itself be performed by an LLM, the AI managing memory for the AI [00:05:32].
In this system, an LLM is made aware of its context problem and is given tools to manage memory [01:12:00]. This approach is centered around tool calling, as LLMs are continually improving at this task [01:12:11].
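A minimal sketch of this loop, with the model call stubbed out and hypothetical tool names (not a real API): the system prompt makes the model aware of its context limit, and when the model emits a tool call, the framework executes it and appends the result.

```python
import json

SYSTEM_PROMPT = (
    "You have a fixed-size context window. Use the provided memory tools "
    "to persist important facts before they scroll out of context."
)

def fake_llm(messages):
    # Stand-in for a real chat-completions call with tool use enabled.
    return {"tool_call": {"name": "memory_write",
                          "arguments": json.dumps({"text": "User prefers Rust."})}}

def memory_write(state, text):
    """Illustrative memory tool: persist a fact into the core block."""
    state["core"] += "\n" + text
    return "ok"

TOOL_REGISTRY = {"memory_write": memory_write}

def step(state, messages):
    """One turn: call the model, execute any tool call, record the result."""
    reply = fake_llm(messages)
    call = reply.get("tool_call")
    if call:
        result = TOOL_REGISTRY[call["name"]](state, **json.loads(call["arguments"]))
        messages.append({"role": "tool", "name": call["name"], "content": result})
    return state, messages

state = {"core": "Persona: helpful assistant."}
state, msgs = step(state, [{"role": "system", "content": SYSTEM_PROMPT},
                           {"role": "user", "content": "I prefer Rust."}])
```

The design bet is exactly the one stated above: tool calling is the interface the models are best at, so memory management is expressed as tools rather than as a bespoke mechanism.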
Memory Tiers in Stateful Agents
The Letta framework implements a multi-tiered memory system:
- Core Memory: Top-level, in-context memory, akin to immediate human recall (e.g., a friend’s name, hobbies) [00:29:56]. These are strings with references, editable by the agent itself [00:23:25].
- Archival/Recall Memory: Data sources that exist outside the context window, similar to long-term memory that requires conscious effort to retrieve (e.g., searching old photos) [00:30:35].
- Recall Memory is effectively conversation history that is write-protected but automatically updated with events [00:33:47].
- Archival Memory is a general read/write data store, akin to an infinite-sized vector database of strings, where agents can actively store and retrieve arbitrary data, such as large documents [00:33:59].
Agents can “jog their memory” by using tools to search these external data sources [00:30:40], a concept similar to agentic workflows based on Retrieval Augmented Generation (RAG) [00:52:53].
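The tiers above can be sketched as a toy in-memory structure (illustrative only, not the Letta implementation): core memory is a small editable dict, recall memory is an append-only event log, and archival memory is a searchable store, with substring match standing in for the embedding search a real system would use.

```python
from dataclasses import dataclass, field

@dataclass
class TieredMemory:
    core: dict = field(default_factory=dict)      # in-context, agent-editable
    recall: list = field(default_factory=list)    # conversation history, auto-updated
    archival: list = field(default_factory=list)  # general read/write store

    def log_event(self, message: str):
        # Recall memory is write-protected to the agent but updated with events.
        self.recall.append(message)

    def archival_insert(self, text: str):
        self.archival.append(text)

    def archival_search(self, query: str) -> list:
        # Real systems use embeddings; substring match keeps the sketch simple.
        return [t for t in self.archival if query.lower() in t.lower()]

mem = TieredMemory(core={"human": "Name: Sarah. Hobby: climbing."})
mem.log_event("user: I started a new job at Acme")
mem.archival_insert("Q3 planning doc: headcount will grow by 12")
hits = mem.archival_search("headcount")
```

Only `core` is in the prompt on every turn; the other two tiers are reached through tool calls, which is what “jogging memory” means operationally.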
Context Window Management
The system actively manages the context window, allowing developers to artificially cap its length (e.g., to 4K tokens) [00:54:14]. If the context exceeds this limit, messages are evicted into recall memory and a summarizer runs to keep the in-context payload within bounds [00:54:40]. This prevents payloads from growing excessively large, which can lead to high costs and slow response times [00:58:00].
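The eviction policy described above can be sketched as follows; the token estimate and summarizer are crude placeholders, not the real implementation, and the cap is the 4K figure mentioned in the text.

```python
TOKEN_CAP = 4000

def estimate_tokens(messages):
    # Rough chars-per-token heuristic; real systems use a tokenizer.
    return sum(len(m) // 4 for m in messages)

def summarize(messages):
    # Stub: a real summarizer would compress the evicted messages with an LLM.
    return f"[summary of {len(messages)} earlier messages]"

def enforce_cap(in_context, recall):
    """Evict oldest messages into recall memory until under the token cap,
    then prepend a summary of everything evicted so far."""
    while estimate_tokens(in_context) > TOKEN_CAP and len(in_context) > 1:
        recall.append(in_context.pop(0))
    if recall and not in_context[0].startswith("[summary"):
        in_context.insert(0, summarize(recall))
    return in_context, recall

in_context = ["x" * 20000, "y" * 100]   # first message blows the budget
recall = []
in_context, recall = enforce_cap(in_context, recall)
```

Nothing is lost: evicted messages remain queryable through recall memory, while the in-context payload stays small enough to keep cost and latency bounded.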
Tooling and Agent Behavior
Integration of tool calling in agent frameworks is central. Agents are given tools to manage their memory, including:
- Appending to memory blocks [00:31:34].
- Replacing content in memory blocks [00:31:39].
- Searching memory (specific conversations or general RAG queries) [00:31:45].
- Inserting into external databases [00:31:51].
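The four tools listed above reduce to simple operations against the memory structures; here is a toy rendering with illustrative signatures (not the exact Letta ones).

```python
def core_memory_append(blocks: dict, label: str, text: str) -> None:
    """Append to an in-context memory block."""
    blocks[label] = blocks.get(label, "") + "\n" + text

def core_memory_replace(blocks: dict, label: str, old: str, new: str) -> None:
    """Replace content within an in-context memory block."""
    blocks[label] = blocks[label].replace(old, new)

def conversation_search(history: list, query: str) -> list:
    """Search past conversation; substring match stands in for real retrieval."""
    return [m for m in history if query.lower() in m.lower()]

def archival_insert(store: list, text: str) -> None:
    """Insert arbitrary data into the external archival store."""
    store.append(text)

blocks = {"human": "Name: Sarah"}
core_memory_append(blocks, "human", "Hobby: climbing")
core_memory_replace(blocks, "human", "Sarah", "Sarah K.")

history = ["user: my dog is named Biscuit", "assistant: noted"]
store = []
archival_insert(store, "large planning document, full text...")
```

Because each tool is an ordinary server-side function, sandboxing and dynamic addition or removal of tools (discussed below) come along naturally.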
The framework supports developing AI agents and agentic workflows by allowing tool execution on the server side [00:29:17]. Tools can be sandboxed, and agents can be extremely “meta,” importing the client itself to create or manage other agents and their memory [00:49:40].
Agents in this paradigm generally follow a “ReAct-style” pattern of reasoning, action, and observation [00:44:11]. Unlike traditional ReAct agents, which loop until they explicitly state they are done, Letta agents must explicitly state that they “want to keep going” (via “heartbeat requests”); inverting the default this way is considered more practical for preventing derailment [00:44:29].
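The inverted default can be sketched as a driver loop (agent step stubbed, names illustrative): the loop ends after each step unless the agent’s response explicitly requests a heartbeat.

```python
def run_agent(agent_step, user_message, max_steps=10):
    """Run until the agent stops asking to continue (or a step cap hits)."""
    observations = [user_message]
    for _ in range(max_steps):
        action = agent_step(observations)
        observations.append(action["result"])
        if not action.get("request_heartbeat", False):
            break  # default is to stop; continuing requires an explicit ask
    return observations

def demo_step(observations):
    # Stub agent: first step asks to keep going, second does not.
    turn = len(observations)
    return {"result": f"step-{turn}", "request_heartbeat": turn < 2}

out = run_agent(demo_step, "hello")
```

With stop as the default, a derailed agent fails closed (it halts) rather than failing open into an unbounded loop.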
API-Based Communication for Multi-Agent Systems
A key benefit of stateful agents running on servers and being accessible via APIs is the simplicity of multi-agent collaboration and communication [01:02:27]. Unlike many existing multi-agent frameworks (e.g., AutoGen), where agents are often trapped within a single Python file and do not exist independently [01:02:21], API-backed stateful agents can communicate via simple message passing [01:02:28].
This mirrors human interaction in a remote company, where individuals communicate asynchronously while maintaining their own state and experience [01:01:40]. The lack of statefulness in traditional multi-agent frameworks means losing the ability to take an “expert” agent from one group and place it into another [01:02:05].
Multi-agent communication patterns include:
- Asynchronous Messaging: An agent sends a message and receives a receipt without pausing its own execution, similar to a human sending an instant message [01:03:09].
- Synchronous Messaging: An agent sends a message and waits for a reply, beneficial when the agent needs to freeze execution (e.g., waiting for supervisor input) [01:04:00].
- Group Communication: Agents can be grouped by tags, allowing messages to be sent to all agents matching specific criteria, supporting “supervisor-worker” concepts and parallelized tasks [01:04:29].
These multi-agent collaboration and communication tools are implemented by having agents import the client and send messages to other agents via API calls [01:02:51]. The ability to add or remove tools dynamically also gives runtime control over which agents are permitted to communicate [01:11:37].
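The message-passing patterns above can be sketched against a toy server object (hypothetical names, not a real client library): each agent lives behind the server keyed by id and tags, and “sending a message” is just a call against that server.

```python
class AgentServer:
    def __init__(self):
        self.agents = {}  # agent_id -> {"tags": set, "inbox": list}

    def create_agent(self, agent_id, tags=()):
        self.agents[agent_id] = {"tags": set(tags), "inbox": []}

    def send_async(self, agent_id, message):
        """Async pattern: deliver and return a receipt without waiting
        for the recipient to reply."""
        self.agents[agent_id]["inbox"].append(message)
        return {"status": "delivered", "to": agent_id}

    def send_to_tag(self, tag, message):
        """Group pattern: supervisor-style fan-out to every agent
        carrying the tag."""
        return [self.send_async(aid, message)
                for aid, a in self.agents.items() if tag in a["tags"]]

server = AgentServer()
server.create_agent("worker-1", tags=["worker"])
server.create_agent("worker-2", tags=["worker"])
server.create_agent("supervisor", tags=["supervisor"])
receipts = server.send_to_tag("worker", "start task 42")
```

A synchronous send would simply block on the recipient’s reply instead of returning a receipt, which is the freeze-and-wait behavior described for supervisor input.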
Conclusion and Future Outlook
The future of agent-based systems points strongly towards stateful agents and API-based communication. This enables the development of human-like AI agents capable of learning from experience and adapting to diverse enterprise and consumer use cases [01:13:00]. The same platform generalizes to deploying any stateful, LLM-based service [01:13:40].
While challenges remain (e.g., optimizing tool performance, human-agent collaboration on active documents, automated memory forgetting), the foundational shift towards stateful agents and robust API-driven multi-agent orchestration provides a powerful paradigm for the next generation of AI applications.