From: aidotengineer

This article explores the role and development of tools within AI frameworks, focusing on their integration with Large Language Models (LLMs) and the emerging paradigm of stateful agents.

Workshop Setup: Leta Framework [00:00:10]

The Leta framework workshop includes an interactive component that requires users to install Docker and pull a specific Docker image to run the server component [00:00:21]. A Jupyter notebook serves as the client, interacting with the Docker-hosted server [00:00:43]. The Leta server is an open-source stack built with FastAPI, PostgreSQL, and Python logic [00:17:27]. Its API allows interaction with agents [00:17:46], similar to the chat completions API, but session-based, eliminating the need to provide full conversation history with each interaction [00:18:01].
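
To make the session-based distinction concrete, the sketch below contrasts the two client styles. The route, port, and payload shape are assumptions for illustration, not the documented Leta API.

```python
import requests

BASE_URL = "http://localhost:8283"  # assumed local Leta server address

# Stateless chat-completions style: the client owns the state and must
# resend the entire conversation history on every request.
history = [
    {"role": "user", "content": "Hi, my name is Sarah."},
    {"role": "assistant", "content": "Hi Sarah!"},
    {"role": "user", "content": "What's my name?"},  # answerable only via history
]
# requests.post(f"{BASE_URL}/v1/chat/completions", json={"messages": history})

# Session-based style: the server owns the state, so the client sends only
# the new message to a persistent agent (hypothetical route and payload).
resp = requests.post(
    f"{BASE_URL}/v1/agents/agent-123/messages",
    json={"message": "What's my name?"},
)
```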

Core Concepts in Leta [00:22:18]

Leta distinguishes itself from other frameworks through its client-server architecture: agents live on a server, are stateful, and persist indefinitely [00:22:06]. This centralized server acts as a single source of truth for agent state [00:22:18].

Key components of memory in a Leta agent include:

  • Memory Blocks: Editable strings, referenced in the context window, that the agent itself can rewrite. For example, an agent can revise its own “persona” block or update information about a user [00:23:25] (see the sketch after this list).
  • System Prompt: While read-only for the agent, users can configure the system prompt to guide agent behavior and interaction style [00:24:34].
  • Tools: Agents are equipped with tools to manage their memory and perform actions [00:31:24].
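
As a mental model, a memory block is a small labeled string the agent itself can rewrite. The sketch below is purely illustrative; the field names are assumptions, not the Leta schema.

```python
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    """Illustrative stand-in for a Leta memory block (field names assumed)."""
    label: str         # e.g. "persona" or "human"
    value: str         # the in-context string the agent can rewrite
    limit: int = 2000  # size budget so the block always fits in context

persona = MemoryBlock(label="persona", value="I am a helpful assistant.")
human = MemoryBlock(label="human", value="Name: Sarah.")

# Via its memory-editing tools, the agent can rewrite its own persona:
persona.value = "I am a concise assistant who greets the user by name."
```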

Stateful Agents and Memory Management [00:01:44]

The workshop redefines the concept of an “agent” as a stateful agent [00:01:44]. While a common definition of an agent is an LLM taking actions in a loop [00:02:24], this definition often misses a crucial aspect: the agent must be updated within that closed loop [00:02:40]. Because LLMs are built on stateless transformers [00:02:50], some mechanism must perform state updates when the loop is closed [00:02:56]. The distinction between stateless and stateful agents is vital because LLMs inherently lack memory beyond their weights and current context window [00:03:43].

For agents to deliver on their promise, solving the problem of statefulness, i.e. memory, is paramount [00:03:23]. Unlike humans, who form new memories and learn over time [00:03:56], LLMs must have their learning managed by the user or the framework [00:04:02]. Traditionally, state is handled by simply appending to a message list [00:04:10], which becomes problematic when building useful, long-running agents [00:04:23], as the sketch below illustrates.
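
```python
def call_llm(messages: list[dict]) -> str:
    """Stand-in for any stateless chat-completions call."""
    return "ok"

# Naive state management: all "memory" is an unbounded message list.
messages = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_input: str) -> str:
    messages.append({"role": "user", "content": user_input})
    reply = call_llm(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply  # the list grows forever; nothing is summarized or forgotten
```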

The concept of a memory management system for LLMs, an “LLM OS,” is introduced [00:05:22]. The core idea is that if LLMs are to improve, memory management should ideally be performed by another LLM [00:05:34]. This contrasts with the typical approach, where context windows are loosely defined and held in process memory [00:06:00]. A memory management system implies an optimal way to arrange the context window, drawing from a potentially very large state outside the immediate context [00:06:38]. This automated context compilation aims to prevent issues like chat derailment, a common problem when using tools like ChatGPT or Claude for extended periods [00:07:09].
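
A minimal sketch of that compilation step, with a deliberately naive selection policy (always keep the system prompt and memory blocks, then fill the rest with the most recent messages):

```python
def compile_context(system_prompt: str, memory_blocks: list[str],
                    full_history: list[str], budget_chars: int = 8000) -> str:
    """Assemble a context window from state far larger than the window itself."""
    header = system_prompt + "\n" + "\n".join(memory_blocks)
    remaining = budget_chars - len(header)
    recent: list[str] = []
    for message in reversed(full_history):   # newest first
        if len(message) > remaining:
            break                            # older messages stay outside context
        recent.append(message)
        remaining -= len(message)
    return header + "\n" + "\n".join(reversed(recent))
```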

The primary reason for desiring stateful agents is the inability of current agents to truly learn from experience [00:07:31]. While less noticeable in workflow-based applications, this limitation becomes evident when building assistants or companions [00:07:44]. Stateful agents are crucial for learning from vast amounts of data, especially in enterprise settings where data often exceeds context window limits [00:09:13]. This introduces a “post-training” phase in which the model learns about a specific company or user through in-context memory, not by updating its weights [00:09:31].

Memory Tiers [00:29:53]

Leta employs two tiers of memory to mimic human memory:

  1. Core Memory: This is top-level information directly within the context window, akin to immediate recall about a person [00:29:56].
  2. Archival and Recall Memory: These are data sources that exist outside the context window. The agent can “jog its memory” to retrieve information from these sources, similar to a human searching through photos for a past event [00:30:33]. This is analogous to agentic RAG [00:30:53].

The key distinction between archival and recall memory is as follows (sketched in code after this list):

  • Recall Memory (conversation history): Written automatically as events occur, but the agent cannot write to it directly [00:33:47]. It’s designed to mimic a conversation search function [00:34:15].
  • Archival Memory: A general read/write data store of infinite size, like a vector database of strings [00:33:25]. It’s for actively storing large documents or data outside the context window [00:34:01].
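
The read/write asymmetry between the two tiers can be summarized in code. Class and method names are descriptive placeholders, not the Leta SDK:

```python
class RecallMemory:
    """Conversation history: written automatically by the framework,
    search-only from the agent's point of view."""
    def __init__(self):
        self._events: list[str] = []

    def log(self, message: str) -> None:
        self._events.append(message)  # called by the framework, not the agent

    def search(self, query: str) -> list[str]:
        return [m for m in self._events if query.lower() in m.lower()]

class ArchivalMemory:
    """General read/write store of unbounded size (a vector DB in practice)."""
    def __init__(self):
        self._docs: list[str] = []

    def insert(self, text: str) -> None:
        self._docs.append(text)  # the agent may write here directly

    def search(self, query: str) -> list[str]:
        # Real systems use embedding similarity; substring match stands in.
        return [d for d in self._docs if query.lower() in d.lower()]
```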

Memory blocks are backed by an API, allowing direct modification and sharing among multiple agents in a multi-agent system [00:32:04].
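
Because blocks live behind the server’s API rather than inside a single process, two agents can hold references to the same block, and one write is visible to both. Shared-reference semantics, modeled in plain Python:

```python
class Block:
    def __init__(self, label: str, value: str):
        self.label, self.value = label, value

class Agent:
    def __init__(self, name: str, blocks: list[Block]):
        self.name, self.blocks = name, blocks

shared = Block("organization", "Acme Corp: 50 employees.")
supervisor = Agent("supervisor", [shared])
worker = Agent("worker", [shared])

shared.value = "Acme Corp: 55 employees."  # one write, seen by both agents
assert supervisor.blocks[0].value == worker.blocks[0].value
```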

Tools and Function Calls [01:12:13]

Tool calling is central to Leta’s design for managing context [01:12:12], and LLMs are continually improving at it [01:12:16].

Agent Interactions and Tool Execution [00:51:59]

In Leta, every LLM invocation is treated as a tool call [00:51:29]. Even simple responses require calling a “send message” tool [00:51:39]. This means tools are always “on” [00:51:53], allowing agents to run in the background without constantly speaking to the user [00:51:44].

When a message is sent, a payload is created that includes the system prompt, memories, messages, and tool schemas [00:52:00]. The agent is required to output a tool call along with a reasoning snippet justifying it [00:52:10]. Leta supports native reasoning for models that expose it (e.g., DeepSeek R1) and injects “think tokens” for models that lack it [00:52:17].
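
Roughly, a single agent step looks like the following; the shapes are illustrative, not Leta’s exact wire format.

```python
# Illustrative request payload for one agent step.
payload = {
    "system": "You are Sam, a stateful agent with editable memory.",
    "memory": {"persona": "I am Sam.", "human": "Name: Sarah."},
    "messages": [{"role": "user", "content": "hi"}],       # recent history
    "tools": [{"name": "send_message",                     # replies are tool calls
               "parameters": {"message": "string"}}],
    "tool_choice": "required",   # the model must emit a tool call
}

# The model's output is always a (reasoning, tool call) pair:
response = {
    "reasoning": "The user greeted me; I should reply warmly.",
    "tool_call": {"name": "send_message",
                  "arguments": {"message": "Hi Sarah! Nice to meet you."}},
}
```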

Agents can chain tool calls indefinitely, with limits settable via the API [00:44:01]. Chaining is enabled by “heartbeat requests”: the agent explicitly signals its desire to continue execution [00:44:41]. This inverts the traditional ReAct pattern, in which agents loop until they explicitly state they are “done” [00:44:27].
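
A minimal sketch of the heartbeat loop: execution continues only while the agent explicitly asks for another step, rather than looping until the agent declares itself done.

```python
MAX_STEPS = 10  # chaining limit; in Leta this is settable via the API

def agent_step(state: dict) -> tuple[dict, bool]:
    """Stand-in for one LLM call returning (tool_call, request_heartbeat).
    Here the toy agent asks to continue for two steps, then stops."""
    finished = state["steps"] >= 2
    return ({"name": "send_message", "args": {"message": "working..."}},
            not finished)

def run(state: dict) -> None:
    for _ in range(MAX_STEPS):
        state["steps"] += 1
        tool_call, request_heartbeat = agent_step(state)
        print("executed:", tool_call["name"])
        if not request_heartbeat:  # no heartbeat requested -> stop the chain
            break

run({"steps": 0})
```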

Types of Tools [00:31:32]

Leta agents come with built-in memory management tools (hypothetical signatures are sketched after this list):

  • Core Memory Tools:
    • append: Add information to memory blocks (e.g., “the user also has a boyfriend called James”) [00:31:34].
    • replace: Update existing information (e.g., “the user’s name is actually Charles”) [00:31:40].
    • search: Perform specific conversation searches or generic RAG queries on external data sources [00:31:45].
  • Archival Memory Tool: insert: Store data into the external database [00:31:51]. This is typically a vector database [00:44:54].
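
Hypothetical Python signatures for these built-ins, modeled on the descriptions above rather than copied from the Leta source:

```python
def core_memory_append(label: str, content: str) -> None:
    """Append a fact to an in-context memory block, e.g.
    core_memory_append("human", "Also has a boyfriend called James.")"""

def core_memory_replace(label: str, old: str, new: str) -> None:
    """Replace outdated text in a block, e.g.
    core_memory_replace("human", "name: Sarah", "name: Charles")"""

def conversation_search(query: str) -> list[str]:
    """Search prior conversation history (recall memory)."""

def archival_memory_insert(content: str) -> None:
    """Store data in the external database (typically a vector DB)."""

def archival_memory_search(query: str) -> list[str]:
    """Run a generic RAG query over the external data source."""
```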

Custom Tool Development [00:49:23]

Custom tools can be written in Python and deployed on the Leta backend [00:49:25]. A key feature is the ability to import the Leta client directly within a tool [00:49:40], allowing agents to manage other agents’ memory or even create new agents [00:49:46]. Tools run inside a sandbox by default, supporting E2B keys for secure execution in private clouds or multi-tenant services [01:00:16].
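
A custom tool is an ordinary Python function whose docstring tells the LLM how to use it; the agent-management trick is importing the client inside the tool body, since the tool executes on the server. Every name below is a hypothetical placeholder:

```python
def spawn_helper_agent(task: str) -> str:
    """Create a new worker agent for `task` and return its ID.

    Runs on the Leta backend inside a sandbox. The client package and
    method names below are illustrative assumptions.
    """
    from leta import create_client          # hypothetical client import
    client = create_client()
    agent = client.agents.create(name=f"worker-{task[:16]}")
    client.agents.send_message(agent.id, f"Your task: {task}")
    return agent.id
```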

All Composio tools are also baked into Leta by default, allowing integration with popular services like BigQuery or Google Calendar if a Composio API key is provided [01:00:42].

Challenges in AI Development with Tools [01:12:13]

  • Tool Confusion: Agents can become confused if too many tools are added, leading to performance degradation [00:35:04]. A potential solution is a dedicated “shadow agent” or “subconscious” that handles memory management tools, separate from the main agent’s general API actions [00:35:11].
  • “You Can’t Know What You Don’t Know”: If information exists outside the LLM’s context window, the agent won’t know it has access to it. This can be mitigated by providing metadata statistics (e.g., number of previous messages or total archival memories) within the context [00:42:01].
  • Context Window Limits: Leta allows artificially capping context window length (e.g., at 10k tokens) to manage costs and latency [00:25:14]. If the limit is approached or exceeded, Leta automatically evicts messages into recall memory via a summarizer (configurable as truncation or recursive summary), ensuring the context never overflows [00:54:36]; see the sketch after this list.
  • Debugging: Visualizing the full payload sent to the LLM can be challenging in many frameworks [00:59:04]. Leta’s “context simulator” provides clear visibility into what’s in the context window at any given point [00:58:48].
  • Testing Tools: It can be difficult to test if a tool is working well without getting the agent to run it. Leta allows running tools separately from the agent for easier testing [00:59:46].
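
The eviction behavior from the “Context Window Limits” item can be sketched as follows; the token counter and summarizer are simplified placeholders for Leta’s configurable versions.

```python
CONTEXT_LIMIT_TOKENS = 10_000  # artificial cap, e.g. to control cost and latency

def count_tokens(messages: list[dict]) -> int:
    """Crude stand-in: roughly four characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def summarize(evicted: list[dict]) -> str:
    """Placeholder for the configurable summarizer (plain truncation or a
    recursive, LLM-written summary of the evicted messages)."""
    return f"[summary of {len(evicted)} earlier messages]"

def enforce_limit(messages: list[dict], recall_memory: list[dict]) -> list[dict]:
    while count_tokens(messages) > CONTEXT_LIMIT_TOKENS and len(messages) > 2:
        half = len(messages) // 2
        evicted, kept = messages[:half], messages[half:]
        recall_memory.extend(evicted)   # evicted messages remain searchable
        messages = [{"role": "system", "content": summarize(evicted)}] + kept
    return messages
```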

Multi-Agent Systems with Tools [01:01:09]

The concept of multi-agent systems is explored, contrasting with traditional frameworks like Autogen where agents are often trapped within a Python file and lack true independence [01:02:20]. Leta’s approach emphasizes independent, stateful agents running on servers and accessible via APIs [01:02:17]. This allows for asynchronous message passing between agents, similar to human communication over channels like Slack [01:02:40].

Leta provides multi-agent tools to facilitate several communication patterns (sketched after this list):

  • Asynchronous Message Passing: Agents can send messages to each other and immediately get a receipt without pausing their own execution [01:03:09]. This mimics human interaction where one doesn’t freeze while waiting for a reply [01:03:17].
  • Synchronous Message Passing: Agents can send a message and wait for a reply, which is useful when a supervisor agent needs to freeze its execution until it receives a response (e.g., asking for help) [01:03:49].
  • Group Messaging: Agents can send messages to all agents matching specific tags, enabling supervisor-worker concepts or parallelized tasks [01:04:29].
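
Hypothetical signatures for the three patterns, modeled on the descriptions above:

```python
def send_message_async(target_agent_id: str, message: str) -> str:
    """Fire-and-forget: deliver `message` and return a delivery receipt
    immediately, without pausing the sender's own execution."""

def send_message_and_wait(target_agent_id: str, message: str) -> str:
    """Blocking: freeze the sender until the target replies, e.g. a
    supervisor waiting on a worker's answer."""

def send_message_by_tags(tags: list[str], message: str) -> list[str]:
    """Broadcast to every agent matching `tags`, enabling supervisor-worker
    fan-out or parallelized tasks."""
```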

Because agents are API-accessible services, they can be dynamically managed (e.g., removing a tool to prevent an agent from replying) [01:11:30].

User Interface and Experience [01:22:25]

The Leta framework also offers a UI builder, the ADE, for faster iteration in a low-code environment [00:56:07]. This interface visualizes agent interactions, memory blocks, and tool usage, offering an alternative to purely programmatic SDK interactions [00:57:18]. The ADE can be used to set up basic agent attributes, adjust context window sizes, and explore the context simulator for payload transparency [00:58:48]. This represents a potential new iteration of the playground experience, moving beyond today’s ubiquitous stateless chat interfaces [01:22:25].

Conclusion [01:12:21]

The focus on stateful agents and robust tool integration, particularly with an “agents as services” paradigm, addresses current deficiencies in LLM agents’ ability to learn and retain memory, moving towards more human-like AI behavior [01:10:07]. This approach is highly relevant for building verticalized agents and complex enterprise workflows where continuous learning and persistent state are critical [01:13:00].