Agent continuations for AI workflows

From: aidotengineer

Agent continuations represent a novel mechanism developed at Snaplogic for managing agent state and facilitating human-in-the-loop processing within AI workflows [00:34:34]. This approach allows for the capture of the full state of complex agents, enabling arbitrary human and loop processing, and providing a basis for reliable agent continuation through snapshots [00:50:07].

Challenges with Modern AI Agents

As agents move into production settings, several challenges arise [00:08:00]:

Human Oversight and Approval Agents often require human approval during their processing steps, especially for high-value or high-risk tasks such as transferring money or deleting accounts [00:12:00]. Key aspects of agent execution need some amount of human oversight to ensure comfort with agent automation [00:29:00].
Long-Running Agents and Failure Tolerance Many agents are designed to be long-running, involving numerous steps [00:12:00]. The longer a process runs, the greater the chance of failure [00:19:00]. A mechanism is needed to prevent the loss of work and allow agents to checkpoint their state for resumption from a specific point rather than restarting from the beginning [00:24:00].
Distributed Environments Increasingly, agents operate in distributed environments beyond a single desktop [00:40:00]. Considerations for running agents scalably in such environments are essential [00:48:00].
Multi-level Agents Sophisticated agent configurations often involve a main “orchestrator” agent with multiple sub-agents, which themselves can have sub-agents [00:06:00]. Addressing human approval and state saving in the presence of these complex multi-level agents is a significant challenge [00:30:00].
Agent Loop Persistence Most current frameworks require the agent loop to run continuously, even when waiting for human input [00:23:00]. This requires continuous physical machine resources [00:07:03]. A solution is needed to allow the agent loop to be fully shut down and restarted later [00:56:00].

Basic Agent Execution

An agent typically operates as a loop involving calls to a Large Language Model (LLM) that specify potential tools [03:01:00]. If the LLM decides to use a tool, it returns to the agent loop, which then calls the tool, collects results, and sends them back to the LLM in a progressive cycle [03:10:00]. A tool can also be an agent itself, forming a sub-agent relationship [03:31:00]. Even simple agents involve significant interaction with LLMs and tools [03:53:00].

Agent Continuations Explained

Inspired by the programming language concept of continuations, agent continuations enable pausing agent execution, saving its state, and then resuming or continuing execution from that point at a later time [08:11:00]. This allows for a snapshot of the agent’s execution [08:49:00].

Key Insight: Leveraging the Messages Array

A core insight behind agent continuations is that interactions with LLMs in agents already maintain a “messages array,” which serves as a log of all interactions [10:04:00]. This history is replayed back to the LLM for its next inference [10:24:00]. While not entirely sufficient, this array already captures much of the agent’s state [11:00:00].

Implementation and Usage

To use agent continuations, a continuation agent class is utilized instead of a standard agent class [12:50:00]. Tools can be designated as needing approval [12:37:00].

When an agent needs to suspend (e.g., for human approval or another condition), it creates a “continuation object” [13:16:00].

This object:

Embeds the standard messages array.
Includes additional metadata, such as a resume request (indicating where to resume) and processed (to be populated with approval status) [15:21:00].
Is designed to be recursive, supporting arbitrary layers of nesting for sub-agents [18:20:20].

The application layer inspects this object, provides necessary updates (like human approval), and then sends the continuation object back to the agent [16:26:00]. The agent then reconstructs its state and resumes processing from the suspension point [16:34:00].

A significant benefit is that once the continuation object is created, the agent loops do not need to keep running; they can be shut down, as enough information has been captured to restart everything [16:54:00].

Example Scenario: Multi-level HR Agent

Consider a multi-level HR agent where a top-level HR agent uses an email tool and an account agent sub-tool [18:50:00]. The account agent itself has tools like “create account” and “authorize account,” with “authorize account” requiring human approval [19:17:00].

When the account agent reaches the “authorize account” tool, it suspends execution, creates a continuation object, and propagates it back to the application layer [20:01:00]. The application layer processes the approval, updates the continuation object, and sends it back to the HR agent. The framework then restores the state of both the main agent and sub-agent, allowing processing to continue [21:20:00].

Prototype and Future Directions

A prototype implementation of agent continuations has been built on top of the OpenAI Python API, with no other dependencies [24:12:00].

Further Development:

General Agent Suspension: Implementation of more general agent suspension beyond just human approval, allowing for arbitrary suspension points based on time, turns, or asynchronous requests [24:27:00].
Integration with Existing Frameworks: The focus is on extending existing agent frameworks like Strands or Pydantic AI, rather than developing a separate framework [24:50:00].

While other frameworks have considered state management, this approach is novel in combining both a robust human approval mechanism and support for arbitrary nesting of complex agents [25:03:00].

This work originated from the Agent Creator research group at Snaplogic, where the concept was prototyped both at the Python layer and within Snaplogic’s visual agent building interface and platform, Agent Creator [25:48:00].

Tubegraph

Explorer

Table of Contents