From: aidotengineer
Agent continuations are a novel mechanism developed at Snaplogic to address critical challenges in deploying AI agents to production, specifically human approval workflows and failure resilience for long-running agents [00:00:24] [00:00:50]. This concept allows the full state of complex agents to be captured, enabling both arbitrary human-in-the-loop processing and reliable agent continuation through snapshots [00:00:50].
Challenges in Agent Execution
Before agent continuations, several key challenges existed when moving agents from development to production [00:00:08]:
- Human Approval and Oversight
- A significant concern is integrating human oversight into agent processing to ensure comfort with agent automation [00:01:18].
- High-value or high-risk tasks, such as transferring money or deleting an account, require a mechanism for human intervention to provide final determination or decision [00:01:46] [00:04:20].
- Long-Running Agents and Failure Resilience
- Many agents are designed to be long-running, involving numerous steps [00:02:12] [00:05:14].
- The longer a process runs, the higher the chance of failure (e.g., network or hardware issues) [00:02:19] [00:05:28].
- There is a need to avoid losing all work an agent has done and to be able to checkpoint agent state for resumption from a point other than the beginning [00:02:24] [00:05:42].
- Distributed Environments
- Agents are increasingly operating in distributed environments, not just on a single desktop [00:02:40].
- Considerations for running agents scalably in such environments are crucial [00:02:48].
- Multi-level Agents
- Agents are becoming more sophisticated, often involving a main orchestrator agent with several nested sub-agents [00:06:00].
- Addressing human approval and state saving becomes more complex with these multi-level configurations [00:06:30].
- Agent Loop Persistence
- Most current agent frameworks require the agent loop (the code driving the agent) to run continuously, even when waiting for human input [00:07:23].
- This poses challenges for scalability and resource management [00:07:42].
What are Agent Continuations?
Agent continuations are inspired by the programming language theory concept of “continuations” [00:08:14]. In programming, a continuation allows stopping program execution at any point, bundling its state, and resuming it later from that exact point [00:08:27] [00:08:40]. This is akin to taking a snapshot of the program’s execution [00:08:49].
Agent continuations apply this idea to AI agents [00:09:14]: At any point during agent execution—which may involve multiple tool calls, LLM interactions, and even calls to sub-agents—the agent can be paused, its state saved, and then returned to the application layer for processing, such as awaiting human approval or for persistence [00:09:22] [00:09:34].
Implementation Details
The implementation of agent continuations leverages the existing way LLMs interact with agents [00:09:58].
Messages Array as a Basis
The core insight is that agent interactions with LLMs already maintain a “messages array,” which acts as a log of all past interactions [00:10:04]. This history is replayed back to the LLM for its next inference, effectively saving much of the agent’s state [00:10:24].
The Continuation Object
When an agent needs to suspend (e.g., for human approval), a “continuation object” is created [00:15:12] [00:19:55]. This object embeds:
- The standard messages array [00:15:24].
- Additional metadata, such as a “resume request” indicating the exact tool call or point for resumption, and a “processed” field for updates like human approval [00:17:37] [00:17:50].
For multi-level agents, the continuation object format is recursive, handling arbitrary layers of nesting, allowing sub-agents to also capture their states [00:18:20] [00:18:29].
Workflow with Continuations
- Designation: Tools that require human approval are explicitly designated [00:12:37].
- Suspension: If an agent reaches a designated suspension point (e.g., needing human approval or another condition), it suspends execution [00:13:16] [00:15:12].
- Continuation Object Creation: A continuation object is created, encapsulating the agent’s current state. This object propagates back to the top-level agent, extracting core information for the application layer [00:15:19] [00:16:07] [00:21:03].
- Application Layer Interaction: The application layer receives the continuation object, inspects its metadata (e.g., reason for suspension), and presents it to the user for input or approval [00:13:24] [00:22:52].
- Resumption: Once the application layer updates the continuation object (e.g., with human approval), it sends the object back to the agent [00:14:06] [00:16:26]. The agent framework then uses the logic within the continuation object to reconstruct the agent’s state and resume execution from the point of suspension [00:14:12] [00:16:34].
Benefits of Agent Continuations
- Decoupled Agent Loops: A key benefit is that once a continuation object is created, the agent loops do not need to keep running [00:16:54]. They can be fully shut down, as the continuation object captures enough information to restart everything back to where it was [00:17:02]. This addresses the challenge of agent loop persistence and resource management [00:07:49].
- Seamless Human-in-the-Loop: Enables effective and natural integration of human approval steps for high-stakes actions within complex agent workflows [00:01:26] [00:13:51].
- Enhanced Failure Resilience: Allows for checkpointing of agent state, meaning that if an agent encounters a failure, it can be resumed from the last saved state rather than restarting from the beginning, preventing loss of work [00:02:24] [00:05:42].
- Support for Complex Agent Architectures: The recursive nature of the continuation object natively supports multi-level agents and nested sub-agents, making it applicable to sophisticated multiagent systems [00:18:29].
- General Agent Suspension: Beyond human approval, the mechanism can be extended to support arbitrary suspension points, such as after a certain amount of time, a specific number of turns, or for asynchronous requests [00:24:29].
Prototype and Future Directions
A prototype implementation has been built on top of the OpenAI Python API, with no other dependencies [00:24:12]. The aim is not to develop a new, separate agent framework but to extend existing frameworks like LangChain or PyDantic AI to incorporate this functionality [00:24:50].
While other frameworks offer forms of state management, agent continuations are considered novel because they combine both a human approval mechanism and arbitrary nesting of complex agent states [00:25:03] [00:25:36]. This work emerged from Snaplogic’s Agent Creator research group, where continuations were prototyped in both Python and the visual agent building interface [00:19:48] [00:25:48].