From: aidotengineer

Introduction to Agents [00:08:00]

The year 2025 is anticipated to be the “year of Agents,” marking a shift from AI functioning merely as an assistant to becoming a co-worker [00:08:41]. OpenAI has been actively working with customers and internal teams on agentic products such as Deep Research and Operator [00:08:32]. This work has led to identifying prevalent patterns and anti-patterns in agent development [00:08:51].

Defining an AI Agent [00:09:00]

An AI agent is an application that consists of a model with instructions, typically in the form of a prompt [00:09:04]. It has access to tools for information retrieval and interaction with external systems [00:09:11]. Crucially, an agent is encapsulated in an execution loop where the model itself controls the termination [00:09:16].
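The three ingredients named here (model, instructions, tools) can be captured in a small data structure. This is a minimal sketch, not an official SDK; the field names and the `"gpt-4o"` default are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable

# Minimal sketch of an agent's three ingredients: a model, instructions
# (the prompt), and a registry of tools it may call. Illustrative only.
@dataclass
class Agent:
    instructions: str                          # the prompt
    model: str = "gpt-4o"                      # which model drives the loop (assumption)
    tools: dict[str, Callable] = field(default_factory=dict)  # tool name -> callable
```

The execution loop that wraps this structure is what distinguishes an agent from a plain chat completion: the model, not the application, decides when to stop.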

The Agent Execution Cycle [00:09:24]

Within each execution cycle, an agent operates as an entity that:

  1. Receives instructions in natural language [00:09:28].
  2. Determines whether to issue any tool calls [00:09:29].
  3. Runs those tools [00:09:31].
  4. Synthesizes a response using the tool’s return values [00:09:34].
  5. Provides an answer to the user [00:09:39].

The agent may also decide it has met its objective and terminate the execution loop [00:09:43].
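The five steps above can be sketched as a single loop. This is a hand-written illustration, assuming a `call_model` callable that returns either a tool call or final content; it is not a real client library.

```python
def run_agent(call_model, instructions, tools, user_message, max_turns=10):
    """Drive the execution loop; the model itself decides when to stop
    by returning a final answer instead of another tool call."""
    messages = [{"role": "system", "content": instructions},
                {"role": "user", "content": user_message}]
    for _ in range(max_turns):                      # safety cap on the loop
        reply = call_model(messages)                # steps 1-2: model reads, decides
        if reply.get("tool_call"):                  # step 2: a tool call was issued
            name, args = reply["tool_call"]
            result = tools[name](**args)            # step 3: run the tool
            messages.append({"role": "tool", "name": name,
                             "content": str(result)})  # step 4: feed result back
            continue
        return reply["content"]                     # step 5: final answer; loop ends
    raise RuntimeError("agent did not terminate within max_turns")
```

Note the explicit `max_turns` cap: even though the model controls termination, a defensive bound keeps a confused model from looping forever.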

Lessons in Building and Scaling Agents [00:09:50]

1. Start with Primitives, Abstract When Necessary [00:10:53]

When designing an AI agent that orchestrates multiple models, retrieves data, reasons over it, and generates output, there are two initial choices:

  • Starting with Primitives: Making raw API calls, logging results, and handling failures [00:10:07].
  • Starting with a Framework: Picking an abstraction that handles many details [00:10:16].

While frameworks are enticing for quick proofs of concept [00:10:23], they can lead to a lack of understanding of the system’s behavior or underlying primitives, deferring critical design decisions [00:10:33]. Without knowing constraints, optimization is difficult [00:10:46].

A more effective approach is to first build with primitives to understand task decomposition, failure points, and areas for improvement [00:10:53]. Abstraction should be introduced only when repeating efforts, such as reimplementing an embedding strategy or model graders [00:11:05]. The focus for scalable agent development should be on understanding data, failure points, and constraints, rather than simply choosing the right framework [00:11:23].
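“Start with primitives” can be as simple as wrapping one raw call with logging and explicit failure handling. A minimal sketch, assuming `api_call` stands in for a direct HTTP/SDK request (it is not a real client):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def call_with_logging(api_call, payload, retries=3, backoff=1.0):
    """One raw call, logged, with failures handled in the open --
    so every failure point is visible before any framework hides it."""
    for attempt in range(1, retries + 1):
        try:
            log.info("attempt %d: sending %d-char prompt", attempt, len(payload))
            result = api_call(payload)
            log.info("attempt %d: ok", attempt)
            return result
        except Exception as exc:              # surface, don't swallow, failures
            log.warning("attempt %d failed: %s", attempt, exc)
            time.sleep(backoff * attempt)     # simple linear backoff
    raise RuntimeError(f"all {retries} attempts failed")
```

Only once a pattern like this is being reimplemented across several call sites does it earn an abstraction.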

2. Start Simple (Single Agent) [00:11:44]

Often, teams prematurely jump into designing multi-agent systems, with agents calling agents or dynamically coordinating tasks [00:11:48]. This can create unknowns and offer limited insight [00:12:01].

A better strategy is to begin with a single agent purpose-built for a specific task [00:12:08]. Deploying this single agent in production with a limited user set allows for observing its performance and identifying real bottlenecks, such as hallucinations, high latency, or poor retrieval accuracy [00:12:16]. Complexity should only be increased incrementally as more intense failure cases and constraints are discovered [00:12:44]. The goal is to build a system that works, not necessarily a complicated one [00:12:51].
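Observing the single agent in production implies instrumenting it. A minimal sketch of that idea, with illustrative names, that records per-request latency so bottlenecks show up as data rather than anecdotes:

```python
import time
from collections import defaultdict

# Simple observation wrapper: measure every request so real bottlenecks
# (e.g. high latency) are identified before adding any complexity.
metrics = defaultdict(list)

def observed(agent_fn, request):
    start = time.perf_counter()
    try:
        return agent_fn(request)
    finally:
        metrics["latency_s"].append(time.perf_counter() - start)
```

The same pattern extends to counting retrieval misses or flagged hallucinations; the point is to let measured failure cases, not speculation, justify each increase in complexity.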

3. Graduate to a Network of Agents with Handoffs [00:13:00]

For handling more complex tasks, the concept of a network of agents and handoffs becomes crucial [00:13:07].

  • Network of Agents: A collaborative system where multiple agents work in concert to resolve complex requests or perform a series of interrelated tasks [00:13:17]. This allows for specialized agents to handle subflows within a larger agentic workflow [00:13:28].
  • Handoffs: The process by which one agent transfers control of an active conversation to another agent [00:13:38]. This is similar to a phone-call transfer, but the entire conversation history is preserved, allowing the new agent to continue seamlessly [00:13:48].
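The essence of a handoff is that the agent changes while the history does not. A minimal sketch, with conversations as plain dictionaries and agents as names (illustrative stand-ins for real agent objects):

```python
def handoff(conversation, to_agent):
    """Transfer an active conversation: the new agent receives the entire
    history, so it can pick up exactly where the previous one left off."""
    transferred = dict(conversation)
    transferred["agent"] = to_agent   # swaps the model/prompt/tool owner
    return transferred                # history carries over untouched
```

Because only the `agent` field changes, the new agent can bring a different model, prompt, and tool set to the same running conversation, which is exactly what the customer-service example below relies on.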

Example: Customer Service Flow [00:14:01]

In a fully automated customer service flow using a network of agents and handoffs, different models can be applied to specific jobs:

  • GPT-4o mini can perform triage on incoming requests [00:14:16].
  • GPT-4o can manage the conversation with the user on a dispute [00:14:23].
  • The o3-mini reasoning model can handle accuracy-sensitive tasks, like checking refund eligibility [00:14:30].

Handoffs effectively maintain conversation history and context while allowing for changes in the model, prompt, and tool definitions, providing flexibility for a wide range of scenarios [00:14:39].
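The model-per-job routing described above amounts to a small lookup. A sketch with model names as given in the talk; the `route` helper and job labels are illustrative assumptions:

```python
# Match each job in the customer-service flow to the cheapest model
# that can do it well (names per the talk; mapping is illustrative).
MODEL_FOR_JOB = {
    "triage": "gpt-4o-mini",         # cheap, fast classification of requests
    "dispute": "gpt-4o",             # main user-facing conversation
    "refund_eligibility": "o3-mini", # accuracy-sensitive reasoning step
}

def route(job):
    return MODEL_FOR_JOB[job]
```

Combined with handoffs, this lets each segment of the conversation run on the model best suited to it without losing context.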

4. Implement Guardrails [00:14:52]

Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and ensuring system integrity [00:14:58].

  • Keep Prompts Simple: Model instructions should be kept simple and focused on the target task to ensure maximum interoperability and predictable accuracy improvements [00:15:12].
  • Parallel Execution: Guardrails should not be part of the main prompts, but run in parallel [00:15:26]. The availability of faster and cheaper models like GPT-4o mini makes this approach more accessible [00:15:33].
  • Deferred High-Stakes Actions: High-stakes actions, such as issuing a refund or displaying personal account information, should be deferred until all guardrails have completed their checks [00:15:42].

For example, an input guardrail can prevent prompt injection, while output guardrails can be applied to the agent’s response [00:15:57].
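The three guardrail principles can be combined in one sketch: the main agent and an input guardrail run in parallel, and the high-stakes output is released only after the check passes. The functions and the injection check are illustrative assumptions, not a production filter:

```python
import asyncio

async def injection_guardrail(user_input):
    # Toy input guardrail: flag an obvious prompt-injection phrase.
    return "ignore previous instructions" not in user_input.lower()

async def main_agent(user_input):
    # The main prompt stays simple and focused on the happy path.
    return f"Refund approved for: {user_input}"   # draft, not yet released

async def handle(user_input):
    draft, safe = await asyncio.gather(           # guardrail runs in parallel
        main_agent(user_input),
        injection_guardrail(user_input),
    )
    if not safe:
        return "Sorry, I can't help with that."   # guardrail tripped
    return draft          # high-stakes action deferred until checks complete
```

In a real system the guardrail itself might be a fast, cheap model call (e.g. a GPT-4o mini classifier), and an output guardrail would screen `draft` the same way before it reaches the user.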

Summary of Agent Development Principles [00:16:06]

To build and scale AI agents effectively:

  1. Use abstractions minimally [00:16:12].
  2. Start with a single agent [00:16:14].
  3. Graduate to a network of agents when addressing more intense failure cases [00:16:17].
  4. Keep prompts simple and focused on the “happy path,” using guardrails to manage edge cases [00:16:19].