From: aidotengineer
The landscape of AI in enterprise is evolving, with a significant shift towards agentic workflows and the deployment of AI agents. OpenAI, as a foundational model developer, outlines its approach to bringing these advanced capabilities to production and shares key insights from its experience in the field [00:00:20].
OpenAI’s Operational Model [00:00:40]
OpenAI operates with two core engineering teams:
- Research Team: Comprises 1,200 researchers focused on inventing and deploying foundational models [00:00:48].
- Applied Team: Takes these foundational models and builds them into products, such as ChatGPT and the API, making GPT models available for deployment [00:01:00].
The go-to-market team then places these products into end-users’ hands, assisting in automating internal operations and integrating AI into the workforce and products [00:01:11]. A continuous iterative loop involves gathering feedback from the field to improve products and core models, completing a “research flywheel” [00:01:27].
The Enterprise AI Customer Journey [00:01:45]
Enterprises typically navigate their AI journey in three phases:
- Building an AI-Enabled Workforce: Empowering employees to become AI literate and integrate AI into their daily work [00:01:54]. This often starts with products like ChatGPT [00:02:30].
- Automating AI Operations: Implementing internal use cases to build automation or co-pilot functionalities within the workforce [00:02:11]. More complex or customized needs typically leverage the OpenAI API [00:02:44].
- Infusing AI into End Products: Integrating AI into end-user-facing products, primarily through API use cases [00:02:22].
Crafting an Enterprise AI Strategy [00:03:05]
A successful AI strategy within an enterprise should:
- Determine Top-Down Strategic Guidance: Identify how AI technology aligns with the broader business strategy, rather than solely focusing on an “AI strategy” [00:03:10].
- Identify High-Impact Use Cases: Select one or two significant use cases that are high impact and scope them out for initial delivery [00:03:36].
- Build Divisional Capability: Enable teams and infuse AI across the organization through enablement, building centers of excellence, or creating centralized technological platforms [00:03:51].
The Use Case Journey [00:04:25]
A typical three-month use case journey involves:
- Ideation and Scoping: Initial ideation, scoping, architecture review, and defining clear success metrics and KPIs [00:04:40].
- Development: The bulk of the time, involving iterative prompting strategies, RAG (Retrieval Augmented Generation), and continuous improvement [00:04:53]. OpenAI teams often engage closely through workshops, office hours, and paired programming [00:05:06].
- Testing and Evaluation: Conducting AB testing and beta rollouts based on predefined evaluation metrics [00:05:24].
- Production and Maintenance: Launching the solution, performing scale optimization testing, and ongoing maintenance [00:05:37].
OpenAI supports partners by providing early access to models and features, internal experts from research and product teams, and joint roadmap sessions [00:06:05].
Case Study: Morgan Stanley’s Internal Knowledge Assistant [00:06:51]
Morgan Stanley developed an internal knowledge assistant to help wealth managers quickly and accurately answer client questions using their vast corpus of research reports and live data [00:06:58]. Initially, accuracy was low at 45% [00:07:20]. Through collaboration with OpenAI, new methods were introduced, including:
- Hybrid retrieval [00:07:26]
- Fine-tuning embeddings and different chunking strategies [00:07:28]
- Reranking and classification steps [00:07:36]
- Prompt engineering and query expansion [00:07:44]
These iterative improvements boosted accuracy from 45% to 85%, and ultimately to 98%, exceeding their 90% goal [00:07:38].
Developing AI Agents and Agentic Workflows [00:08:00]
2025 is anticipated to be the year of agents, where generative AI transitions from being merely an assistant to a co-worker [00:08:02], [00:08:37]. OpenAI has gathered best practices and “battle scars” from building agentic workflows in the field [00:08:14].
What is an AI Agent? [00:09:00]
An AI agent is defined as an AI application composed of:
- A model with instructions (typically a prompt) [00:09:06].
- Access to tools for information retrieval and external system interaction [00:09:10].
- An execution loop, where the model controls its termination [00:09:16].
In each cycle, an agent receives natural language instructions, decides whether to call tools, runs them, synthesizes a response with the tool’s return values, and provides an answer to the user. It can also determine when its objective is met and terminate the loop [00:09:24].
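The cycle above can be sketched as a minimal execution loop. This is an illustrative skeleton, not OpenAI's implementation: `call_model` is a hypothetical stand-in for a real model call, and `get_weather` is a canned demo tool.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool: returns canned weather data for the demo."""
    return json.dumps({"city": city, "forecast": "sunny"})

TOOLS = {"get_weather": get_weather}

def call_model(messages):
    """Stand-in for a real model call: asks for the weather tool once,
    then synthesizes a final answer from the tool's return value."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"type": "tool_call", "name": "get_weather",
                "arguments": {"city": "Paris"}}
    data = json.loads(tool_msgs[-1]["content"])
    return {"type": "final",
            "content": f"It is {data['forecast']} in {data['city']}."}

def run_agent(user_message: str, max_turns: int = 5) -> str:
    """Execution loop: each turn, the model decides whether to call a
    tool or to terminate with a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        action = call_model(messages)
        if action["type"] == "tool_call":
            result = TOOLS[action["name"]](**action["arguments"])
            messages.append({"role": "tool", "name": action["name"],
                             "content": result})
        else:  # the model decided its objective is met
            return action["content"]
    return "Stopped: turn limit reached."
```

The `max_turns` cap is a common safeguard so a model that never terminates cannot loop forever.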
Best Practices for Building AI Agents [00:09:50]
1. Use Abstractions Minimally: Start Simple, Optimize When Needed [00:10:50]
It’s tempting to start with frameworks when building AI agents, as they offer quick proofs of concept [00:10:23]. However, this often defers design decisions before understanding constraints, making optimization difficult [00:10:33].
Start with Primitives
Build first with primitives (raw API calls) to understand how the task decomposes, where failures occur, and what needs improvement [00:10:53]. Introduce abstractions only when you find yourself reinventing the wheel (e.g., embedding strategies or model graders) [00:11:04]. The focus should be on understanding data, failure points, and constraints, not just choosing the “right” framework [00:11:27].
2. Start with a Single Agent [00:12:08]
Jumping straight into multi-agent systems with dynamic coordination often introduces too many unknowns [00:11:48].
Incremental Complexity
Begin with a single agent purpose-built for a specific task [00:12:10]. Deploy it with a limited user set and observe its performance to identify real bottlenecks (e.g., hallucinations, latency, inaccuracy) [00:12:16]. Complexity should increase only as more intense failure cases and constraints are discovered [00:12:44]. The goal is to build a system that works, not necessarily a complicated one [00:12:51].
3. Graduate to a Network of Agents with Handoffs [00:13:07]
For more complex tasks, a network of agents allows multiple agents to collaborate on resolving complex requests or performing interrelated tasks [00:13:17]. This can be viewed as specialized agents handling sub-flows within a larger agentic workflow [00:13:31].
Handoffs: This is the process where one agent transfers control of an active conversation to another agent, preserving the entire conversation history and context [00:13:38]. This allows for swapping out the model, prompt, and tool definitions, providing flexibility for a wide range of scenarios [00:14:42].
Customer Service Flow [00:14:01]
A fully automated customer service flow can use a network of agents with handoffs:
- Triage: A smaller model (e.g., GPT-4o mini) performs initial triage on incoming requests [00:14:16].
- Conversation Management: A more capable model (e.g., GPT-4o) manages the main conversation with the user [00:14:23].
- Accuracy-Sensitive Tasks: A different model (e.g., the o3-mini reasoning model) handles tasks requiring high accuracy, such as checking refund eligibility [00:14:30].
This approach ensures the right model and tools are matched to each job [00:14:12].
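A handoff in this flow can be sketched as follows. This is a simplified illustration, not the OpenAI SDK: the `Agent` dataclass and `handoff` function are hypothetical, but they show the key property that the conversation history carries over while the model, prompt, and tools are swapped.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Hypothetical agent config; a handoff swaps these fields while
    the conversation history is preserved."""
    name: str
    model: str
    instructions: str
    tools: list = field(default_factory=list)

def handoff(history: list, from_agent: Agent, to_agent: Agent):
    """Transfer control of an active conversation to another agent,
    keeping the full history and noting the transfer."""
    new_history = history + [{
        "role": "system",
        "content": f"Handoff: {from_agent.name} -> {to_agent.name}",
    }]
    return to_agent, new_history

# Triage on a small model; refund checks on a reasoning model.
triage = Agent("triage", "gpt-4o-mini", "Classify the request.")
refunds = Agent("refunds", "o3-mini", "Check refund eligibility.",
                tools=["check_refund_eligibility"])

history = [{"role": "user", "content": "I want a refund for order 123."}]
active, history = handoff(history, triage, refunds)
```

Because the history travels with the handoff, the receiving agent needs no separate summary of what the user already said.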
4. Use Guardrails to Handle Edge Cases [00:14:52]
Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and maintaining system integrity [00:14:58].
Guardrail Implementation
- Simple and Focused Prompts: Keep the model instructions simple and focused on the target task for maximum interoperability and predictable accuracy improvements [00:15:11].
- Parallel Guardrails: Guardrails should generally not be part of the main prompts but should instead run in parallel [00:15:25]. The availability of faster, cheaper models like GPT-4o mini makes this more accessible [00:15:33].
- Deferred High-Stakes Actions: High-stakes tool calls or user responses (e.g., issuing refunds, sharing personal information) should be deferred until all guardrails have returned and verified safety [00:15:42].
For example, input guardrails can prevent prompt injection, and output guardrails can evaluate the agent’s response [00:15:58].
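The parallel-guardrail pattern with a deferred high-stakes action can be sketched with `asyncio`. Everything here is an assumption-laden stand-in: `injection_guardrail` would in practice call a small, fast model (e.g., GPT-4o mini), and `draft_refund_response` stands in for the main agent.

```python
import asyncio

async def injection_guardrail(user_input: str) -> bool:
    """Hypothetical input guardrail: flag likely prompt injection.
    In practice this would call a small, fast model."""
    await asyncio.sleep(0)  # stands in for a cheap model call
    return "ignore previous instructions" not in user_input.lower()

async def draft_refund_response(user_input: str) -> str:
    """Stands in for the main agent drafting a high-stakes action."""
    await asyncio.sleep(0)
    return "Refund of $20 approved."

async def handle_request(user_input: str) -> str:
    # Run the guardrail and the main agent concurrently, but defer
    # the high-stakes response until the guardrail reports safe.
    safe, draft = await asyncio.gather(
        injection_guardrail(user_input),
        draft_refund_response(user_input),
    )
    return draft if safe else "Request blocked by guardrail."
```

Running both concurrently keeps latency close to the slower of the two calls, while the final `return` is the single point where safety is enforced before anything reaches the user.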
Conclusion [00:16:06]
To effectively develop and scale AI agents and agentic workflows:
- Use abstractions minimally [00:16:12].
- Start with a single agent [00:16:13].
- Graduate to a network of agents when facing more intense challenges [00:16:14].
- Keep prompts simple and focused on the “happy path,” using guardrails to manage edge cases [00:16:19].