From: aidotengineer
The landscape of AI in enterprise is evolving, with a significant shift towards agentic workflows and the deployment of AI agents. OpenAI, as a foundational model developer, outlines its approach to bringing these advanced capabilities to production and shares key insights from its experience in the field [00:00:20].
OpenAI’s Operational Model [00:00:40]
OpenAI operates with two core engineering teams:
- Research Team: Comprises 1,200 researchers focused on inventing and deploying foundational models [00:00:48].
- Applied Team: Takes these foundational models and builds them into products, such as ChatGPT and the API, making GPT models available for deployment [00:01:00].
The go-to-market team then places these products into end-users’ hands, assisting in automating internal operations and integrating AI into the workforce and products [00:01:11]. A continuous iterative loop involves gathering feedback from the field to improve products and core models, completing a “research flywheel” [00:01:27].
The Enterprise AI Customer Journey [00:01:45]
Enterprises typically navigate their AI journey in three phases:
- Building an AI-Enabled Workforce: Empowering employees to become AI literate and integrate AI into their daily work [00:01:54]. This often starts with products like ChatGPT [00:02:30].
- Automating AI Operations: Implementing internal use cases to build automation or co-pilot functionalities within the workforce [00:02:11]. More complex or customized needs typically leverage the OpenAI API [00:02:44].
- Infusing AI into End Products: Integrating AI into end-user-facing products, primarily through API use cases [00:02:22].
Crafting an Enterprise AI Strategy [00:03:05]
A successful AI strategy within an enterprise should:
- Determine Top-Down Strategic Guidance: Identify how AI technology aligns with the broader business strategy, rather than solely focusing on an “AI strategy” [00:03:10].
- Identify High-Impact Use Cases: Select one or two significant use cases that are high impact and scope them out for initial delivery [00:03:36].
- Build Divisional Capability: Enable teams and infuse AI across the organization through enablement, building centers of excellence, or creating centralized technological platforms [00:03:51].
The Use Case Journey [00:04:25]
A typical three-month use case journey involves:
- Ideation and Scoping: Initial ideation, scoping, architecture review, and defining clear success metrics and KPIs [00:04:40].
- Development: The bulk of the time, involving iterative prompting strategies, RAG (Retrieval Augmented Generation), and continuous improvement [00:04:53]. OpenAI teams often engage closely through workshops, office hours, and paired programming [00:05:06].
- Testing and Evaluation: Conducting AB testing and beta rollouts based on predefined evaluation metrics [00:05:24].
- Production and Maintenance: Launching the solution, performing scale optimization testing, and ongoing maintenance [00:05:37].
OpenAI supports partners by providing early access to models and features, internal experts from research and product teams, and joint roadmap sessions [00:06:05].
Case Study: Morgan Stanley’s Internal Knowledge Assistant [00:06:51]
Morgan Stanley developed an internal knowledge assistant to help wealth managers quickly and accurately answer client questions using their vast corpus of research reports and live data [00:06:58]. Initially, accuracy was low at 45% [00:07:20]. Through collaboration with OpenAI, new methods were introduced, including:
- Hybrid retrieval [00:07:26]
- Fine-tuning embeddings and different chunking strategies [00:07:28]
- Reranking and classification steps [00:07:36]
- Prompt engineering and query expansion [00:07:44]
These iterative improvements boosted accuracy from 45% to 85%, and ultimately to 98%, exceeding their 90% goal [00:07:38].
Developing AI Agents and Agentic Workflows [00:08:00]
2025 is anticipated to be the year of agents, where generative AI transitions from being merely an assistant to a co-worker [00:08:02], [00:08:37]. OpenAI has gathered best practices and “battle scars” from building agentic workflows in the field [00:08:14].
What is an AI Agent? [00:09:00]
An AI agent is defined as an AI application composed of:
- A model with instructions (typically a prompt) [00:09:06].
- Access to tools for information retrieval and external system interaction [00:09:10].
- An execution loop, where the model controls its termination [00:09:16].
In each cycle, an agent receives natural language instructions, decides whether to call tools, runs them, synthesizes a response with the tool’s return values, and provides an answer to the user. It can also determine when its objective is met and terminate the loop [00:09:24].
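The cycle above can be sketched as a minimal execution loop. This is an illustrative skeleton, not OpenAI's implementation: `call_model` is a hypothetical stand-in for a real model call, and `get_weather` is a canned demo tool.

```python
import json

def get_weather(city: str) -> str:
    """Hypothetical tool: returns canned weather data for the demo."""
    return json.dumps({"city": city, "forecast": "sunny"})

TOOLS = {"get_weather": get_weather}

def call_model(messages):
    """Stand-in for a real model call: asks for the weather tool once,
    then synthesizes a final answer from the tool's return value."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"type": "tool_call", "name": "get_weather",
                "arguments": {"city": "Paris"}}
    data = json.loads(tool_msgs[-1]["content"])
    return {"type": "final",
            "content": f"It is {data['forecast']} in {data['city']}."}

def run_agent(user_message: str, max_turns: int = 5) -> str:
    """Execution loop: each turn, the model decides whether to call a
    tool or to terminate with a final answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        action = call_model(messages)
        if action["type"] == "tool_call":
            result = TOOLS[action["name"]](**action["arguments"])
            messages.append({"role": "tool", "name": action["name"],
                             "content": result})
        else:  # the model decided its objective is met
            return action["content"]
    return "Stopped: turn limit reached."
```

The `max_turns` cap is a common safeguard so a model that never terminates cannot loop forever.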
Best Practices for Building AI Agents [00:09:50]
1. Use Abstractions Minimally: Start Simple, Optimize When Needed [00:10:50]
It’s tempting to start with frameworks when building AI agents, as they offer quick proofs of concept [00:10:23]. However, this often defers design decisions before understanding constraints, making optimization difficult [00:10:33].
Start with Primitives
Build first with primitives (raw API calls) to understand how the task decomposes, where failures occur, and what needs improvement [00:10:53]. Introduce abstractions only when you find yourself reinventing the wheel (e.g., embedding strategies or model graders) [00:11:04]. The focus should be on understanding data, failure points, and constraints, not just choosing the “right” framework [00:11:27].
2. Start with a Single Agent [00:12:08]
Jumping straight into multi-agent systems with dynamic coordination often introduces too many unknowns [00:11:48].
Incremental Complexity
Begin with a single agent purpose-built for a specific task [00:12:10]. Deploy it with a limited user set and observe its performance to identify real bottlenecks (e.g., hallucinations, latency, inaccuracy) [00:12:16]. Complexity should increase only as more intense failure cases and constraints are discovered [00:12:44]. The goal is to build a system that works, not necessarily a complicated one [00:12:51].
3. Graduate to a Network of Agents with Handoffs [00:13:07]
For more complex tasks, a network of agents allows multiple agents to collaborate on resolving complex requests or performing interrelated tasks [00:13:17]. This can be viewed as specialized agents handling sub-flows within a larger agentic workflow [00:13:31].
Handoffs: This is the process where one agent transfers control of an active conversation to another agent, preserving the entire conversation history and context [00:13:38]. This allows for swapping out the model, prompt, and tool definitions, providing flexibility for a wide range of scenarios [00:14:42].
Customer Service Flow [00:14:01]
A fully automated customer service flow can use a network of agents with handoffs:
- Triage: A smaller model (e.g., GPT-4o mini) performs initial triage on incoming requests [00:14:16].
- Conversation Management: A more capable model (e.g., GPT-4o) manages the main conversation with the user [00:14:23].
- Accuracy-Sensitive Tasks: A different model (e.g., the o3-mini reasoning model) handles tasks requiring high accuracy, such as checking refund eligibility [00:14:30].
This approach ensures the right model and tools are matched to each job [00:14:12].
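A handoff in this flow can be sketched as follows. This is a simplified illustration, not the OpenAI SDK: the `Agent` dataclass and `handoff` function are hypothetical, but they show the key property that the conversation history carries over while the model, prompt, and tools are swapped.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Hypothetical agent config; a handoff swaps these fields while
    the conversation history is preserved."""
    name: str
    model: str
    instructions: str
    tools: list = field(default_factory=list)

def handoff(history: list, from_agent: Agent, to_agent: Agent):
    """Transfer control of an active conversation to another agent,
    keeping the full history and noting the transfer."""
    new_history = history + [{
        "role": "system",
        "content": f"Handoff: {from_agent.name} -> {to_agent.name}",
    }]
    return to_agent, new_history

# Triage on a small model; refund checks on a reasoning model.
triage = Agent("triage", "gpt-4o-mini", "Classify the request.")
refunds = Agent("refunds", "o3-mini", "Check refund eligibility.",
                tools=["check_refund_eligibility"])

history = [{"role": "user", "content": "I want a refund for order 123."}]
active, history = handoff(history, triage, refunds)
```

Because the history travels with the handoff, the receiving agent needs no separate summary of what the user already said.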
4. Use Guardrails to Handle Edge Cases [00:14:52]
Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and maintaining system integrity [00:14:58].
Guardrail Implementation
- Simple and Focused Prompts: Keep the model instructions simple and focused on the target task for maximum interoperability and predictable accuracy improvements [00:15:11].
- Parallel Guardrails: Guardrails should generally not be part of the main prompts but should instead run in parallel [00:15:25]. The availability of faster, cheaper models like GPT-4o mini makes this more accessible [00:15:33].
- Deferred High-Stakes Actions: High-stakes tool calls or user responses (e.g., issuing refunds, sharing personal information) should be deferred until all guardrails have returned and verified safety [00:15:42].
For example, input guardrails can prevent prompt injection, and output guardrails can evaluate the agent’s response [00:15:58].
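The parallel-guardrail pattern with a deferred high-stakes action can be sketched with `asyncio`. Everything here is an assumption-laden stand-in: `injection_guardrail` would in practice call a small, fast model (e.g., GPT-4o mini), and `draft_refund_response` stands in for the main agent.

```python
import asyncio

async def injection_guardrail(user_input: str) -> bool:
    """Hypothetical input guardrail: flag likely prompt injection.
    In practice this would call a small, fast model."""
    await asyncio.sleep(0)  # stands in for a cheap model call
    return "ignore previous instructions" not in user_input.lower()

async def draft_refund_response(user_input: str) -> str:
    """Stands in for the main agent drafting a high-stakes action."""
    await asyncio.sleep(0)
    return "Refund of $20 approved."

async def handle_request(user_input: str) -> str:
    # Run the guardrail and the main agent concurrently, but defer
    # the high-stakes response until the guardrail reports safe.
    safe, draft = await asyncio.gather(
        injection_guardrail(user_input),
        draft_refund_response(user_input),
    )
    return draft if safe else "Request blocked by guardrail."
```

Running both concurrently keeps latency close to the slower of the two calls, while the final `return` is the single point where safety is enforced before anything reaches the user.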
Conclusion [00:16:06]
To effectively develop and scale AI agents and agentic workflows:
- Use abstractions minimally [00:16:12].
- Start with a single agent [00:16:13].
- Graduate to a network of agents when facing more intense challenges [00:16:14].
- Keep prompts simple and focused on the “happy path,” using guardrails to manage edge cases [00:16:19].