From: aidotengineer
OpenAI focuses on helping enterprises build and scale AI use cases and bring them to production [00:00:24]. The talk also offers a sneak peek into agents and OpenAI’s experience building agentic workflows [00:00:29].
OpenAI’s Operational Structure
OpenAI operates with two core engineering teams [00:00:45]:
- Research Team: Comprises 1,200 researchers who invent and deploy foundational models [00:00:48].
- Applied Team: Takes these foundational models and builds them into products like ChatGPT and the API, making GPT models available for deployment [00:01:00].
The go-to-market team then helps get these products into the hands of customers’ workforces and into products that automate internal operations [00:01:11]. This is an iterative loop: feedback from the field improves the products and the core models through a research flywheel [00:01:27].
The Enterprise AI Customer Journey
OpenAI typically observes the AI customer journey in three phases, though not necessarily in sequence [00:01:47]:
- Building an AI-Enabled Workforce: This initial step involves getting AI into the hands of employees so they become AI-literate and use AI daily in their work [00:01:54]. OpenAI’s ChatGPT is often the starting point for this [00:02:30].
- Automating AI Operations: Enterprises graduate to this phase by building internal use cases for automation or co-pilot-type applications [00:02:11]. While ChatGPT can partially assist, the API is used for more complex use cases requiring customization [00:02:38].
- Infusing AI into End Products: The final step involves integrating AI into end-user-facing products, primarily through API use cases [00:02:22]. This aligns with the broader themes of AI in enterprise applications and integrating AI into business operations.
Crafting an Enterprise AI Strategy
OpenAI’s approach to strategy development in enterprises involves a few key steps [00:03:07]:
- Strategic Guidance (Top-Down): The focus is not merely on an AI strategy but on the broader business strategy, with OpenAI helping to integrate technology to meet those business goals [00:03:17].
- Use Case Identification: Identify one or two impactful, high-value use cases to begin with, scoping them out for successful delivery [00:03:36].
- Building Divisional Capability: Enable teams and infuse AI across the organization through enablement, building Centers of Excellence, or establishing centralized technological platforms [00:03:52].
The Use Case Journey: An Illustrative Example
OpenAI outlines a typical use case journey, often spanning about three months [00:04:28]:
- Ideation & Scoping: Involves initial ideation, detailed scoping, architectural review to determine how AI fits into the existing stack, and clear definition of success metrics and KPIs [00:04:40].
- Development: The most time-intensive phase, focusing on iterative improvement through prompting strategies, RAG (Retrieval-Augmented Generation; a sketch follows after this list), and other techniques [00:04:53]. OpenAI’s team collaborates closely through workshops, office hours, and pair-programming sessions [00:05:06].
- Testing & Evaluation: Utilizes defined evaluations to conduct A/B testing and beta rollouts to understand practical performance [00:05:22].
- Production: The final stage involves launch rollout, scale optimization testing to ensure functionality for many end-users, and ongoing maintenance [00:05:37].
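As a concrete illustration of the RAG work in the development phase, here is a minimal sketch, assuming the OpenAI Python SDK and a toy in-memory corpus (the documents, model names, and prompt are illustrative, not from the talk):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Toy corpus standing in for an enterprise knowledge base.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
    "Orders ship from our EU warehouse on Mondays.",
]

def embed(texts: list[str]) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([d.embedding for d in response.data])

DOC_VECS = embed(DOCS)

def answer(question: str, k: int = 2) -> str:
    q_vec = embed([question])[0]
    # OpenAI embeddings are unit-normalized, so a dot product is
    # equivalent to cosine similarity.
    scores = DOC_VECS @ q_vec
    context = "\n".join(DOCS[i] for i in np.argsort(scores)[-k:])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("How long do refunds take?"))
```

In practice the corpus lives in a vector store, and the iteration happens on chunking, retrieval, and prompts, which is where most of the time in this phase goes.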
Partnership Benefits with OpenAI
OpenAI deploys a dedicated team and expects partners to do the same [00:05:55]. Key benefits of this partnership include:
- Early Access to Models and Features: Provides insights into upcoming developments, typically within the next two quarters [00:06:05].
- Access to Internal Experts: Collaboration with research, engineering, and product teams to accelerate progress [00:06:35].
- Joint Roadmap Sessions: Ensures alignment with the partner’s future roadmap [00:06:42].
Case Study: Morgan Stanley
OpenAI partnered with Morgan Stanley to build an internal knowledge assistant for their wealth managers [00:06:54]. This assistant allowed wealth managers to query a large corpus of knowledge, including research reports and stock ticker data, to provide highly accurate information to clients [00:07:00].
Initially, accuracy was around 45% [00:07:19]. Through collaboration, OpenAI introduced methods like:
- HyDE retrieval (Hypothetical Document Embeddings; sketched below) [00:07:26]
- Fine-tuning embeddings [00:07:28]
- Different chunking strategies [00:07:29]
These improvements led to 85% accuracy [00:07:37]. Further enhancements with prompt engineering and query expansion pushed accuracy to 98%, significantly exceeding the 90% goal [00:07:40]. This demonstrates effective use of existing enterprise systems and data for AI integration.
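The talk does not show implementations; as a rough illustration of two of the techniques named above, here is a minimal sketch of HyDE and query expansion (the model choices and prompts are assumptions):

```python
from openai import OpenAI

client = OpenAI()

def hyde_query_vector(question: str) -> list[float]:
    """HyDE: embed a model-drafted hypothetical answer, not the raw query."""
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Write a short passage that plausibly answers: {question}"}],
    ).choices[0].message.content
    # Retrieval then compares document vectors against this draft's vector,
    # which tends to land closer to relevant passages than the bare question.
    return client.embeddings.create(
        model="text-embedding-3-small", input=[draft]
    ).data[0].embedding

def expand_query(question: str) -> list[str]:
    """Query expansion: generate paraphrases and search with all of them."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Rewrite this search query 3 different ways, "
                              f"one per line: {question}"}],
    )
    return [question] + response.choices[0].message.content.splitlines()
```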
The Rise of AI Agents
OpenAI anticipates that 2025 will be the “year of agents,” when GenAI truly graduates from assistant to co-worker [00:08:02]. From working with customers building state-of-the-art agents, and from building its own agentic products, OpenAI has identified patterns and anti-patterns in agent development [00:08:26].
Defining an AI Agent
OpenAI defines an agent as an AI application comprising [00:09:02]:
- A model with instructions (usually a prompt) [00:09:04].
- Access to tools for information retrieval and external system interaction [00:09:11].
- An encapsulated execution loop whose termination is controlled by the model itself [00:09:16].
In each execution cycle, the agent receives natural language instructions, decides whether to issue tool calls, runs those tools, synthesizes a response, and provides an answer to the user. The agent may also determine when its objective is met and terminate the loop [00:09:24].
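A minimal sketch of that definition using raw chat-completions tool calling: the model sees the tools, decides whether to call them, and terminates the loop by answering without a tool call. The `get_order_status` tool, prompts, and model choice are illustrative assumptions, not from the talk:

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative tool; any callable the agent may invoke works the same way.
def get_order_status(order_id: str) -> str:
    return json.dumps({"order_id": order_id, "status": "shipped"})

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_agent(user_input: str, max_turns: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a support agent. Use tools when needed."},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_turns):  # hard cap; within it, the model decides when to stop
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        if not msg.tool_calls:
            return msg.content  # model chose to answer: the loop terminates
        messages.append(msg)  # keep the tool-call turn in the history
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = get_order_status(**args)  # dispatch (single tool here)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    return "Stopped after max_turns without a final answer."

print(run_agent("Where is order 1234?"))
```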
Lessons Learned Building Agents
OpenAI shares four key lessons for building and scaling AI use cases with agents [00:09:50]:
1. Use Abstractions Minimally
- Start with Primitives: Begin by making raw API calls, logging results, outputs, and failures to understand how the task decomposes and where failures occur [00:10:07]; a sketch follows at the end of this lesson. This helps in understanding constraints and optimizing solutions [00:10:48].
- Introduce Abstraction When Necessary: Abstractions should only be introduced when there’s a clear need to avoid reinventing the wheel (e.g., re-implementing an embedding strategy or model graders) [00:11:05].
- Focus on Data and Failure Points: Developing scalable agents is less about choosing the right framework and more about understanding data, failure points, and constraints [00:11:23].
Start Simple, Optimize, and Abstract When it Improves the System
First build with primitives to understand the task decomposition and failure points, then introduce abstraction only when you find yourself reinventing the wheel or when it clearly makes the system better [00:10:53].
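A primitives-first sketch of that advice, assuming the OpenAI Python SDK: one raw API call wrapped with enough logging to see outputs, latency, token usage, and failures (the logged fields are illustrative):

```python
import logging
import time
from openai import OpenAI

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("primitives")
client = OpenAI()

def call_model(prompt: str, model: str = "gpt-4o") -> str | None:
    """A raw call plus enough logging to see where the task breaks down."""
    start = time.perf_counter()
    try:
        response = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        output = response.choices[0].message.content
        log.info("model=%s latency=%.2fs tokens=%s",
                 model, time.perf_counter() - start,
                 response.usage.total_tokens)
        log.info("output=%r", output)
        return output
    except Exception:
        # Failures are data too: log them instead of hiding them.
        log.exception("call failed for prompt=%r", prompt[:200])
        return None
```

Only once these logs show where the task decomposes and where it fails does it make sense to reach for a framework or abstraction.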
2. Start Simple (Single Agent First)
- Avoid Jumping to Multi-Agent Systems: Designing complex multi-agent systems too early introduces too many unknowns and provides limited immediate insight [00:11:48].
- Start with a Single, Purpose-Built Agent: Begin with a single agent designed for a specific task, deploy it with limited users, and observe its performance [00:12:08].
- Identify Bottlenecks Incrementally: This approach allows real bottlenecks (e.g., hallucinations, low adoption due to latency, inaccuracy) to be identified, and improvements to be made incrementally based on user needs [00:12:21].
- Complexity Increases with Discovered Failure Cases: Add complexity only as failure cases and constraints are discovered; the goal is a functional system, not a complicated one [00:12:44].
3. Network of Agents and Handoffs
- Handle Complex Tasks with Networks: For more complex tasks, a network of agents can be used. This is a collaborative system where multiple specialized agents work together to resolve complex requests or perform interrelated tasks [00:13:07].
- Utilize Handoffs: Handoffs are the process by which one agent transfers control of an active conversation to another agent [00:13:38]. This is similar to a phone call transfer but preserves the entire conversation history, allowing the new agent to seamlessly continue [00:13:53].
- Example: Automated Customer Service: A fully automated customer service flow can be implemented with a network of agents and handoffs. Different models can be assigned to different tasks (e.g., GPT-4o mini for triage, GPT-4o for managing disputes, o3-mini for accuracy-sensitive tasks like determining refund eligibility) [00:14:03]. Handoffs maintain context while swapping out the model, prompt, and tool definitions [00:14:39]; a sketch follows below. This concept relates to OpenAI’s Agents SDK.
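OpenAI’s Agents SDK provides handoffs as a first-class primitive; below is a hand-rolled sketch of the same idea, in which the conversation history is preserved while the model, instructions, and tools are swapped. The triage/disputes split and the prompts are illustrative assumptions:

```python
from dataclasses import dataclass, field
from openai import OpenAI

client = OpenAI()

@dataclass
class Agent:
    name: str
    model: str
    instructions: str
    tools: list = field(default_factory=list)

TRIAGE = Agent("triage", "gpt-4o-mini",
               "Classify the request and note which specialist should handle it.")
DISPUTES = Agent("disputes", "gpt-4o",
                 "You resolve billing disputes. Be precise about amounts.")

def run_turn(agent: Agent, history: list[dict]) -> str:
    # Swap in this agent's system prompt and tools; the conversation
    # history itself is shared across agents.
    kwargs = {
        "model": agent.model,
        "messages": [{"role": "system", "content": agent.instructions}] + history,
    }
    if agent.tools:
        kwargs["tools"] = agent.tools
    response = client.chat.completions.create(**kwargs)
    return response.choices[0].message.content

history = [{"role": "user", "content": "I was double-charged last month."}]
triage_note = run_turn(TRIAGE, history)
history.append({"role": "assistant", "content": triage_note})

# Handoff: the full history travels to the specialist; only the agent
# configuration (model, prompt, tools) changes.
print(run_turn(DISPUTES, history))
```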
4. Guardrails
- Definition: Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and maintaining system integrity [00:14:57].
- Simple, Focused Instructions: Keeping model instructions simple and focused on the target task ensures maximum interoperability and predictably better accuracy and performance [00:15:12].
- Parallel Guardrails: Guardrails should generally not be part of main prompts but run in parallel [00:15:25]. The availability of faster and cheaper models like GPT-4o mini makes this more accessible [00:15:33].
- Deferred High-Stakes Actions: High-stakes tool calls or user responses (e.g., issuing refunds, sharing personal account info) should be deferred until all guardrails have returned their results [00:15:42] (see the sketch below).
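A minimal sketch of parallel guardrails, assuming an async client: a cheap, fast model runs the guardrail concurrently with the main agent, and the high-stakes response is deferred until the guardrail returns. The prompts and the PASS/FAIL protocol are assumptions:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def guardrail_check(user_input: str) -> bool:
    # Cheap, fast model classifies the request in parallel with the main agent.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system",
                   "content": "Reply PASS or FAIL: is this a legitimate, "
                              "in-policy support request?"},
                  {"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content.strip().upper().startswith("PASS")

async def draft_refund_response(user_input: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": "You handle refund requests."},
                  {"role": "user", "content": user_input}],
    )
    return response.choices[0].message.content

async def handle(user_input: str) -> str:
    # Guardrail and main agent run concurrently; the high-stakes step
    # (sending the reply / issuing the refund) waits for both results.
    passed, draft = await asyncio.gather(
        guardrail_check(user_input), draft_refund_response(user_input)
    )
    return draft if passed else "I can't help with that request."

print(asyncio.run(handle("Please refund order 1234, it arrived broken.")))
```

The same pattern extends to several guardrails at once: gather them all and gate each high-stakes tool call on the full set of results.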
Key Takeaways for Building Agents
- Use abstractions minimally [00:16:12].
- Start with a single agent [00:16:14].
- Graduate to a network of agents for more intense scenarios [00:16:17].
- Keep prompts simple and focused on the happy path, using guardrails to handle edge cases [00:16:19].
This comprehensive approach supports implementing AI in the enterprise and integrating AI more broadly into business operations.