From: aidotengineer
This article explores how enterprises can build and scale AI use cases, focusing on OpenAI’s approach, common customer journeys, strategic implementation, and lessons learned from deploying AI agents in the field.
OpenAI’s Operational Model
OpenAI operates with two core engineering teams:
- Research Team: A group of 1,200 researchers focused on inventing and deploying foundational models [00:00:48].
- Apply Team: This team takes the foundational models and builds them into products, such as ChatGPT and the OpenAI API [00:01:00].
The go-to-market team then deploys these products to end-users, helping organizations automate internal operations and integrate AI into their workforce and products [00:01:11]. An iterative loop ensures that feedback from the field directly improves both products and core models [00:01:27].
The Enterprise AI Customer Journey
OpenAI observes the enterprise AI customer journey typically happening in three phases [00:01:47]:
- Building an AI-Enabled Workforce [00:01:54]: This initial step involves getting AI into the hands of employees to foster AI literacy and daily use [00:01:56]. ChatGPT is typically the starting point for this phase [00:02:30].
- Automating Internal Operations [00:02:08]: This phase focuses on internal use cases, building automation, or implementing co-pilot type functionalities [00:02:14]. While ChatGPT can contribute, the API is often utilized for more complex or customized needs [00:02:38].
- Infusing AI into End Products [00:02:22]: The final step involves integrating AI into end-user facing products, primarily leveraging API use cases [00:02:48].
Crafting an Enterprise AI Strategy
Effective AI adoption within an enterprise typically follows a structured approach:
- Define Broad Business Strategy First [00:03:17]: The primary focus should be on the overarching business strategy, with AI technology serving as a tool to achieve those strategic goals [00:03:20].
- Identify High-Impact Use Cases [00:03:36]: Select one or two significant, high-impact use cases to begin with, ensuring clear scope and deliverables [00:03:39].
- Build Divisional Capability [00:03:52]: Enable teams and infuse AI throughout the organization through various means, including enablement programs, establishing Centers of Excellence, or building a centralized technological platform [00:03:59].
Use Case Journey Playbook
A typical use case journey, often illustrated over a three-month period, involves several key phases [00:04:31]:
- Ideation & Scoping [00:04:40]: This involves initial brainstorming, architecture review to fit AI into the existing stack, and clearly defining success metrics and KPIs [00:04:42].
- Development [00:04:53]: The bulk of the time is spent here, iterating on prompting strategies and Retrieval-Augmented Generation (RAG) to continuously improve the use case [00:04:55]. OpenAI teams often engage closely during this phase through workshops, office hours, and pair programming [00:05:06].
- Testing & Evaluation [00:05:24]: Utilize pre-defined evaluation metrics for A/B testing and beta rollouts to understand real-world performance [00:05:27]; a minimal evaluation sketch follows this list.
- Production [00:05:37]: Involves launch rollout plus scale and optimization testing to ensure the use case holds up when deployed to many users [00:05:38], followed by ongoing maintenance [00:05:45].
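To make the testing phase concrete, below is a minimal sketch of the kind of evaluation harness it implies. The `fake_app` function, the string-containment grader, and the test cases are hypothetical stand-ins for a real use case and its pre-defined metrics.

```python
from typing import Callable

def evaluate(run_app: Callable[[str], str],
             test_cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the app's answer contains the expected
    string -- a deliberately simple grader; real evals often use
    model-based grading or exact-match scoring instead."""
    passed = sum(
        1 for question, expected in test_cases
        if expected.lower() in run_app(question).lower()
    )
    return passed / len(test_cases)

if __name__ == "__main__":
    # Hypothetical stand-in for the AI application under test.
    def fake_app(question: str) -> str:
        return "Refunds are available within 30 days of purchase."

    cases = [("What is the refund window?", "30 days")]
    print(f"Accuracy: {evaluate(fake_app, cases):.0%}")
```

Running the same harness before and after a change gives the A/B comparison this phase calls for.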
OpenAI supports partners with early access to new models and features, insights into future roadmaps, and access to internal experts for acceleration and joint roadmap sessions [00:06:05].
Case Study: Morgan Stanley’s Internal Knowledge Assistant
Morgan Stanley collaborated with OpenAI to build an internal knowledge assistant for their wealth managers [00:06:54]. The goal was to surface highly accurate information from a vast corpus of knowledge, including research reports and live stock data, so wealth managers could respond to clients effectively [00:07:00].
- Initial Accuracy: The initial accuracy was low, around 45% [00:07:20].
- Intervention & Improvement: OpenAI introduced methods such as the following (a generic retrieval sketch appears after this list):
- Hybrid Retrieval [00:07:26]
- Fine-tuning embeddings [00:07:28]
- Different chunking strategies [00:07:29]
- Reranking and classification steps [00:07:36]
- Prompt engineering and query expansion [00:07:44]
- Result: The accuracy improved significantly, reaching 98% against a goal of 90% [00:07:40].
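The talk does not detail Morgan Stanley's implementation, but the sketch below illustrates the general shape of hybrid retrieval: blend a sparse (keyword) score with a dense (embedding-style) score, then keep the top-k candidates for a rerank step. The scoring functions are simplified stand-ins; a real system would use an embedding model for `dense_score` and a cross-encoder or model grader to rerank.

```python
import math
from collections import Counter

def keyword_score(query: str, doc: str) -> float:
    """Sparse signal: fraction of query terms that appear in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / max(len(q_terms), 1)

def dense_score(query: str, doc: str) -> float:
    """Stand-in for embedding similarity: cosine over bag-of-words
    vectors. A real system would compare embedding-model vectors."""
    a, b = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query: str, docs: list[str],
                    alpha: float = 0.5, k: int = 5) -> list[tuple[float, str]]:
    """Blend sparse and dense scores; the top-k results would then go
    through a rerank/classification step before reaching the prompt."""
    scored = [
        (alpha * dense_score(query, d) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    return sorted(scored, reverse=True)[:k]

docs = [
    "Quarterly research report on semiconductor supply chains.",
    "Live stock data feed integration guide.",
    "Wealth management onboarding checklist.",
]
for score, doc in hybrid_retrieve("semiconductor research report", docs, k=2):
    print(f"{score:.2f}  {doc}")
```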
Emerging Trend: AI Agents
There’s a growing focus on AI agents, with 2025 anticipated as “the year of Agents” where Generative AI truly graduates from being an assistant to a co-worker [00:08:02]. An agent is defined as an AI application with:
- A model that has instructions (usually a prompt) [00:09:04].
- Access to tools for information retrieval and external system interaction [00:09:11].
- An execution loop, controlled by the model, allowing it to determine its objectives and terminate when met [00:09:16].
- In each cycle, an agent receives natural-language instructions, decides whether to call its tools, runs them, synthesizes a response from the tools' return values, and answers the user [00:09:24]; a minimal sketch of this loop follows.
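Here is a minimal sketch of that execution loop, using the OpenAI chat-completions tool-calling interface; the `get_order_status` tool, the prompts, and the turn limit are illustrative assumptions, not the talk's implementation.

```python
import json
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

def get_order_status(order_id: str) -> str:
    """Hypothetical tool: in practice this would query a real system."""
    return f"Order {order_id} shipped yesterday."

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def run_agent(instructions: str, user_message: str, max_turns: int = 5) -> str:
    """Execution loop: each turn, the model decides whether to call a
    tool or to answer; it terminates when it produces a final reply."""
    messages = [
        {"role": "system", "content": instructions},
        {"role": "user", "content": user_message},
    ]
    for _ in range(max_turns):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        )
        msg = response.choices[0].message
        if not msg.tool_calls:          # no tool requested: final answer
            return msg.content
        messages.append(msg)            # keep the tool request in history
        for call in msg.tool_calls:     # run each requested tool
            args = json.loads(call.function.arguments)
            result = get_order_status(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    return "Stopped after reaching the turn limit."
```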
Lessons Learned from Building AI Agents
OpenAI has identified four key insights for building scalable AI agents:
1. Start with Primitives, Abstract Minimally
Many teams are tempted to start with frameworks for rapid proof-of-concept, but this can obscure the system’s behavior and underlying primitives [00:10:14].
- Recommendation: Begin building with primitives (raw API calls, manual logging) to understand task decomposition, failure points, and areas for improvement [00:10:53]. Introduce abstractions only when re-implementing common elements like embedding strategies or model graders [00:11:05]. A sketch of this primitives-first approach appears after this list.
- Key Idea: Scalable agent development is about understanding data, failure points, and constraints, not primarily about choosing the right abstraction [00:11:23].
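As one sketch of the primitives-first approach: a raw API call wrapped with manual logging, so latency, token usage, and failure points stay visible without a framework in the way. The model choice and log format are assumptions.

```python
import logging
import time
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
client = OpenAI()

def call_model(prompt: str, model: str = "gpt-4o-mini") -> str:
    """A raw API call plus manual logging -- the primitives the talk
    recommends starting from, before any abstraction is introduced."""
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    elapsed = time.perf_counter() - start
    usage = response.usage
    logging.info(
        "model=%s latency=%.2fs prompt_tokens=%d completion_tokens=%d",
        model, elapsed, usage.prompt_tokens, usage.completion_tokens,
    )
    return response.choices[0].message.content
```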
2. Start Simple with a Single Agent
Jumping directly into complex multi-agent systems with dynamic reasoning often creates unknowns without yielding much insight [00:11:48].
- Recommendation: Start with a single agent purpose-built for a specific task, deploy it with limited users, and observe its performance [00:12:08]. This approach helps identify real bottlenecks like hallucinations, latency, or retrieval inaccuracies [00:12:21].
- Key Idea: Complexity should be introduced incrementally as harder failure cases and constraints are discovered; the goal is a working system, not a complicated one [00:12:44].
3. Leverage Networks of Agents and Handoffs for Complexity
For more complex tasks, a network of agents collaborating can provide true value [00:13:03].
- Network of Agents: A collaborative system where multiple agents work together to resolve complex requests or perform interrelated tasks, often through specialized agents handling sub-flows [00:13:17].
- Handoffs: The process where one agent transfers control of an active conversation to another [00:13:38]. This is analogous to a phone call transfer but preserves the entire conversation history, allowing the new agent to seamlessly continue [00:13:51].
- Example (Customer Service): A fully automated customer service flow can use a network of agents. For instance, GPT-4o mini performs triage, a GPT-4o dispute agent manages the conversation, and an o3-mini reasoning model handles accuracy-sensitive tasks like refund eligibility checks [00:14:02]. Handoffs effectively swap models, prompts, and tool definitions while retaining context [00:14:39], as sketched below.
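A sketch of how a handoff might be represented, under the assumption that an agent bundles a model, instructions, and tools, and that the conversation history carries over unchanged. The agent names and routing mirror the customer-service example above but are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A handoff swaps the model, instructions, and tools while the
    conversation history is preserved."""
    name: str
    model: str
    instructions: str
    tools: list = field(default_factory=list)

triage = Agent("triage", "gpt-4o-mini", "Classify the request and hand off.")
dispute = Agent("dispute", "gpt-4o", "Manage the dispute conversation.")
refund_check = Agent("refund_check", "o3-mini", "Decide refund eligibility.")

def handoff(history: list[dict], current: Agent, target: Agent) -> Agent:
    """Transfer control: like a phone transfer where the new rep has
    already heard the whole call, the full history carries over."""
    history.append({"role": "system",
                    "content": f"Handoff: {current.name} -> {target.name}"})
    return target

# Triage reads the conversation, then hands off to the dispute agent;
# an accuracy-sensitive check later goes to the reasoning model.
history = [{"role": "user", "content": "I want to dispute this charge."}]
active = handoff(history, triage, dispute)
active = handoff(history, active, refund_check)
print(active.name, len(history))
```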
4. Implement Guardrails in Parallel
Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and maintaining integrity [00:14:55].
- Recommendation: Keep model instructions simple and focused on the target task for maximum interoperability and predictable performance [00:15:12]. Guardrails should not be part of the main prompts but run in parallel [00:15:25]. The proliferation of faster, cheaper models like GPT-4o mini makes this more accessible [00:15:35].
- Application: High-stakes tool calls or user responses (e.g., issuing a refund, sharing personal information) can be deferred until all guardrails have returned a clear signal [00:15:42], as sketched below.
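A sketch of guardrails running in parallel with the main model call, using asyncio. The stubbed checks (and the `asyncio.sleep` calls standing in for model latency) are placeholders for real guardrails, e.g. ones backed by a small, fast model like GPT-4o mini.

```python
import asyncio

async def draft_response(user_message: str) -> str:
    """Main agent call (stubbed): kept simple and focused on the task."""
    await asyncio.sleep(0.2)  # stands in for the model call
    return "Your refund of $42 has been approved."

async def jailbreak_guardrail(user_message: str) -> bool:
    """Cheap input check (stubbed); True means the input looks safe."""
    await asyncio.sleep(0.1)
    return "ignore your instructions" not in user_message.lower()

async def pii_guardrail(draft: str) -> bool:
    """Output check (stubbed); blocks drafts that would leak PII."""
    await asyncio.sleep(0.1)
    return "ssn" not in draft.lower()

async def respond(user_message: str) -> str:
    # Run the main call and the input guardrail concurrently.
    draft, input_ok = await asyncio.gather(
        draft_response(user_message),
        jailbreak_guardrail(user_message),
    )
    # The high-stakes step (issuing the refund) is deferred until
    # every guardrail has returned a clear signal.
    output_ok = await pii_guardrail(draft)
    return draft if input_ok and output_ok else "I can't help with that."

print(asyncio.run(respond("Where is my refund?")))
```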
Summary: Lessons from Building Agents
- Use abstractions minimally [00:16:12].
- Start with a single agent [00:16:14].
- Graduate to a network of agents for more complex scenarios [00:16:15].
- Keep prompts simple and focused on the “happy path,” using guardrails for edge cases [00:16:19].