Building and scaling AI use cases for enterprises

From: aidotengineer

This article discusses the journey of implementing AI in enterprises, focusing on how to build and scale use cases with OpenAI’s technologies and insights into agentic workflows [00:00:20].

OpenAI’s Operating Model [00:00:40]

OpenAI operates with two core engineering teams:

Research Team: Comprising 1,200 researchers, this team is responsible for inventing and deploying foundational AI models [00:00:48].
Apply Team: This team takes the foundational models and builds them into products like ChatGPT and the API, making GPT models available for deployment [00:01:00].

Beyond engineering, a “go-to-market” team helps deliver these products to end-users, automating internal operations and integrating AI into the workforce [00:01:11]. This process includes an iterative loop where feedback from the field directly improves products and core models through a “research flywheel” [00:01:27]. This represents OpenAI’s approach to AI deployment and enterprise integration [00:01:41].

The AI Customer Journey in Enterprise [00:01:45]

The AI customer journey for enterprises typically unfolds in three phases:

Building an AI-Enabled Workforce: This initial step involves getting AI tools into the hands of employees to foster AI literacy and encourage daily use in their work [00:01:54]. OpenAI’s ChatGPT is a primary product for this phase [00:02:30].
Automating AI Operations: Enterprises then graduate to automating internal AI operations, such as building automation or co-pilot type use cases [00:02:11]. While ChatGPT can partially assist, the OpenAI API is typically used for more complex or customized needs [00:02:38].
Infusing AI into End Products: The final phase involves infusing AI into end-user facing products, primarily through API use cases [00:02:22].

Crafting an Enterprise AI Strategy [00:03:05]

When enterprises craft their AI strategy, the process often includes:

Top-Down Strategic Guidance: It’s crucial to first define the broader business strategy and then determine where AI technology aligns with that strategy [00:03:10].
Identifying High-Impact Use Cases: After setting the strategy, identify one or two high-impact use cases to scope out and deliver upon [00:03:33].
Building Divisional Capability: This involves enabling teams and infusing AI throughout the organization through enablement programs, establishing centers of excellence, or building a centralized technological platform for others to build upon [00:03:52].

The Use Case Journey [00:04:25]

A typical use case journey, illustrated as a three-month example, involves several key phases:

Ideation & Scoping: This includes initial ideation, detailed scoping, architectural review to understand how AI fits into the existing stack, and clearly defining success metrics and KPIs [00:04:40].
Development: This is the bulk of the time, involving iterative development, refining prompting strategies, and implementing techniques like Retrieval Augmented Generation (RAG) to continuously improve the use case [00:04:53]. OpenAI’s team engages closely during this phase through workshops, office hours, paired programming, and webinars to accelerate progress [00:05:06].
Testing & Evaluation: After development, the focus shifts to testing and evaluation, including A/B testing and beta rollouts, against the predefined evaluation metrics [00:05:24].
Production: This phase includes the launch rollout and scale optimization testing to ensure the solution performs effectively for many end-users [00:05:37].
Maintenance: Ongoing maintenance ensures continued performance and improvement [00:05:45].

OpenAI’s Partnership Approach [00:05:50]

OpenAI partners with enterprises by providing:

A dedicated team, expecting a dedicated team from the enterprise as well [00:05:55].
Early access to new models and features, offering a glimpse into upcoming developments [00:06:05].
Access to internal experts from research, engineering, and product teams to accelerate development [00:06:35].
Joint roadmap sessions to align on future plans [00:06:42].

Case Study: Morgan Stanley [00:06:51]

Morgan Stanley partnered with OpenAI to build an internal knowledge assistant for its wealth managers [00:06:54]. The goal was to provide highly accurate information from their vast knowledge corpus (research reports, live stock data) to respond to clients [00:07:00]. Initial accuracy was around 45% [00:07:20].

Through collaboration, OpenAI introduced new methods during development, such as:

Hybrid Retrieval: [00:07:26]
Fine-tuned Embeddings: [00:07:28]
Different Chunking Strategies: [00:07:29]
Reranking and Classification Steps: [00:07:35]
Prompt Engineering and Query Expansion: [00:07:43]

These interventions significantly improved accuracy, reaching 85% and ultimately 98%—surpassing their 90% goal [00:07:38].

Building Agents and Agentic Workflows [00:08:00]

The year 2025 is anticipated to be the “year of Agents,” where GenAI truly graduates from an assistant to a co-worker [00:08:02].

An agent is defined as an AI application consisting of:

A model with instructions, typically in the form of a prompt [00:09:04].
Access to tools for retrieving information and interacting with external systems [00:09:11].
An execution loop whose termination is controlled by the model itself [00:09:16].

In each cycle, an agent receives natural language instructions, decides whether to issue tool calls, runs those tools, synthesizes a response with the tool return values, and provides an answer to the user [00:09:25]. The agent can also determine if its objective has been met and terminate the loop [00:09:42].

Lessons Learned in Building AI Systems (Agents) [00:09:50]

OpenAI has identified four key lessons or best practices for building scalable AI systems, specifically agents:

Use Abstractions Minimally: Start Simple, Optimize When Needed [00:10:07]
- Starting with frameworks can be enticing but often obscures how the system behaves and what primitives it uses, deferring design decisions before constraints are understood [00:10:33].
- A better approach is to first build with primitives to understand task decomposition, failure points, and necessary improvements [00:10:52]. Abstraction should only be introduced when reinventing the wheel (e.g., re-implementing embedding strategies or model graders) [00:11:05]. The focus should be on understanding data, failure points, and constraints, not just choosing a framework [00:11:23].
Start with a Single Agent [00:11:44]
- Teams often jump into designing complex multi-agent systems too soon, creating unknowns and yielding little insight [00:11:48].
- It’s recommended to start with a single agent purpose-built for a single task, deploy it to production with limited users, and observe its performance [00:12:08]. This approach helps identify real bottlenecks like hallucinations, low adoption due to latency, or inaccuracy from poor retrieval [00:12:21]. Complexity should only increase as more intense failure cases and constraints are discovered [00:12:44].
Graduate to a Network of Agents with Handoffs for Complexity [00:13:00]
- For more complex tasks, a network of agents is a collaborative system where multiple agents work in concert to resolve complex requests or perform interrelated tasks [00:13:07]. These are specialized agents handling subflows within a larger agentic workflow [00:13:28].
- Handoffs are the process by which one agent transfers control of an active conversation to another agent, preserving the entire conversation history for the new agent [00:13:38].
- Example: A fully automated customer service flow can use a network of agents and handoffs. A GPT-4o mini model can perform initial triage, then a GPT-4o dispute agent can manage the conversation, and finally, an O3 mini reasoning model can handle accuracy-sensitive tasks like checking refund eligibility [00:14:02]. Handoffs allow swapping out the model, prompt, and tool definitions while maintaining context, offering flexibility for a wide range of scenarios [00:14:39].
Keep Prompts Simple and Focused, Use Guardrails for Edge Cases [00:14:52]
- Guardrails are mechanisms that enforce safety, security, and reliability within an application, preventing misuse and ensuring system integrity [00:14:57].
- Model instructions should be kept simple and focused on the target task to ensure maximum interoperability and predictable accuracy [00:15:11].
- Guardrails should not necessarily be part of the main prompts but should be run in parallel [00:15:25]. The availability of faster and cheaper models like GPT-4o mini makes this more accessible [00:15:33].
- High-stakes tool calls or user responses (e.g., issuing refunds, showing personal account information) can be deferred until all guardrails have returned a safe result [00:15:42]. An example includes running an input guardrail to prevent prompt injection and output guardrails on the agent’s response [00:15:56].

In summary, for building and scaling AI systems, particularly agents, the key lessons are to use abstractions minimally, start with a single agent, graduate to a network of agents when necessary, and keep prompts simple while using guardrails for edge cases [00:16:08].

Tubegraph

Explorer

Table of Contents