From: aidotengineer
Developing and deploying generative AI products, especially those involving AI agents, presents a unique set of challenges when aiming for scale. Bloomberg, which has invested in AI for 15-16 years, made a strategic pivot in 2023, after the rise of ChatGPT and the open-source LLM community, to build on top of available large language models (LLMs) rather than developing all of its models from scratch [00:01:06]. This shift let the company focus on solving the complexities of putting AI products into production [00:02:21].
Defining Tools vs. Agents
For clarity, the speaker uses the framing of cognitive architectures for language agents: “tools” sit at the less autonomous end of the spectrum, while “agents” are more autonomous, possess memory, and can evolve [00:03:17]. This distinction is crucial for understanding the architecture of the AI products being scaled.
Context: Bloomberg’s Operations
Bloomberg operates as a fintech company serving diverse financial clients, including research analysts, portfolio managers, and traders [00:03:51]. The company processes an immense volume of data daily: 400 billion ticks of structured data, over a billion unstructured messages, and millions of well-written documents like news, with over 40 years of historical data [00:04:34].
Core principles for Bloomberg’s products, regardless of AI integration, are non-negotiable:
- Precision [00:06:25]
- Comprehensiveness [00:06:25]
- Speed [00:06:27]
- Throughput [00:06:27]
- Availability [00:06:27]
- Protecting contributor and client data [00:06:30]
- Transparency [00:06:34]
These principles ground the challenges faced when building and scaling AI agents using current technology [00:06:41].
Case Study: Earnings Call Summarization
In 2023, Bloomberg focused on helping research analysts by automatically answering common questions from public companies’ quarterly earnings call transcripts [00:06:54]. However, the out-of-the-box performance of available models fell short on precision, accuracy, and factuality [00:08:11]. This necessitated significant MLOps work, illustrated after the list below, to implement:
- Remediation workflows [00:08:24]
- Circuit breakers [00:08:26]
- Continuous monitoring and remediation to ensure summary accuracy, especially since these summaries are published [00:08:38].
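The talk does not give implementation details, so the following is only a minimal, hypothetical sketch of how a publication circuit breaker and remediation hand-off might look; the helper names, the 10% threshold, and the window size are assumptions for illustration, not Bloomberg’s actual pipeline.

```python
# Illustrative sketch only: thresholds, helper names, and the remediation
# queue are assumptions, not Bloomberg's actual system.
from dataclasses import dataclass, field


def generate_summary(transcript: str) -> str:
    return ""  # placeholder for the LLM summarization call


def factuality_check(transcript: str, summary: str) -> bool:
    return True  # placeholder for automated accuracy/factuality checks


def send_to_remediation_queue(transcript: str, summary: str) -> None:
    pass  # placeholder for routing to human review / remediation workflow


@dataclass
class CircuitBreaker:
    error_threshold: float = 0.10           # trip if >10% of recent summaries fail
    window: int = 100                       # how many recent outcomes to track
    outcomes: list[bool] = field(default_factory=list)

    def record(self, passed: bool) -> None:
        self.outcomes = (self.outcomes + [passed])[-self.window:]

    def is_open(self) -> bool:
        # "Open" means publication is halted until humans remediate.
        if not self.outcomes:
            return False
        failure_rate = self.outcomes.count(False) / len(self.outcomes)
        return failure_rate > self.error_threshold


breaker = CircuitBreaker()


def publish_summary(transcript: str) -> str | None:
    summary = generate_summary(transcript)
    passed = factuality_check(transcript, summary)
    breaker.record(passed)
    if not passed or breaker.is_open():
        send_to_remediation_queue(transcript, summary)
        return None  # nothing is published while quality is in doubt
    return summary
```

The point of the sketch is the shape, not the specifics: automated checks run on every summary, recent quality is monitored continuously, and publication stops rather than silently degrading when checks fail.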
Today’s agentic architecture is “semi-agentic” due to a lack of full trust in autonomous systems [00:08:57]. Essential guardrails, such as prohibiting financial advice or ensuring factuality, are hard-coded and non-optional components of any agent [00:09:09].
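A minimal sketch of what “hard-coded and non-optional” can mean in practice, assuming a shared agent base class; the check implementations are illustrative placeholders rather than Bloomberg’s actual guardrails:

```python
# Illustrative sketch: guardrails live in the base class, so no individual
# agent can opt out of them. The check implementations are placeholders.
class GuardrailViolation(Exception):
    pass


def check_no_financial_advice(text: str) -> None:
    # Placeholder heuristic; a real system would use a dedicated classifier.
    if "you should buy" in text.lower():
        raise GuardrailViolation("response resembles financial advice")


def check_factuality(text: str, sources: list[str]) -> None:
    # Placeholder; a real check would verify claims against source documents.
    pass


class Agent:
    """Subclasses implement _respond(); the guardrails around it are not optional."""

    def respond(self, query: str, sources: list[str]) -> str:
        draft = self._respond(query, sources)
        check_no_financial_advice(draft)   # runs on every call, for every agent
        check_factuality(draft, sources)
        return draft

    def _respond(self, query: str, sources: list[str]) -> str:
        raise NotImplementedError
```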
Two Aspects of Scaling
Scaling AI products involves two primary considerations:
1. Embracing Fragility and Building Resilience
Unlike traditional software APIs or even earlier machine learning models with predictable input/output distributions, LLMs and compositions of LLMs (agents) introduce a significant degree of stochasticity and error multiplication [00:11:18]. This leads to fragile behavior, making it difficult to predict outcomes [00:11:29].
Previously, with ML products like a news sentiment detector (built in 2009), the input and output spaces were well-defined, allowing for robust testing and monitoring. Even then, out-of-band communication with downstream consumers was necessary when model versions changed [00:12:41].
For agentic architectures, the goal is to make daily improvements to agents, moving away from slow, batch-regression-test-based release cycles [00:13:03]. When agents are composed, an error in one component (e.g., misinterpreting a query from “quarterly” to “monthly” data) can lead to compounding errors downstream [00:14:16].
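To make the compounding concrete with illustrative numbers (not figures from the talk): if a request passes through five composed agents and each is independently correct 95% of the time, the end-to-end chain is right only about 0.95^5 ≈ 77% of the time, and a single early mistake, such as the quarterly-versus-monthly misreading above, contaminates every step after it.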
To address this, the strategy is not to rely on upstream systems being perfectly accurate, but to factor in their fragility and continuous evolution [00:14:23]. Implementing independent safety checks within each agent, even across internal teams, allows for faster evolution of individual agents without complex handshake signals and sign-offs from all downstream callers [00:14:58]. This resilience enables faster iteration and deployment.
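As a minimal sketch of such an independent check, assume a hypothetical query-plan structure handed from one agent to the next; the field names and allowed values are illustrative:

```python
# Illustrative sketch: a downstream agent validates what it receives instead
# of trusting the upstream agent. Field names and checks are assumptions.
from dataclasses import dataclass


@dataclass
class QueryPlan:
    ticker: str
    period: str   # e.g. "quarterly" or "annual"
    metric: str


VALID_PERIODS = {"quarterly", "annual"}


def run_data_query(plan: QueryPlan) -> dict:
    return {}  # placeholder for the actual data retrieval


def fetch_financials(plan: QueryPlan) -> dict:
    # Safety checks at the agent boundary: refuse or flag a bad plan here
    # rather than assuming the query-understanding agent got it right.
    if plan.period not in VALID_PERIODS:
        raise ValueError(f"unsupported period {plan.period!r}; refusing to guess")
    if not plan.ticker.isalpha():
        raise ValueError(f"malformed ticker {plan.ticker!r}")
    return run_data_query(plan)
```

Because each agent defends its own boundary, the team that owns an upstream agent can ship changes without coordinating a sign-off with every downstream caller.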
2. Evolving Organizational Structure
The organizational structure should adapt to the demands of building AI agents, moving beyond traditional machine learning team structures [00:15:38]. Key questions arise:
- How many agents to build? [00:16:06]
- What should each agent do? [00:16:08]
- Should agents have overlapping functionality? [00:16:11]
Initially, when product design is unclear and fast iteration is needed, a vertically aligned team structure (where one team handles a full product/agent) is beneficial [00:16:46]. This allows for rapid prototyping, shared code, data, and models [00:17:01].
As understanding of individual products or agents matures—their use cases, strengths, and weaknesses—the organization can transition to a more horizontally aligned structure [00:17:18]. This is when optimizations like performance increases, cost reductions, improved testability, and transparency become priorities [00:17:29]. For example, guardrails (e.g., preventing financial advice) are implemented horizontally across all products, avoiding redundant effort across 50 different teams [00:17:41]. It’s crucial for an organization to determine the right time to create these horizontal functions and break down monolithic agents into smaller, more manageable pieces [00:18:02].
For a research analyst agent, this factorization means separate agents for understanding user queries and session context, figuring out necessary information, and generating answers with strict rigor [00:18:24]. Non-optional guardrails are called at multiple points, reflecting the “semi-agentic” approach [00:18:50].
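Put together, the factored research-analyst flow might look like the sketch below; every function here is an illustrative stub standing in for a separately owned agent or guardrail, not one of Bloomberg’s actual components:

```python
# Illustrative sketch of the factored pipeline described above; all names
# are stubs, not Bloomberg's actual agents or guardrails.
def understand_query(query: str, session: dict) -> dict:
    return {"query": query, **session}        # agent 1: query + session context (stub)


def check_query_guardrails(plan: dict) -> None:
    pass                                      # non-optional guardrail (stub)


def gather_information(plan: dict) -> list[str]:
    return []                                 # agent 2: decide and fetch needed data (stub)


def generate_answer(plan: dict, evidence: list[str]) -> str:
    return ""                                 # agent 3: rigorous answer generation (stub)


def check_answer_guardrails(draft: str, evidence: list[str]) -> None:
    pass                                      # non-optional guardrail, applied again (stub)


def answer_research_question(query: str, session: dict) -> str:
    plan = understand_query(query, session)
    check_query_guardrails(plan)
    evidence = gather_information(plan)
    draft = generate_answer(plan, evidence)
    check_answer_guardrails(draft, evidence)
    return draft
```

The guardrail calls appear twice by design: once on the interpreted query and once on the generated answer, which is what keeps the overall system “semi-agentic” rather than fully autonomous.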