From: aidotengineer
Developing and deploying generative AI products, especially those involving AI agents, presents a unique set of challenges when aiming for scale. Bloomberg, which has invested in AI for 15-16 years, made a strategic pivot in 2023, after the rise of ChatGPT and the open-source LLM community, to build on top of available large language models (LLMs) rather than developing all of its models from scratch [00:01:06]. This shift let the company focus on solving the complexities of putting AI products into production [00:02:21].
Defining Tools vs. Agents
For clarity, the speaker uses the framing of cognitive architectures for language agents: “tools” sit at the less autonomous end of the spectrum, while “agents” are more autonomous, possess memory, and can evolve [00:03:17]. This distinction is crucial for understanding the architecture of the AI products being scaled.
Context: Bloomberg’s Operations
Bloomberg operates as a fintech company serving diverse financial clients, including research analysts, portfolio managers, and traders [00:03:51]. The company processes an immense volume of data daily: 400 billion ticks of structured data, over a billion unstructured messages, and millions of well-written documents like news, with over 40 years of historical data [00:04:34].
Core principles for Bloomberg’s products, regardless of AI integration, are non-negotiable:
- Precision [00:06:25]
- Comprehensiveness [00:06:25]
- Speed [00:06:27]
- Throughput [00:06:27]
- Availability [00:06:27]
- Protecting contributor and client data [00:06:30]
- Transparency [00:06:34]
These principles ground the challenges faced when building and scaling AI agents using current technology [00:06:41].
Case Study: Earnings Call Summarization
In 2023, Bloomberg focused on helping research analysts by automatically answering common questions from public companies’ quarterly earnings call transcripts [00:06:54]. However, the out-of-the-box performance of available models fell short on precision, accuracy, and factuality [00:08:11]. This necessitated significant MLOps work, illustrated after the list below, to implement:
- Remediation workflows [00:08:24]
- Circuit breakers [00:08:26]
- Continuous monitoring and remediation to ensure summary accuracy, especially since these summaries are published [00:08:38].
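The talk does not give implementation details, so the following is only a minimal, hypothetical sketch of how a publication circuit breaker and remediation hand-off might look; the helper names, the 10% threshold, and the window size are assumptions for illustration, not Bloomberg’s actual pipeline.

```python
# Illustrative sketch only: thresholds, helper names, and the remediation
# queue are assumptions, not Bloomberg's actual system.
from dataclasses import dataclass, field


def generate_summary(transcript: str) -> str:
    return ""  # placeholder for the LLM summarization call


def factuality_check(transcript: str, summary: str) -> bool:
    return True  # placeholder for automated accuracy/factuality checks


def send_to_remediation_queue(transcript: str, summary: str) -> None:
    pass  # placeholder for routing to human review / remediation workflow


@dataclass
class CircuitBreaker:
    error_threshold: float = 0.10           # trip if >10% of recent summaries fail
    window: int = 100                       # how many recent outcomes to track
    outcomes: list[bool] = field(default_factory=list)

    def record(self, passed: bool) -> None:
        self.outcomes = (self.outcomes + [passed])[-self.window:]

    def is_open(self) -> bool:
        # "Open" means publication is halted until humans remediate.
        if not self.outcomes:
            return False
        failure_rate = self.outcomes.count(False) / len(self.outcomes)
        return failure_rate > self.error_threshold


breaker = CircuitBreaker()


def publish_summary(transcript: str) -> str | None:
    summary = generate_summary(transcript)
    passed = factuality_check(transcript, summary)
    breaker.record(passed)
    if not passed or breaker.is_open():
        send_to_remediation_queue(transcript, summary)
        return None  # nothing is published while quality is in doubt
    return summary
```

The point of the sketch is the shape, not the specifics: automated checks run on every summary, recent quality is monitored continuously, and publication stops rather than silently degrading when checks fail.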
Today’s agentic architecture is “semi-agentic” due to a lack of full trust in autonomous systems [00:08:57]. Essential guardrails, such as prohibiting financial advice or ensuring factuality, are hard-coded and non-optional components of any agent [00:09:09].
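A minimal sketch of what “hard-coded and non-optional” can mean in practice, assuming a shared agent base class; the check implementations are illustrative placeholders rather than Bloomberg’s actual guardrails:

```python
# Illustrative sketch: guardrails live in the base class, so no individual
# agent can opt out of them. The check implementations are placeholders.
class GuardrailViolation(Exception):
    pass


def check_no_financial_advice(text: str) -> None:
    # Placeholder heuristic; a real system would use a dedicated classifier.
    if "you should buy" in text.lower():
        raise GuardrailViolation("response resembles financial advice")


def check_factuality(text: str, sources: list[str]) -> None:
    # Placeholder; a real check would verify claims against source documents.
    pass


class Agent:
    """Subclasses implement _respond(); the guardrails around it are not optional."""

    def respond(self, query: str, sources: list[str]) -> str:
        draft = self._respond(query, sources)
        check_no_financial_advice(draft)   # runs on every call, for every agent
        check_factuality(draft, sources)
        return draft

    def _respond(self, query: str, sources: list[str]) -> str:
        raise NotImplementedError
```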
Two Aspects of Scaling
Scaling AI products involves two primary considerations:
1. Embracing Fragility and Building Resilience
Unlike traditional software APIs or even earlier machine learning models with predictable input/output distributions, LLMs and compositions of LLMs (agents) introduce a significant degree of stochasticity and error multiplication [00:11:18]. This leads to fragile behavior, making it difficult to predict outcomes [00:11:29].
Previously, with ML products like a news sentiment detector (built in 2009), the input and output spaces were well-defined, allowing for robust testing and monitoring. Even then, out-of-band communication with downstream consumers was necessary when model versions changed [00:12:41].
For agentic architectures, the goal is to make daily improvements to agents, moving away from slow, batch-regression-test-based release cycles [00:13:03]. When agents are composed, an error in one component (e.g., misinterpreting a query from “quarterly” to “monthly” data) can lead to compounding errors downstream [00:14:16].
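To make the compounding concrete with illustrative numbers (not figures from the talk): if a request passes through five composed agents and each is independently correct 95% of the time, the end-to-end chain is right only about 0.95^5 ≈ 77% of the time, and a single early mistake, such as the quarterly-versus-monthly misreading above, contaminates every step after it.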
To address this, the strategy is not to rely on upstream systems being perfectly accurate, but to factor in their fragility and continuous evolution [00:14:23]. Implementing independent safety checks within each agent, even across internal teams, allows for faster evolution of individual agents without complex handshake signals and sign-offs from all downstream callers [00:14:58]. This resilience enables faster iteration and deployment.
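As a minimal sketch of such an independent check, assume a hypothetical query-plan structure handed from one agent to the next; the field names and allowed values are illustrative:

```python
# Illustrative sketch: a downstream agent validates what it receives instead
# of trusting the upstream agent. Field names and checks are assumptions.
from dataclasses import dataclass


@dataclass
class QueryPlan:
    ticker: str
    period: str   # e.g. "quarterly" or "annual"
    metric: str


VALID_PERIODS = {"quarterly", "annual"}


def run_data_query(plan: QueryPlan) -> dict:
    return {}  # placeholder for the actual data retrieval


def fetch_financials(plan: QueryPlan) -> dict:
    # Safety checks at the agent boundary: refuse or flag a bad plan here
    # rather than assuming the query-understanding agent got it right.
    if plan.period not in VALID_PERIODS:
        raise ValueError(f"unsupported period {plan.period!r}; refusing to guess")
    if not plan.ticker.isalpha():
        raise ValueError(f"malformed ticker {plan.ticker!r}")
    return run_data_query(plan)
```

Because each agent defends its own boundary, the team that owns an upstream agent can ship changes without coordinating a sign-off with every downstream caller.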
2. Evolving Organizational Structure
The organizational structure should adapt to the demands of building AI agents, moving beyond traditional machine learning team structures [00:15:38]. Key questions arise:
- How many agents to build? [00:16:06]
- What should each agent do? [00:16:08]
- Should agents have overlapping functionality? [00:16:11]
Initially, when product design is unclear and fast iteration is needed, a vertically aligned team structure (where one team handles a full product/agent) is beneficial [00:16:46]. This allows for rapid prototyping, shared code, data, and models [00:17:01].
As understanding of individual products or agents matures—their use cases, strengths, and weaknesses—the organization can transition to a more horizontally aligned structure [00:17:18]. This is when optimizations like performance increases, cost reductions, improved testability, and transparency become priorities [00:17:29]. For example, guardrails (e.g., preventing financial advice) are implemented horizontally across all products, avoiding redundant effort across 50 different teams [00:17:41]. It’s crucial for an organization to determine the right time to create these horizontal functions and break down monolithic agents into smaller, more manageable pieces [00:18:02].
For a research analyst agent, this factorization means separate agents for understanding user queries and session context, figuring out necessary information, and generating answers with strict rigor [00:18:24]. Non-optional guardrails are called at multiple points, reflecting the “semi-agentic” approach [00:18:50].
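Put together, the factored research-analyst flow might look like the sketch below; every function here is an illustrative stub standing in for a separately owned agent or guardrail, not one of Bloomberg’s actual components:

```python
# Illustrative sketch of the factored pipeline described above; all names
# are stubs, not Bloomberg's actual agents or guardrails.
def understand_query(query: str, session: dict) -> dict:
    return {"query": query, **session}        # agent 1: query + session context (stub)


def check_query_guardrails(plan: dict) -> None:
    pass                                      # non-optional guardrail (stub)


def gather_information(plan: dict) -> list[str]:
    return []                                 # agent 2: decide and fetch needed data (stub)


def generate_answer(plan: dict, evidence: list[str]) -> str:
    return ""                                 # agent 3: rigorous answer generation (stub)


def check_answer_guardrails(draft: str, evidence: list[str]) -> None:
    pass                                      # non-optional guardrail, applied again (stub)


def answer_research_question(query: str, session: dict) -> str:
    plan = understand_query(query, session)
    check_query_guardrails(plan)
    evidence = gather_information(plan)
    draft = generate_answer(plan, evidence)
    check_answer_guardrails(draft, evidence)
    return draft
```

The guardrail calls appear twice by design: once on the interpreted query and once on the generated answer, which is what keeps the overall system “semi-agentic” rather than fully autonomous.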