From: aidotengineer

Bloomberg began seriously investing in large language models (LLMs) in late 2021, though the company has been investing in AI for some 15 to 16 years [00:00:36]. Bloomberg initially dedicated 2022 to building its own large language model, learning a great deal about data organization and evaluation in the process [00:00:50]. With the rise of ChatGPT and of the open-weight and open-source communities, however, Bloomberg pivoted its strategy in 2023 to focus on building on top of existing models [00:01:06].

Bloomberg’s AI Organization

Bloomberg’s AI efforts are structured as a specialized group within engineering, reporting to the Global Head of Engineering [00:01:38]. This group works in cross-functional settings with data counterparts, product teams, and the CTO’s office [00:01:49]. The AI team comprises approximately 400 people across 50 teams, located in London, New York, Princeton, and Toronto [00:01:58].

The company has been actively building products using generative AI for 12 to 16 months, with a significant focus on developing more agentic tools [00:02:16].

Defining Tools vs. Agents

For internal clarity, Bloomberg distinguishes between “tools” and “agents” based on the paper “Cognitive Architectures for Language Agents” [00:03:17].

  • Tool: Refers to the “left-hand side” of the spectrum, implying less autonomy [00:03:20].
  • Agent: Represents the “right-hand side” of the spectrum, characterized by greater autonomy, memory, and the ability to evolve [00:03:30].

Bloomberg’s Data Scale and Principles

As a FinTech company, Bloomberg deals with a massive scale of financial data:

  • They generate and accumulate both unstructured data (news, research, documents, slides) and structured data (reference data, market data) [00:04:17].
  • Daily, they receive 400 billion ticks of structured data, over a billion unstructured messages, and millions of written documents including news [00:04:36].
  • This data spans over 40 years of history [00:04:48].

Due to the nature of finance, certain product principles are non-negotiable:

  • Precision, Comprehensiveness, Speed, Throughput, Availability [00:06:25].
  • Protecting contributor and client data [00:06:30].
  • Transparency [00:06:36].

These principles must be maintained regardless of whether AI is used [00:06:38].

Scaling Agentic Architectures

When scaling LLM-based agents, two primary aspects are crucial:

1. Embracing Fragility with Robust Guardrails

Traditional software, such as a general matrix multiplication API, is robust and well documented [00:10:07]. Machine learning APIs, even when their input and output distributions are carefully characterized, introduce a degree of stochasticity [00:10:41]. With LLMs, and especially with compositions of LLMs into agents, errors multiply significantly, leading to fragile behavior [00:11:18].

For instance, a news sentiment product built in 2009 had predictable input distributions (known news wires, language, and editorial guidelines) and output distributions, allowing for careful testing and assessment of deployment risk [00:11:43]. Downstream consumers were notified of model changes and encouraged to re-test [00:12:41].

In contrast, modern agentic architectures require daily improvements, making traditional batch regression testing and extensive release cycles impractical [00:13:03]. Errors can compound, especially in downstream workflows where data may not be fully exposed [00:14:16].
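
The compounding is easy to quantify with a back-of-the-envelope calculation (an illustration, not a figure from the talk): if each stage of a pipeline is independently correct with probability p, a chain of n stages is correct end-to-end with probability p^n.

```python
# Back-of-the-envelope: a single call that is right 95% of the time
# looks dependable, but composition erodes that reliability quickly.
def chain_reliability(p: float, n: int) -> float:
    """End-to-end success probability of n independent stages."""
    return p ** n

for n in (1, 3, 5, 10):
    print(f"{n:>2} stages at p=0.95 -> {chain_reliability(0.95, n):.2f}")
# 1 -> 0.95, 3 -> 0.86, 5 -> 0.77, 10 -> 0.60
```

Ten composed calls at 95% per-stage accuracy succeed end-to-end only about 60% of the time, which is why each stage has to defend itself rather than trust its upstream.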

The key to scaling is to assume that upstream systems will be fragile and evolving, and to implement independent safety checks and guardrails at each stage [00:14:23]. This lets individual agents evolve faster without requiring extensive coordination and sign-offs from every downstream caller [00:14:58].

Accordingly, Bloomberg’s agentic products are “semi-agentic”: full autonomy is not yet trusted [00:08:57], so essential guardrails, such as prohibiting financial advice and ensuring factuality, are hard-coded and mandatory [00:09:09].
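
A minimal sketch of this pattern, with hypothetical check and stage names (the talk does not describe Bloomberg’s actual implementation): each stage re-validates what it receives and produces, and the mandatory checks cannot be switched off by any agent.

```python
class GuardrailViolation(Exception):
    """Raised whenever a mandatory check fails, halting the pipeline."""

def no_financial_advice(text: str) -> None:
    # Illustrative keyword screen; a real check would be a trained
    # classifier, not a substring match.
    if "you should buy" in text.lower():
        raise GuardrailViolation("output resembles financial advice")

MANDATORY_CHECKS = (no_financial_advice,)  # hard-coded, never optional

def run_stage(stage_fn, payload: str) -> str:
    """Run one pipeline stage, assuming its upstream is fragile.

    The stage never trusts what it is handed: mandatory checks run on
    both input and output, so an upstream agent can change daily
    without every downstream caller having to sign off.
    """
    for check in MANDATORY_CHECKS:
        check(payload)               # independent check on the way in
    output = stage_fn(payload)
    for check in MANDATORY_CHECKS:
        check(output)                # and again on the way out
    return output
```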

2. Rethinking Organizational Structure

Traditional machine learning teams often mirror the factorization of their software stack in their organizational structure [00:15:38]. When building with LLMs and a new tech stack, however, this structure needs to be re-evaluated [00:15:57].

  • Early Stages (Product Discovery): It’s beneficial to have vertically aligned, collapsed teams and software stacks to facilitate rapid iteration and product design discovery [00:16:46]. This fosters fast iteration and sharing of code, data, and models [00:17:01].
  • Later Stages (Optimization and Scale): Once the product design and agent use cases are understood, the organization can transition to a more horizontal structure [00:17:10]. This allows for optimization, performance improvement, cost reduction, increased testability, and transparency [00:17:29]. For instance, common guardrails like filtering financial-advice queries should be owned horizontally and shared across teams to avoid redundant effort [00:17:41] (see the sketch after this list). This also enables breaking monolithic agents down into smaller, more manageable pieces [00:18:12].
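
As a sketch of what such horizontal factoring could look like (module and function names are hypothetical, not Bloomberg’s), a single shared package can own the common checks so that vertical product teams register against it instead of each re-implementing its own financial-advice filter:

```python
# guardrails.py -- hypothetical shared package owned by one horizontal team.
from typing import Callable, List, Optional

_REGISTRY: List[Callable[[str], Optional[str]]] = []

def guardrail(check: Callable[[str], Optional[str]]):
    """Register a check once; every consuming agent picks it up."""
    _REGISTRY.append(check)
    return check

@guardrail
def no_financial_advice(text: str) -> Optional[str]:
    # Stand-in for a real query/response classifier.
    return "financial advice" if "you should buy" in text.lower() else None

def validate(text: str) -> List[str]:
    """Run all registered checks; return the violations found."""
    return [v for check in _REGISTRY if (v := check(text)) is not None]
```

Each product team then calls validate() at its own pipeline boundaries, keeping the check’s logic, testing, and transparency obligations in one place.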

Example: Research Analyst Agent

For a research analyst, a current agent architecture looks like this (a minimal code sketch follows the list):

  • An agent deeply understands the user’s query and session context [00:18:28].
  • It determines the necessary information and dispatches to a tool, often with an NLP front end, to fetch data [00:18:34].
  • Answer generation has its own agent, with strict rules for well-formed answers [00:18:43].
  • Non-optional guardrails are called at multiple points, ensuring no autonomy where core principles are concerned [00:18:50].
  • The system builds upon years of traditional and modern data management techniques [00:18:59].
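
Read as code, the flow might look like the following sketch; every name is a hypothetical stand-in reconstructed from the description above, not Bloomberg’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    topic: str

def understand_query(query: str, session: list) -> Intent:
    """Agent 1: deep understanding of the query plus session context."""
    # Stub: a real system would run an LLM over the query and session.
    return Intent(topic=query.strip().lower())

def dispatch_tool(intent: Intent) -> str:
    """Dispatch to a data-fetching tool, often fronted by NLP."""
    # Stub: a real system would hit structured and unstructured stores.
    return f"[retrieved documents about {intent.topic}]"

def generate_answer(intent: Intent, evidence: str) -> str:
    """Agent 2: answer generation under strict well-formedness rules."""
    return f"Summary for '{intent.topic}': {evidence}"

def mandatory_guardrails(text: str) -> None:
    """Non-optional checks; no agent has the autonomy to skip them."""
    if "you should buy" in text.lower():
        raise ValueError("guardrail violation: financial advice")

def research_analyst_pipeline(query: str, session: list) -> str:
    intent = understand_query(query, session)
    evidence = dispatch_tool(intent)
    mandatory_guardrails(evidence)   # guardrails called at multiple points
    answer = generate_answer(intent, evidence)
    mandatory_guardrails(answer)
    return answer

print(research_analyst_pipeline("semiconductor earnings outlook", []))
```

Note how the guardrail runs after both retrieval and generation: no agent in the chain can bypass it, mirroring the “no autonomy where core principles are concerned” rule above.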

This approach reflects the understanding that to successfully scale and deploy AI agents, organizations must adapt their technical and structural mindsets to prioritize resilience and disciplined factorization.