From: aidotengineer

Bloomberg, a company with over 15 years of investment in AI, leans heavily on artificial intelligence, particularly large language models (LLMs) and agents, to enhance its research and data analytics capabilities. The company began by building its own large language model in 2022, learning about data organization and evaluation along the way, before pivoting to build on top of open-source and open-weight models as the landscape evolved [00:00:44].

Bloomberg’s Data Landscape

Bloomberg operates as a large data organization, generating and accumulating vast amounts of both unstructured and structured data [00:01:49]. Daily, this includes:

  • 400 billion ticks of structured data [00:04:36]
  • Over 1 billion unstructured messages [00:04:40]
  • Millions of well-written documents, including news [00:04:44]

This data spans over 40 years of history, providing a massive information base for their financial clients [00:04:48].

The Research Analyst Archetype

AI efforts at Bloomberg are often focused on specific user archetypes, such as the research analyst in finance [00:05:12]. These analysts are experts in particular areas (e.g., AI, semiconductors, electric vehicles) and perform diverse activities daily [00:05:23]:

  • Search and Discovery & Summarization: Working extensively with unstructured data [00:05:40].
  • Data and Analytics: Engaging with structured data [00:05:46].
  • Communication: Sharing and gathering information with colleagues [00:05:52].
  • Model Building: Normalizing data, programming, and generating models [00:06:00].

Early AI Products: Earnings Call Summaries

In 2023, Bloomberg identified an opportunity to use AI to assist research analysts with scheduled quarterly earnings calls [00:06:51]. With many calls happening daily during earnings season, analysts need to stay informed [00:07:21]. AI is used to:

  1. Generate transcripts of these calls [00:07:30].
  2. Answer sector-specific questions for analysts based on the transcripts, providing quick insights on company health and future outlook [00:07:38].
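The two-step workflow above can be sketched as a simple pipeline. This is a hypothetical illustration, not Bloomberg's implementation: `transcribe` and `answer_question` are placeholder stand-ins for a real speech-to-text service and an LLM prompted with the transcript.

```python
# Hypothetical sketch of the earnings-call workflow: transcribe the
# call, then answer analyst questions grounded in that transcript.

def transcribe(audio_path: str) -> str:
    # Placeholder: a real system would call a speech-to-text service.
    return f"[transcript of {audio_path}]"

def answer_question(transcript: str, question: str) -> str:
    # Placeholder: a real system would prompt an LLM, passing the
    # transcript as context so answers stay grounded in the call.
    return f"Q: {question} | grounded in: {transcript}"

transcript = transcribe("acme_q3_call.wav")
answer = answer_question(transcript, "What is the outlook for margins?")
print(answer)
```

The key design point is that question answering consumes the generated transcript rather than the raw audio, so both steps can be evaluated and remediated independently.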

Challenges and Principles in AI Product Development

Building AI products at Bloomberg involves adhering to non-negotiable principles due to the nature of finance [00:06:15]:

  • Precision, Comprehensiveness, Speed, Throughput, Availability [00:06:25]
  • Data Protection: Safeguarding contributor and client data [00:06:30].
  • Transparency: Ensuring clarity throughout the process [00:06:34].

These principles mean that raw AI performance “out of the box was not great” in terms of precision, accuracy, and factuality [00:08:11]. Significant MLOps work is required to build remediation workflows and circuit breakers, as errors have an outsized impact on publicly published summaries [00:08:21]. Continuous monitoring and remediation improve accuracy [00:08:38].
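The talk does not detail how these circuit breakers work, but the idea can be sketched: track recent outputs against validation checks and halt automated publishing for human remediation once failures cluster. The class name, window size, and threshold below are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical circuit breaker for auto-published summaries: if too many
# recent outputs fail validation, publishing is halted for human review
# rather than letting errors reach clients.

@dataclass
class SummaryCircuitBreaker:
    window: int = 20          # number of recent outputs to track
    max_failures: int = 3     # failures tolerated before tripping
    results: list = field(default_factory=list)

    def record(self, passed_checks: bool) -> None:
        """Record whether the latest summary passed validation."""
        self.results.append(passed_checks)
        self.results = self.results[-self.window:]

    def publishing_allowed(self) -> bool:
        """Trip (return False) once failures in the window hit the limit."""
        return self.results.count(False) < self.max_failures

breaker = SummaryCircuitBreaker()
for ok in [True, True, False, True, False, False]:
    breaker.record(ok)
print(breaker.publishing_allowed())  # three recent failures: tripped, prints False
```

A sliding window like this recovers automatically as good outputs push old failures out, which suits continuous monitoring better than a permanent kill switch.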

Bloomberg’s current agentic architecture is “semi-agentic” because the organization does not yet trust these systems with full autonomy [00:09:03]. Guard rails are a classic example: the system must never offer financial advice or produce non-factual content [00:09:11].
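A guard rail of this kind can be sketched as a non-optional filter that every model output passes through before reaching a user. The patterns and wording below are illustrative assumptions, not Bloomberg's actual rules.

```python
import re

# Hypothetical guard-rail sketch: block outputs that read as financial
# advice. Real systems would use far richer checks (classifiers,
# fact-checking), but the control flow is the same: the check is
# mandatory, not something an agent can choose to skip.

ADVICE_PATTERNS = [
    re.compile(r"\byou should (buy|sell|hold)\b", re.IGNORECASE),
    re.compile(r"\bwe recommend (buying|selling)\b", re.IGNORECASE),
]

def violates_guardrails(text: str) -> bool:
    return any(p.search(text) for p in ADVICE_PATTERNS)

def respond(model_output: str) -> str:
    if violates_guardrails(model_output):
        return "I can't provide financial advice."
    return model_output

print(respond("You should buy this stock now."))
# prints: I can't provide financial advice.
```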

Scaling AI Solutions

Dealing with Fragility and Stochasticity

Scaling AI in research and data analytics means contending with the fragility of LLM-based agents: compositions of LLMs in which errors can multiply [00:11:18]. Unlike traditional software with well-documented APIs, or even earlier ML models with predictable input and output distributions, LLMs introduce far more stochasticity [00:10:47].

“It is easier to not count on the upstream systems to be accurate but rather factor in that they will be fragile and they’ll be evolving and just do your own safety checks.” [00:14:23]

This means that downstream systems and agents must build in their own safety checks rather than relying solely on upstream accuracy [00:14:23]. This approach lets individual agents evolve faster without extensive “handshake signals” or sign-offs from every downstream caller [00:15:00]. For example, an agent tasked with fetching US CPI data for the last five quarters can silently return monthly data instead if a single character (the “Q” for quarterly) is dropped from the generated query, producing plausible-looking but incorrect results that are hard to catch unless the raw data is exposed [00:13:31].
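A downstream safety check for the CPI example might verify the shape of the returned data rather than trust the upstream query. The sketch below is an assumption about how such a check could look: it tests that consecutive observations are roughly a quarter apart, so twelve monthly points returned by mistake would fail.

```python
from datetime import date

# Hypothetical downstream check: before using data from an upstream
# agent, confirm it matches the requested periodicity. Quarterly series
# should have consecutive points about 90 days apart.

def looks_quarterly(observations: list[tuple[date, float]]) -> bool:
    dates = sorted(d for d, _ in observations)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return all(80 <= g <= 100 for g in gaps)

# Five quarterly points: passes the check.
quarterly = [(date(2024, m, 1), 0.0) for m in (1, 4, 7, 10)] + [(date(2025, 1, 1), 0.0)]
# Twelve monthly points (the "dropped Q" failure mode): fails the check.
monthly = [(date(2024, m, 1), 0.0) for m in range(1, 13)]

print(looks_quarterly(quarterly))  # True
print(looks_quarterly(monthly))    # False
```

The check is deliberately local: it assumes nothing about how the upstream agent constructed its query, which is exactly the decoupling the quote above argues for.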

Organizational Structure for AI Development

The organizational structure for building AI agents also needs to evolve [00:15:32]. Initially, when product design is uncertain and fast iteration is needed, vertically aligned teams that “collapse the software stack” are effective [00:16:46]. This allows for rapid iteration and sharing of code, data, and models [00:17:01].

As understanding of a product or agent matures, and many agents are built, the organization can shift towards more horizontal structures to optimize for performance, cost reduction, testability, and transparency [00:17:20]. For example, guard rails, like preventing financial advice, are implemented horizontally across teams to avoid redundant efforts [00:17:39]. This involves breaking down monolithic agents into smaller, more specialized pieces [00:18:12].

Current Research Agent Architecture

Today, a research agent at Bloomberg demonstrates this evolved structure [00:18:24]:

  • An agent understands user queries and session context, figuring out what information is needed [00:18:28]. This is factored out as its own agent, reflected in the organizational structure [00:18:39].
  • Answer generation, with its rigor for well-formed answers, is also factored out as a separate component [00:18:43].
  • Non-optional guard rails are called at multiple points, ensuring no autonomy in critical safety aspects [00:18:52].
  • The system builds upon years of traditional and modern data manipulation techniques, including evolved indexing systems [00:18:59].
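The architecture above can be sketched as a pipeline of separately owned components with a guard rail wrapped around each stage, so no single agent can bypass the safety checks. All names and the string-matching check below are illustrative assumptions standing in for real components.

```python
from typing import Callable

# Hypothetical sketch of the "semi-agentic" composition: query
# understanding and answer generation are separate components, and a
# non-optional guard rail runs after every stage.

def with_guardrail(stage: Callable[[str], str]) -> Callable[[str], str]:
    def guarded(text: str) -> str:
        out = stage(text)
        # Stand-in for real checks (advice detection, factuality, etc.).
        if "financial advice" in out.lower():
            raise ValueError("guardrail tripped")
        return out
    return guarded

def understand_query(query: str) -> str:
    # Factored-out agent: decide what information is needed.
    return f"need: earnings data for {query}"

def generate_answer(plan: str) -> str:
    # Factored-out agent: produce a well-formed answer from the plan.
    return f"answer based on ({plan})"

pipeline = [with_guardrail(understand_query), with_guardrail(generate_answer)]

text = "ACME Q3"
for stage in pipeline:
    text = stage(text)
print(text)  # prints: answer based on (need: earnings data for ACME Q3)
```

Wrapping each stage, rather than only the final output, mirrors the point that guard rails are called at multiple points and are not left to any one agent's discretion.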