From: aidotengineer

Introduction to Agentic Systems at Bloomberg

Bloomberg began investing in AI approximately 15 to 16 years ago [00:00:44]. In late 2021, Large Language Models (LLMs) started to capture significant attention [00:00:36]. The company spent 2022 building its own large language model and published a paper on their learnings in 2023 [00:00:50]. However, with the rise of ChatGPT and open-source models, and given the sheer number of use cases to serve, Bloomberg pivoted its strategy to build on top of existing external models [00:01:06].

Bloomberg’s AI efforts are organized as a special group within engineering, collaborating closely with data, product, and CTO counterparts [00:01:42]. The organization consists of about 400 people across 50 teams in London, New York, Princeton, and Toronto [00:01:58]. For the past 12 to 16 months, they have been seriously building products using generative AI, focusing on more agentic tools [00:02:16].

Defining Agentic Systems: Agents vs. Tools

To clarify internal vocabulary, Bloomberg adopted definitions from a paper titled “Cognitive Architectures for Language Agents” [00:03:16].

  • Tool: Refers to the left-hand side of the spectrum in that paper, implying less autonomy [00:03:20].
  • Agent: Refers to the right-hand side of the spectrum, indicating a system that is more autonomous, possesses memory, and can evolve [00:03:28].

Building Agentic Products at Bloomberg

Bloomberg is a fintech company serving diverse financial clients, including research analysts, portfolio managers, traders, and more [00:03:51]. They handle vast amounts of structured and unstructured data, processing 400 billion ticks of structured data and over a billion unstructured messages daily, with more than 40 years of historical data [00:04:36].

Core principles for their products, regardless of AI usage, include precision, comprehensiveness, speed, throughput, availability, and protecting client data while ensuring transparency [00:06:23]. These non-negotiables heavily influence the challenges of building agents with current technologies [00:06:41].

Initial Approach: Earnings Call Summaries

In 2023, Bloomberg focused on helping research analysts by summarizing public company earnings call transcripts [00:06:54]. These calls include presentations and Q&A segments, and numerous calls happen daily during earnings season [00:07:13]. The goal was to automatically answer common questions of interest to analysts, enabling them to quickly assess whether a deeper dive is needed [00:07:44].

Out-of-the-box performance (precision, accuracy, factuality) was not great [00:08:11]. Significant MLOps work was required to build remediation workflows and circuit breakers [00:08:19]. Since these summaries are published and seen by all clients, errors have a significant impact, necessitating constant performance monitoring and remediation through CI/CD [00:08:34].
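The talk does not detail how these circuit breakers are implemented, but the idea can be sketched as a gate between generation and publication. Everything below is an illustrative assumption: the function names (`factuality_check`, `publish`, `quarantine`) and the threshold are hypothetical, not Bloomberg's actual APIs.

```python
# Hypothetical circuit breaker: a generated summary reaches clients only if it
# clears a factuality check; otherwise it is routed to a remediation queue.
def publish_with_circuit_breaker(summary, source_transcript,
                                 factuality_check, publish, quarantine,
                                 threshold=0.9):
    """Gate publication of a generated summary on a factuality score."""
    score = factuality_check(summary, source_transcript)
    if score >= threshold:
        publish(summary)
        return "published"
    quarantine(summary, score)  # picked up by a remediation workflow
    return "quarantined"
```

Keeping the gate outside the model makes it monitorable in CI/CD: the threshold and check can be tightened without retraining or re-prompting anything.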

Current “Semi-Agentic” Architecture

Bloomberg’s current product architecture is “semi-agentic” [00:08:57]. This means some components are autonomous, while others are not, reflecting a lack of full trust in complete autonomy [00:09:03].

Key aspects of this architecture include:

  • Guardrails: These are non-optional and hard-coded components [00:09:11]. For example, Bloomberg does not offer financial advice, so any query asking for investment advice must be caught [00:09:12]. Factual accuracy is another mandatory guardrail [00:09:19]. These checks must be performed at multiple points within the system [00:18:54].
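A hard-coded guardrail of this kind can run before any model is invoked. The sketch below is a deliberately simple stand-in: the patterns and function name are illustrative assumptions, and a production system would use a far more robust classifier, but it shows why such a check is deterministic rather than autonomous.

```python
import re

# Illustrative non-LLM guardrail: deterministically reject queries that ask
# for investment advice before they reach any model. Patterns are toy examples.
ADVICE_PATTERNS = [
    r"\bshould i (buy|sell|hold)\b",
    r"\bis .* a good investment\b",
    r"\bwhat (stock|bond)s? should i\b",
]

def violates_advice_guardrail(query: str) -> bool:
    """Return True if the query appears to request investment advice."""
    q = query.lower()
    return any(re.search(pattern, q) for pattern in ADVICE_PATTERNS)
```

Because the check is plain code rather than a prompt, it behaves identically at every point in the system where it is applied, which is what makes it "non-optional".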

Scaling Agentic Architectures

Scaling agents involves two primary aspects: addressing fragility and optimizing organizational structure.

Addressing Fragility and Compounding Errors

When building agents, especially those that are compositions of LLMs, errors compound multiplicatively across steps, leading to fragile end-to-end behavior [00:11:22].
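The compounding effect is easy to quantify. Assuming (for illustration) that each stage of a pipeline succeeds independently with probability p, the end-to-end success rate of n chained stages is p^n:

```python
# End-to-end reliability of n independently failing chained stages.
def end_to_end_reliability(p: float, n: int) -> float:
    return p ** n

# Five chained stages at 95% each already drop below 78% overall.
```

Even individually strong components therefore produce a fragile whole once enough of them are chained, which motivates the defensive checks discussed below.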

Comparison with Traditional ML Systems

In traditional machine learning products (e.g., a news sentiment product built in 2009), the input and output distributions are generally well-understood [00:11:43]. For instance, news sentiment models know which news wires are monitored, the language, and editorial guidelines, allowing for clear input parameters and well-defined outputs [00:11:57]. This allows for establishing risk, monitoring performance, and providing heads-up communication to downstream consumers about model changes [00:12:31]. While there was still some stochasticity, it was manageable [00:11:12].

Need for Resilience and Guardrails

In contrast, with agentic architectures, the goal is to make daily improvements to agents, moving away from slow, batch regression test-based release cycles [00:13:03]. Downstream customers also make independent improvements, complicating coordination [00:13:15].

An example highlights the problem: an agent correctly understands a query for “US CPI for the last five quarters” and dispatches to a tool, but the tool fetches monthly data instead of quarterly, leading to a wrong answer [00:13:30]. If the raw data isn’t exposed, a research analyst might not catch the error [00:14:04].

“It is easier to not count on the upstream systems to be accurate but rather factor in that they will be fragile and they’ll be evolving and just do your own safety checks.” [00:14:22]

Building in such guardrails allows for faster iteration, as individual agents can evolve without constant handshaking or sign-offs from every downstream caller [00:14:51]. This promotes a more resilient mindset [00:15:23].
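For the CPI example above, such a downstream safety check might verify that the data returned by the upstream tool actually has the frequency the query asked for, instead of trusting the dispatch. The record format and function name below are assumptions for illustration:

```python
from datetime import date

# Illustrative downstream check: don't trust that an upstream tool returned
# quarterly data just because the query asked for it; verify the spacing.
def looks_quarterly(observations: list[date]) -> bool:
    """Heuristic: consecutive observation dates should be ~3 months apart."""
    if len(observations) < 2:
        return False
    gaps = [(b.year - a.year) * 12 + (b.month - a.month)
            for a, b in zip(observations, observations[1:])]
    return all(gap == 3 for gap in gaps)
```

A failing check can trigger a retry, a fallback tool, or an explicit "could not verify" answer, rather than silently surfacing monthly data as quarterly.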

Organizational (Org) Structure for Scalability

Traditional machine learning organizations have a specific software factorization reflected in their organizational structure, which needs to be rethought when building agentic products [00:15:37]. Key questions arise: How many agents to build? What should each agent do? Should functionality overlap? [00:16:06]. It’s tempting to maintain existing software and organizational structures, but this can hinder progress [00:16:17].

Early Stage: Vertical Alignment for Fast Iteration

In the beginning, when product design is uncertain, it is more effective to collapse the organizational and software stacks [00:16:46]. Vertically aligned teams can iterate quickly, sharing code, data, and models to figure things out [00:16:50].

Later Stage: Horizontal Alignment for Optimization

Once the single product or agent’s use case is well understood, and its strengths and weaknesses are clear, the organization can shift [00:17:10]. When building many such agents, it’s beneficial to return to the foundations of good software and organizational design [00:17:20]. This includes optimizing for performance, cost reduction, testability, and transparency [00:17:29].

This leads to the adoption of horizontal components. For example, guardrails should be horizontal across all teams [00:17:39]. The organization must determine the right time to create horizontals and break down monolithic agents into smaller pieces [00:18:07].

Example: Research Analyst Agent Architecture

For the research analyst, the current architecture is semi-agentic [00:18:24]. It is factorized into distinct agents reflecting the organizational structure:

  • An agent deeply understands the user’s query and session context, determining the type of information needed [00:18:28].
  • Another agent is responsible for answer generation, adhering to rigorous standards for well-formed answers [00:18:41].
  • Non-optional guardrails are applied at multiple points, providing no autonomy in these critical areas [00:18:52].
  • These agents leverage years of traditional and modern data wrangling, including the use of sparse, dense, and hybrid indices [00:18:59].
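The talk names sparse, dense, and hybrid indices without detailing how they are combined. A common pattern, sketched here with simplified stand-ins (keyword overlap in place of BM25, a bare cosine in place of a learned embedding model; all names are illustrative assumptions), is to blend the two scores with a weight:

```python
import math

# Simplified hybrid retrieval: blend a sparse (keyword-overlap) score with a
# dense (cosine-similarity) score. Real systems would use BM25 and learned
# embeddings; these scorers are toy stand-ins.
def sparse_score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    """alpha=1.0 is purely sparse, alpha=0.0 purely dense."""
    return alpha * sparse_score(query, doc) + (1 - alpha) * cosine(q_vec, d_vec)
```

The weight lets a team tune per use case: exact-term financial queries (tickers, series names) lean sparse, while paraphrased research questions lean dense.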