From: aidotengineer

Introduction to Agentic Architectures

The term “agentic landscape” describes the current state of AI development, particularly with the advent of Large Language Models (LLMs) [00:00:32]. In the context of building agentic enterprise applications, there’s a distinction between “tools” and “agents” based on cognitive architectures for language agents [03:17:10]:

  • Tool: Represents the simpler, less autonomous end of the spectrum [03:20:00].
  • Agent: Characterized by being more autonomous, possessing memory, and having the ability to evolve [03:30:00].

Bloomberg’s Journey into Generative AI

Bloomberg, a fintech company with a long history of investing in AI (some 15 to 16 years), began building its own LLM in 2022 [00:00:44]. This effort involved significant work on data sets, evaluation, and performance optimization [00:57:00]. However, with the rise of open-weight and open-source communities, Bloomberg strategically pivoted to building on top of existing LLM capabilities [01:06:00].

Bloomberg’s AI efforts are organized as a special group reporting to the Global Head of Engineering [01:42:00]. This group works closely with data counterparts, product teams, and the CTO in cross-functional settings, comprising around 400 people across 50 teams in London, New York, Princeton, and Toronto [01:49:00]. They have spent the past 12 to 16 months building increasingly agentic generative AI products, addressing numerous challenges in AI architecture design along the way [02:12:00].

Core Principles for Financial Applications

For a fintech company like Bloomberg, certain product aspects are non-negotiable, chief among them precision, accuracy, and factuality [06:15:00].

These principles ground the challenges faced when using AI to build agentic systems, requiring extensive work on remediation workflows and circuit breakers [08:24:00]. Errors in public-facing summaries, for instance, have an outsized impact [08:34:00]. Continuous monitoring and CI/CD (Continuous Integration/Continuous Delivery) are essential for maintaining and improving accuracy [08:38:00].

Agentic Architectures in Practice: The Research Analyst Example

Bloomberg’s clients in finance are diverse, including research analysts, portfolio managers, sales, trading, and risk managers [03:58:00]. Research analysts, who are experts in specific areas like AI or semiconductors, perform tasks such as:

  • Search and discovery [05:40:00]
  • Summarization of unstructured data [05:42:00]
  • Working with structured data and analytics [05:49:00]
  • Communication to gather and disperse information [05:52:00]
  • Building models (data normalization, programming) [06:00:00]

Bloomberg generates and accumulates vast amounts of data, both unstructured (news, research, documents) and structured (reference and market data). Every day, they process 400 billion ticks of structured data, over a billion unstructured messages, and millions of well-written documents, with over 40 years of history [04:17:00].

A key product developed in 2023 for research analysts helps process scheduled quarterly earnings calls from public companies [06:54:00]. These calls include executive presentations and Q&A segments [07:13:00]. Given that many such calls occur daily during earnings season, analysts need to stay informed [07:17:00]. Transcripts are generated using AI, and Bloomberg identifies questions of interest for specific sectors to provide answers to analysts, allowing them to quickly assess whether a deeper dive is needed [07:30:00].
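To make the shape of such a pipeline concrete, here is a minimal sketch. The `llm_complete` helper and the sector question list are illustrative assumptions, not Bloomberg’s actual implementation:

    # Illustrative earnings-call Q&A pipeline. `llm_complete` is a
    # hypothetical stand-in for any LLM client call, and the sector
    # questions are invented examples.

    SECTOR_QUESTIONS = {
        "semiconductors": [
            "What is management's guidance on data-center demand?",
            "Are there supply constraints on advanced nodes?",
        ],
    }

    def llm_complete(prompt: str) -> str:
        """Placeholder for a call to an LLM provider."""
        raise NotImplementedError

    def summarize_call(transcript: str, sector: str) -> dict:
        """Answer each sector-relevant question from the transcript so an
        analyst can quickly decide whether a deeper dive is needed."""
        answers = {}
        for question in SECTOR_QUESTIONS.get(sector, []):
            prompt = (
                "Using only the transcript below, answer the question. "
                "If the transcript does not address it, say 'not discussed'.\n\n"
                f"Question: {question}\n\nTranscript:\n{transcript}"
            )
            answers[question] = llm_complete(prompt)
        return answers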

The performance of these products “out of the box” was not sufficient in terms of precision, accuracy, and factuality [08:11:00]. This necessitated significant investment in MLOps to create remediation workflows and circuit breakers. Since these summaries are published, errors have a disproportionate impact [08:34:00].
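One way to read “circuit breakers” here is the classic reliability pattern applied to publishing: if quality checks fail too often, stop publishing automatically and route to human review. A minimal sketch, where the failure threshold and the notion of a quality check are assumptions for illustration:

    # Minimal circuit-breaker sketch around a summary-publishing step.
    # The threshold and "quality check" signal are illustrative, not
    # Bloomberg's actual remediation workflow.

    class SummaryCircuitBreaker:
        def __init__(self, max_failures: int = 3):
            self.max_failures = max_failures
            self.failures = 0

        def publish(self, summary: str, passes_checks: bool) -> str:
            if self.failures >= self.max_failures:
                return "TRIPPED: route to human review"
            if not passes_checks:
                self.failures += 1
                return "HELD: failed quality check"
            self.failures = 0  # a healthy publish resets the breaker
            return "PUBLISHED: " + summary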

Semi-Agentic Architecture

Bloomberg’s current product architecture for generative AI is “semi-agentic” [08:57:00]. This means some parts are autonomous, while others are not, due to a lack of full trust in complete autonomy [09:03:00]. Guardrails are a classic example of non-autonomous, mandatory components [09:09:00]. For instance, Bloomberg does not offer financial advice, so any query like “should I invest in…” must be caught by a guardrail [09:12:00]. Factual accuracy is another non-negotiable guardrail [09:19:00].
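A minimal sketch of what such a mandatory, non-autonomous check might look like in front of the autonomous parts; the trigger phrases and refusal text are assumptions for illustration:

    # Illustrative pre-routing guardrail: queries that look like requests
    # for financial advice never reach the autonomous agents. The phrase
    # list and refusal message are invented; a production system would
    # more plausibly use a learned classifier than string matching.

    ADVICE_TRIGGERS = ("should i invest", "should i buy", "should i sell")

    def guarded_route(query: str, agent) -> str:
        if any(t in query.lower() for t in ADVICE_TRIGGERS):
            return "I can't provide financial advice."
        return agent(query)

The point is not the string matching but the placement: the check is non-optional and sits outside anything autonomous.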

Scaling Generative AI Workloads

Dealing with Fragility and Stochasticity

When building agents, especially those that need to evolve rapidly, the inherent stochasticity of LLMs, and of compositions of LLMs, can lead to significant fragility [09:50:00]. Unlike traditional software APIs (e.g., matrix multiplication) with well-defined inputs, error codes, and performance guarantees [10:07:00], or even traditional machine learning APIs with somewhat known input/output distributions [10:41:00], composed LLMs introduce errors that multiply across the chain [11:22:00].
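To make the multiplying-errors point concrete: if a pipeline composes n steps that each succeed with probability p, and failures are roughly independent, end-to-end reliability falls to about p^n. A back-of-the-envelope calculation (the 95% figure is an assumption for illustration):

    # Five composed steps, each 95% reliable (illustrative numbers),
    # assuming roughly independent failures.
    p, n = 0.95, 5
    print(f"end-to-end success: {p**n:.2f}")  # ~0.77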

For example, a news sentiment product built in 2009 had a well-understood input distribution (news wires, language, editorial guidelines) and a simple output space [11:43:00]. Training data was built from scratch, allowing for robust test sets and clear risk assessment [12:21:00]. Despite these controls, out-of-band communication was still necessary to warn downstream consumers of model changes [12:38:00].

In agentic architectures, the goal is to make daily improvements without lengthy release cycles based on batch regression tests [13:02:00]. If an agent providing data (e.g., CPI for five quarters) makes a subtle error (e.g., fetching monthly instead of quarterly data), a downstream workflow relying on that data without exposing the raw table could easily miss the mistake [13:52:00].

The solution is not to rely on upstream systems being perfectly accurate, but to factor in their inherent fragility and evolution [14:23:00]. Building in safety checks and guardrails at multiple points within the system enables individual agents to evolve faster without requiring extensive handshakes and sign-offs from every downstream consumer [14:51:00]. This shift in mindset fosters resilience and allows for more rapid iteration [15:18:00].
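Continuing the CPI example above, a minimal sketch of such a downstream safety check; the data shape and the frequency heuristic are assumptions for illustration:

    # The downstream consumer re-validates what the upstream data agent
    # returned rather than trusting it blindly. Shapes and heuristics
    # here are illustrative.

    from datetime import date

    def looks_quarterly(dates: list) -> bool:
        """Heuristic: consecutive points should be ~3 months apart."""
        gaps = [
            (b.year - a.year) * 12 + (b.month - a.month)
            for a, b in zip(dates, dates[1:])
        ]
        return all(gap == 3 for gap in gaps)

    def check_cpi_series(dates: list, expected_points: int = 5) -> None:
        if len(dates) != expected_points:
            raise ValueError(f"expected {expected_points} quarters, got {len(dates)}")
        if not looks_quarterly(dates):
            raise ValueError("series looks monthly, not quarterly")

A check like this catches the monthly-instead-of-quarterly mistake at the consumer, so the upstream agent is free to evolve without a sign-off from every downstream team.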

Organizational Structure for Building Agents

Traditional machine learning often leads to a specific software factorization that is reflected in organizational structure [15:37:00]. However, when building new kinds of products with different tech stacks, this structure needs rethinking [15:57:00]. Key questions arise around how teams should be organized, vertically around products or horizontally around shared capabilities, and when shared concerns should be factored out.

Initially, when product design is uncertain and fast iteration is needed, a “collapsed” organizational structure with vertically aligned teams is beneficial [16:46:00]. This allows teams to figure things out, share code, data, and models, and iterate quickly [16:58:00].

As understanding of a product or agent matures, and its use cases become clear, the organization can transition towards more horizontal alignment [17:07:00]. This enables optimization for performance, cost reduction, testability, and transparency [17:29:00]. For example, guardrails for preventing financial advice are a horizontal concern that shouldn’t be re-figured out by every team [17:41:00]. Determining the right time to introduce horizontals and break down monolithic agents into smaller, specialized pieces is crucial [18:04:00].

Example Research Agent Architecture

Today, a research agent’s architecture might involve distinct agents for:

  • User Intent Understanding: Deeply understands the user’s query and session context to determine what information is needed [18:28:00]; this is factored out as its own agent [18:39:00].
  • Answer Generation: Held to rigorous standards for what constitutes a well-formed answer [18:43:00]; this is also factored out [18:47:00].
  • Guardrails: Non-optional components called at multiple points to ensure adherence to principles like not offering financial advice [18:50:00]; a composition sketch follows this list.
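A minimal sketch of how these factored-out pieces might compose, with the guardrail invoked at more than one point; every function name here is an illustrative assumption:

    # Illustrative composition of the factored research-agent pieces.
    # Each callable stands in for a separately owned, separately
    # evolving agent; `guardrail` raises on a violation.

    def answer_research_query(query: str, session: dict,
                              understand_intent, generate_answer,
                              guardrail) -> str:
        guardrail(query)                      # checked on the way in
        intent = understand_intent(query, session)
        answer = generate_answer(intent)
        guardrail(answer)                     # and again on the way out
        return answer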

These systems build upon years of traditional and modern data wrangling techniques, evolving from sparse to dense and hybrid indices [18:59:00].
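As a rough illustration of the sparse-to-dense-to-hybrid progression, hybrid retrieval blends a lexical score with an embedding similarity; `bm25_score`, `embed`, and the 0.5 weight below are all assumptions for illustration:

    # Hybrid retrieval sketch: blend a sparse lexical score (e.g., BM25)
    # with a dense embedding similarity. In practice the two scores would
    # be normalized to comparable scales before blending.

    def hybrid_score(query: str, doc: str, bm25_score, embed,
                     alpha: float = 0.5) -> float:
        sparse = bm25_score(query, doc)
        q, d = embed(query), embed(doc)
        dense = sum(x * y for x, y in zip(q, d))  # dot-product similarity
        return alpha * sparse + (1 - alpha) * dense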