From: aidotengineer

Building products with generative AI presents numerous challenges that must be resolved before AI agents can scale in production [00:02:23]. Bloomberg, a company with a 15-year history of investing in AI, pivoted its strategy in 2023 to build on top of existing large language models (LLMs), given the rapid advances in the open-weight and open-source communities [00:01:10]. Its AI efforts are organized as a special group within engineering that collaborates closely with data, product, and CTO teams, comprising about 400 people across 50 teams in various global locations [00:01:36].

Defining Agents and Tools

Internally, Bloomberg adopted a clear vocabulary for ‘tools’ and ‘agents’ based on the “Cognitive Architectures for Language Agents” paper [00:03:12]:

  • Tool: A specific, callable capability the model can invoke (e.g., a function or API call); a tool does not act on its own [00:03:20].
  • Agent: Defined as more autonomous, possessing memory, and capable of evolving [00:03:30].
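
To make the distinction concrete, here is a minimal sketch in Python (hypothetical, not from the talk): the tool is a stateless callable, while the agent holds memory and decides which tools to invoke.

```python
from dataclasses import dataclass, field
from typing import Callable

# A tool: a stateless capability that maps inputs to outputs.
def get_transcript(ticker: str, quarter: str) -> str:
    """Fetch an earnings-call transcript (stub)."""
    return f"<transcript for {ticker} {quarter}>"

@dataclass
class Agent:
    """More autonomous: holds memory, chooses actions, and can evolve across calls."""
    tools: dict[str, Callable[..., str]]
    memory: list[str] = field(default_factory=list)

    def act(self, query: str) -> str:
        self.memory.append(query)  # memory accumulates across interactions
        transcript = self.tools["transcript"]("ABC", "Q1")
        return f"answer to {query!r} grounded in {transcript}"

agent = Agent(tools={"transcript": get_transcript})
print(agent.act("What was revenue guidance?"))
```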

Bloomberg’s Context: Finance and Non-Negotiables

As a fintech company, Bloomberg serves a diverse range of clients in finance [00:03:51]. The company generates and accumulates vast amounts of structured and unstructured data, including 400 billion ticks of structured data and over a billion unstructured messages daily, with 40 years of historical data [00:04:17].

Certain aspects of Bloomberg’s products are non-negotiable, chief among them factual accuracy and hard guard rails such as never offering financial advice [00:06:21]. These principles ground the challenges faced when building AI agents in production with current technologies [00:06:41].

Initial Product Development: Earnings Call Summarization

In 2023, Bloomberg began developing a product to help research analysts stay informed about quarterly earnings calls [00:06:54]. The goal was to automatically answer common questions of interest to analysts from earnings call transcripts [00:07:45].
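
The talk does not describe the implementation, but the pattern is easy to sketch: prompt an LLM once per question over the transcript. In the hypothetical Python below, `llm_complete` and the question list are illustrative stand-ins, not Bloomberg’s actual code.

```python
# Hypothetical sketch of per-question summarization over a transcript.
COMMON_QUESTIONS = [
    "What was the revenue guidance for next quarter?",
    "Were there changes to capital expenditure plans?",
]

def llm_complete(prompt: str) -> str:
    """Stand-in for a real LLM endpoint call."""
    raise NotImplementedError

def answer_common_questions(transcript: str) -> dict[str, str]:
    answers = {}
    for question in COMMON_QUESTIONS:
        prompt = (
            "Answer strictly from the transcript below; reply 'not discussed' "
            f"if the topic is absent.\n\nQuestion: {question}\n\n"
            f"Transcript:\n{transcript}"
        )
        answers[question] = llm_complete(prompt)
    return answers
```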

Initial Challenges and Solutions

  • Performance: Out-of-the-box performance regarding precision, accuracy, and factuality was not satisfactory [00:08:11].
  • Remediation: Significant MLOps work was required to build remediation workflows and circuit breakers; a sketch of the circuit-breaker idea follows this list [00:08:19]. This was critical because published summaries have an outsized impact if erroneous [00:08:32].
  • Monitoring: Constant performance monitoring and CI/CD processes were implemented to ensure accuracy [00:08:38].
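
The talk gives no implementation details, so the following is a hedged sketch of one plausible circuit breaker: track recent factuality-check results and halt publication when the failure rate crosses a threshold (window size and threshold here are illustrative).

```python
from collections import deque

class SummaryCircuitBreaker:
    """Halts publication when recent factuality checks fail too often."""

    def __init__(self, window: int = 100, max_failure_rate: float = 0.02):
        self.results = deque(maxlen=window)  # rolling window of pass/fail
        self.max_failure_rate = max_failure_rate

    def record(self, passed_factuality_check: bool) -> None:
        self.results.append(passed_factuality_check)

    def allow_publication(self) -> bool:
        if not self.results:
            return True
        failure_rate = self.results.count(False) / len(self.results)
        return failure_rate <= self.max_failure_rate
```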

The current architecture for products is “semi-agentic”: some pieces are autonomous while others are not, particularly the guard rails, such as never offering financial advice and ensuring factuality [00:09:09].

Challenges in Scaling Generative AI Agents

1. Fragility of Compositions of LLMs

When LLMs are composed together, as they are in agents, errors can multiply across components, leading to fragile behavior [00:11:18].

  • Increased Stochasticity: Unlike traditional software APIs (e.g., matrix multiplication) or even earlier machine learning models (e.g., news sentiment APIs), LLMs introduce far more stochasticity: it is harder to predict the exact output, or whether the call will work at all [00:10:49].
  • Compounding Errors: In an agentic workflow, a small error from one component (e.g., a missed character leading to monthly instead of quarterly data) can compound downstream, and is difficult to catch if only the final answer is displayed; the arithmetic sketch after this list illustrates how quickly reliability decays [00:13:54].
  • Rapid Evolution vs. Release Cycles: The desire for agents to improve daily clashes with traditional batch regression test-based release cycles, especially when many downstream consumers rely on the outputs [00:13:03].
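
A back-of-the-envelope illustration (the numbers are mine, not from the talk): if each stage of a five-stage agentic pipeline is independently 95% reliable, end-to-end reliability already drops below 78%.

```python
# Per-stage reliability compounds multiplicatively across a pipeline.
per_stage_accuracy = 0.95
stages = 5
end_to_end = per_stage_accuracy ** stages
print(f"{end_to_end:.3f}")  # 0.774 -- small errors compound downstream
```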

Solution: Resilience Through Downstream Safety Checks

Instead of relying solely on upstream systems for accuracy, it is crucial to assume they will be fragile and evolving, and to design for that [00:14:23].

  • Implement Downstream Checks: Build safety checks at the downstream level (a sketch follows this list) [00:14:30]. This lets individual agents evolve faster without requiring extensive handshake signals or sign-offs from every downstream caller [00:14:58].
  • Resilient Mindset: A resilient mindset allows for faster iteration and independent evolution of individual agents [00:15:21].
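
As a hedged sketch of this idea (hypothetical code, assuming a simple structured result type), a downstream consumer validates what it receives, for example rejecting monthly data when quarterly data was requested, rather than trusting the upstream agent:

```python
from dataclasses import dataclass

@dataclass
class SeriesResult:
    series_id: str
    periodicity: str          # e.g., "monthly" or "quarterly"
    values: list[float]

def downstream_safety_check(result: SeriesResult, expected_periodicity: str) -> SeriesResult:
    """Reject upstream output that violates what this consumer asked for."""
    if result.periodicity != expected_periodicity:
        raise ValueError(
            f"expected {expected_periodicity} data, got {result.periodicity}; "
            "refusing to propagate a compounding error"
        )
    if not result.values:
        raise ValueError("empty series from upstream agent")
    return result
```

Because the check lives with the consumer, the upstream agent can ship improvements daily without coordinating a sign-off with every caller.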

2. Organizational Structure

Traditional machine learning and software development typically settle on a specific factorization of the software stack, which is mirrored in the organizational structure [00:15:38]. With LLMs and agents, this factorization needs rethinking.

Solution: Adapting Organizational Structure

  • Early Stages (Vertical Alignment): In the beginning, when product design is unclear and fast iteration is needed, it’s easier to have vertically aligned teams (e.g., collapsed software stacks and organizational units) that can build, iterate rapidly, and share code, data, and models [00:16:46].
  • Mature Stages (Horizontal Alignment): Once a single product or agent’s use, strengths, and weaknesses are well understood, and many agents are being built, the organization can transition to more foundational software practices [00:17:10]. This includes creating horizontal teams for aspects like guard rails (e.g., preventing financial advice) to optimize performance, reduce cost, and increase testability and transparency [00:17:26]. It also involves breaking monolithic agents into smaller, more manageable pieces [00:18:12].

Semi-Agentic Architecture Example

For a research agent, Bloomberg’s current architecture features the following components (a schematic sketch follows the list) [00:18:24]:

  • An agent dedicated to understanding user queries and session context, then determining what information is needed [00:18:28].
  • A separate agent for answer generation with strict guidelines for well-formed answers [00:18:41].
  • Non-optional guard rails that must be called at multiple points, reflecting the “semi-agentic” nature where not all components are fully autonomous [00:18:50].
  • Reliance on years of traditional and modern data wrangling and indexing techniques [00:18:59].
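
Pieced together from the description above (a schematic sketch, not Bloomberg’s implementation), the wiring might look like this, with the guard rails as a mandatory call at every hop:

```python
# Schematic sketch of the semi-agentic flow described above (hypothetical code).
def guardrails(text: str) -> str:
    """Non-optional checks (e.g., no financial advice); called at every hop."""
    if "you should buy" in text.lower():
        raise ValueError("guard rail tripped: output resembles financial advice")
    return text

def understand_query(query: str, session_context: list[str]) -> str:
    """Agent 1: interpret the query in session context; decide what data is needed."""
    return guardrails(f"information need for: {query}")

def generate_answer(information_need: str, retrieved: list[str]) -> str:
    """Agent 2: compose a well-formed answer under strict guidelines."""
    draft = f"Answer for [{information_need}] based on {len(retrieved)} retrieved document(s)."
    return guardrails(draft)

def research_agent(query: str, session_context: list[str]) -> str:
    need = understand_query(query, session_context)
    retrieved = ["<documents from years of data wrangling and indexing>"]  # placeholder
    return generate_answer(need, retrieved)

print(research_agent("Compare Q1 guidance across the sector", session_context=[]))
```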

By proactively addressing these challenges through resilient design and adaptive organizational structures, companies can effectively scale AI agents in production.