From: aidotengineer

Augmented LLM architectures represent a new way of building AI agents, focusing on the use of fundamental AI primitives rather than traditional, often bloated, AI frameworks [00:01:16]. The core idea is that production-ready AI agents, especially those used by millions of people, are typically not built on top of AI frameworks due to their bloat, slow movement, and unnecessary abstractions [00:01:01]. Instead, they leverage low-level, composable AI primitives that offer massive scalability and flexibility [00:02:29].

AI agents are fundamentally “a new way of writing code” [00:03:32], enabling engineers to transition into “AI engineers” and ship AI products rapidly [00:04:17].

Key AI Primitives

The development of robust AI agents relies on a set of specialized primitives, which are small, composable building blocks that can be integrated seamlessly:

  • Memory [00:05:24]: An autonomous retrieval engine that can store and scale with terabytes of data, typically incorporating a vector store for efficient similarity search [00:05:28]. It serves as the long-term memory for an agent [00:12:19].
  • Threads [00:04:06]: Used to store and manage the context or history of a conversation, acting like a scratchpad for short-term information relevant to a task [00:11:58].
  • Parser [00:10:41]: Extracts context from various file types, such as converting PDFs to text [00:08:14].
  • Chunker [00:10:44]: Splits extracted context into smaller, manageable pieces for use in similarity searches [00:08:20].
  • Tools [00:11:33]: Mechanisms that allow LLMs to automatically call external APIs or interact with other systems [00:11:33].
  • Workflow Engine [00:10:30]: A purpose-built component for managing multi-step agent processes [00:10:33].

Building with these primitives allows for the creation of serverless AI agents that automatically handle heavy lifting and scaling [00:05:44].
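As a rough sketch, the primitives above can be modeled as small composable TypeScript interfaces. The names and shapes here are illustrative, not any particular vendor's SDK, and keyword overlap stands in for real vector similarity search:

```typescript
// Illustrative shapes for the primitives above; hypothetical, not a real SDK.
interface ThreadMessage {
  role: "user" | "assistant";
  content: string; // short-term conversational context ("scratchpad")
}

interface Memory {
  // Long-term store, typically backed by a vector store.
  add(text: string): void;
  search(query: string, topK: number): string[];
}

interface Tool {
  name: string;
  run(args: Record<string, unknown>): Promise<string>;
}

// A trivial in-memory Memory using keyword overlap as a stand-in
// for vector similarity search.
class NaiveMemory implements Memory {
  private docs: string[] = [];
  add(text: string): void {
    this.docs.push(text);
  }
  search(query: string, topK: number): string[] {
    const terms = query.toLowerCase().split(/\s+/);
    return this.docs
      .map((d) => ({
        d,
        score: terms.filter((t) => d.toLowerCase().includes(t)).length,
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map((x) => x.d);
  }
}
```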

Common Augmented LLM Architectures

Several common AI agent architectures can be built using these primitives:

1. Basic Augmented LLM

This is the most common architecture, where an agent takes an input and generates an output using an LLM. It’s augmented with the ability to automatically call tools, access threads for conversational context, and utilize long-term memory (often with a vector store) for vast amounts of data [00:11:18]. This versatile architecture can form the basis of almost any AI agent [00:12:44].
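A minimal sketch of this pattern follows. The `llm` parameter is a stub standing in for a real model invocation; in practice the memory hits would come from a vector store and the tool results from real API calls:

```typescript
// Hypothetical stand-in for a real LLM call.
type LLM = (prompt: string) => string;

interface ThreadMsg {
  role: "user" | "assistant";
  content: string;
}

// The augmented LLM: one input, one output, with thread context,
// long-term memory hits, and tool results folded into the prompt.
function augmentedLLM(
  llm: LLM,
  input: string,
  thread: ThreadMsg[],
  memoryHits: string[],
  toolResults: string[],
): string {
  const prompt = [
    ...thread.map((m) => `${m.role}: ${m.content}`),
    ...memoryHits.map((h) => `memory: ${h}`),
    ...toolResults.map((t) => `tool: ${t}`),
    `user: ${input}`,
  ].join("\n");
  const output = llm(prompt);
  // Record the exchange in the thread (short-term memory).
  thread.push(
    { role: "user", content: input },
    { role: "assistant", content: output },
  );
  return output;
}
```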

2. Chat with Data (RAG Pattern)

One of the most common AI agents in production is a chatbot that interacts with specific data [00:00:12]. Example: Chat with PDF [00:00:26]. This architecture involves:


  • Creating a memory (with a vector store) to store PDF content [00:06:19].
  • Utilizing a parser primitive to convert PDFs to text [00:08:14].
  • Employing a chunker primitive to split the text into smaller pieces for similarity search [00:08:20].
  • An LLM agent then uses this memory to answer questions related to the uploaded data [00:06:24].
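The steps above can be sketched end to end. Here the chunker is a fixed-size splitter and the "similarity search" is keyword overlap; a real pipeline would use a PDF parser, embeddings, and a vector store:

```typescript
// Chunker primitive: split extracted text into fixed-size pieces.
function chunk(text: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Memory primitive: keyword-overlap "similarity search" stand-in.
function search(chunks: string[], query: string): string {
  const terms = query.toLowerCase().split(/\s+/);
  let best = chunks[0];
  let bestScore = -1;
  for (const c of chunks) {
    const score = terms.filter((t) => c.toLowerCase().includes(t)).length;
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return best;
}

// "Chat with PDF": parse -> chunk -> store -> retrieve -> answer.
function chatWithPdf(pdfText: string, question: string): string {
  const memory = chunk(pdfText, 40); // store chunks in "memory"
  const context = search(memory, question); // similarity search
  return `Based on the document: ${context}`; // stand-in for the LLM answer
}
```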

3. Prompt Chaining and Composition

This architecture chains multiple agents in sequence [00:13:06]: one agent's output determines whether, and how, the next agent runs. Example: one agent checks an email for spam, and a second agent drafts a response only if it is not spam [00:13:22]. The chain can involve specialized agents such as a summary agent, a features agent, and a marketing copy agent, all composed in plain JavaScript/TypeScript code [00:13:35].
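The spam-then-respond chain from the example can be sketched as follows; both "agents" are stubbed as plain functions where a real version would prompt an LLM:

```typescript
// Stand-in spam classifier: a real version would prompt an LLM.
function spamAgent(email: string): boolean {
  return /free money|click here/i.test(email);
}

// Stand-in response drafter.
function responseAgent(email: string): string {
  return `Thanks for your message: "${email.slice(0, 30)}". We'll reply soon.`;
}

// Prompt chaining: the first agent's output gates the second.
function handleEmail(email: string): string {
  if (spamAgent(email)) {
    return "discarded as spam";
  }
  return responseAgent(email);
}
```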

4. Agent Router (LLM Router)

In this setup, an agent, or an LLM router, decides which other specialized agent should be called next based on the input query [00:14:02]. Example: Routing agent for different tasks [00:14:26].

  • A summary agent (e.g., using Gemini) for summarizing text [00:14:28].
  • A reasoning agent (e.g., using DeepSeek Llama 70B) for analysis and explanations [00:14:31].
  • A coding agent (e.g., using Claude Sonnet) for code generation [00:14:36].

The routing agent determines the most appropriate specialized agent to handle the task [00:15:17], leveraging domain-specific LLMs for specific tasks.
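A sketch of the routing step follows. Keyword matching stands in for what would normally be an LLM router prompt, and each specialized agent is a stub where a domain-specific model (e.g., Gemini for summaries, Claude Sonnet for code) would be called:

```typescript
type AgentName = "summary" | "reasoning" | "coding";

// Stand-in for an LLM router: a real one would ask a model to pick.
function routeQuery(query: string): AgentName {
  const q = query.toLowerCase();
  if (/\b(code|function|implement)\b/.test(q)) return "coding";
  if (/\b(why|explain|analy[sz]e)\b/.test(q)) return "reasoning";
  return "summary";
}

// Each specialized agent would call its own domain-specific LLM.
const agents: Record<AgentName, (q: string) => string> = {
  summary: (q) => `summary of: ${q}`,
  reasoning: (q) => `analysis of: ${q}`,
  coding: (q) => `code for: ${q}`,
};

function handle(query: string): string {
  return agents[routeQuery(query)](query);
}
```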

5. Parallel Agent Execution

This architecture is straightforward, running multiple agents simultaneously to process different aspects of an input [00:16:47]. In JavaScript, this can be achieved using Promise.all to run a set of agents concurrently [00:17:05]. Example: Sentiment analysis, summarization, and decision-making agents running in parallel [00:17:00].
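The Promise.all pattern can be sketched with three stubbed agents; in practice each would be an independent LLM call, and running them concurrently hides the latency of all but the slowest:

```typescript
// Stubbed agents; each would be an independent LLM call in practice.
const sentimentAgent = async (text: string) =>
  /great|love|excellent/i.test(text) ? "positive" : "neutral";
const summaryAgent = async (text: string) => `summary: ${text.slice(0, 20)}`;
const decisionAgent = async (text: string) =>
  text.length > 10 ? "escalate" : "ignore";

// Parallel agent execution: all three run concurrently.
async function analyze(text: string): Promise<string[]> {
  return Promise.all([
    sentimentAgent(text),
    summaryAgent(text),
    decisionAgent(text),
  ]);
}
```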

6. Agent Orchestrator Worker

This is a sophisticated architecture, resembling a deep research agent system [00:17:27]. An orchestrator agent plans and creates subtasks, which are then distributed to multiple worker agents. The results from these worker agents are finally synthesized by another agent [00:17:13]. Example: Writing a blog post on remote work benefits [00:18:10].

  • The orchestrator agent generates subtasks like “write introduction,” “write section on productivity,” “write conclusion” [00:18:26].
  • Each subtask is assigned to a simple worker agent that completes it [00:18:04].
  • Finally, a synthesizing agent combines the outputs of all worker agents into a cohesive response [00:19:02].

This pattern automates complex, multi-part content generation.
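The blog-post example can be sketched as below; the orchestrator's plan and the workers' outputs are hard-coded stand-ins for what would be LLM calls:

```typescript
// Orchestrator: plans subtasks (a real one would ask an LLM to plan).
function orchestrate(topic: string): string[] {
  return [
    `write introduction for "${topic}"`,
    `write section on productivity for "${topic}"`,
    `write conclusion for "${topic}"`,
  ];
}

// Worker: completes one subtask (stand-in for a worker LLM).
function worker(subtask: string): string {
  return `[done] ${subtask}`;
}

// Synthesizer: combines worker outputs into one response.
function synthesize(parts: string[]): string {
  return parts.join("\n\n");
}

function deepWrite(topic: string): string {
  const subtasks = orchestrate(topic);
  const drafts = subtasks.map(worker); // workers could also run in parallel
  return synthesize(drafts);
}
```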

7. Evaluator Optimizer (Self-Correction Loop)

This architecture uses an LLM as a “judge” to evaluate responses generated by another agent [00:20:11]. The judge agent either accepts the response or rejects it with specific feedback, which is then used by the generator to improve its next attempt [00:20:22]. This pattern is crucial for refining LLM outputs. Example: Generating an eco-friendly product description [00:20:48].

  • A generator agent creates the description [00:20:33].
  • An evaluator agent provides feedback (e.g., “missing the point with eco-conscious millennials”) [00:20:58].
  • The generator then uses this feedback to produce an improved version [00:21:32].

The evaluator should be built using the best possible LLM for the specific evaluation domain [00:21:16].
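The generate-evaluate loop can be sketched as follows. Both agents are stubs: the evaluator checks for a fixed phrase where a real judge LLM would apply domain criteria, and the loop is bounded so a stubborn generator cannot spin forever:

```typescript
interface Verdict {
  accepted: boolean;
  feedback: string;
}

// Generator: produces a draft, folding in any prior feedback.
function generate(task: string, feedback: string | null): string {
  return feedback ? `${task} (revised: ${feedback})` : task;
}

// Evaluator ("LLM as judge"): accepts or rejects with feedback.
function evaluate(draft: string): Verdict {
  if (!draft.includes("eco-conscious")) {
    return { accepted: false, feedback: "mention eco-conscious millennials" };
  }
  return { accepted: true, feedback: "" };
}

// Self-correction loop with a bounded number of attempts.
function generateWithJudge(task: string, maxAttempts = 3): string {
  let feedback: string | null = null;
  let draft = "";
  for (let i = 0; i < maxAttempts; i++) {
    draft = generate(task, feedback);
    const verdict = evaluate(draft);
    if (verdict.accepted) return draft;
    feedback = verdict.feedback;
  }
  return draft; // best effort after maxAttempts
}
```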

8. Deep Researcher (Perplexity-like)

This advanced architecture chains together multiple steps and external tool integrations to perform comprehensive, Perplexity-style research [00:22:37].

9. Receipt Checker (OCR Integration)

This agent combines an LLM with Optical Character Recognition (OCR) to extract information from images [00:23:16]. It can utilize external OCR primitives (e.g., from Mistral) to process images, then use a capable LLM (e.g., GPT-4V) to extract and interpret relevant data [00:23:18]. This demonstrates the power of integrating specialized capabilities that are not native to the LLM itself.
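A sketch of the OCR-plus-LLM flow follows. The `ocr` function stands in for an external OCR primitive (e.g., Mistral's) and returns canned text instead of calling an API, and a regex stands in for the LLM's interpretation step:

```typescript
// Stand-in for an external OCR primitive that turns an image into text.
function ocr(_imageBytes: Uint8Array): string {
  // A real implementation would send the image to an OCR API.
  return "COFFEE SHOP\nLatte 4.50\nMuffin 3.25\nTOTAL 7.75";
}

// Stand-in for the LLM interpretation step: pull the total from raw text.
function extractTotal(receiptText: string): number | null {
  const match = receiptText.match(/TOTAL\s+(\d+(?:\.\d+)?)/i);
  return match ? parseFloat(match[1]) : null;
}

// Receipt checker: OCR the image, then extract the relevant field.
function checkReceipt(imageBytes: Uint8Array): number | null {
  return extractTotal(ocr(imageBytes));
}
```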

10. Image Chatbot (Vision LLM)

A simple agent that leverages a vision-capable LLM (e.g., GPT-4V) to analyze and chat about images provided via a URL [00:24:22]. The LLM processes the image and answers questions about its content [00:24:50].
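Building the request for a vision-capable chat model can be sketched as below, using the OpenAI-style message format where image content is passed as a URL alongside the question (the actual API call is omitted):

```typescript
// OpenAI-style content parts: text and image-by-URL.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentPart[];
}

// Build a single user message pairing a question with an image URL.
function buildVisionMessage(imageUrl: string, question: string): ChatMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}
```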

Advantages of Primitives over Frameworks

  • Flexibility & Agility: In a fast-moving space where new paradigms and LLMs emerge weekly, primitives allow for quick adaptation and migration, unlike frameworks that can lead to being “stuck” with an abstraction layer [00:25:54].
  • Scalability: Primitives like Amazon S3 are designed for massive scale, offering inherent production readiness [00:02:29].
  • Reduced Bloat: Frameworks are often bloated with unnecessary abstractions, whereas primitives provide only the necessary low-level building blocks [00:01:21].
  • Debugging Ease: Complex frameworks with obscure abstractions can be hard to debug [00:05:03]. Primitives, being simpler, make debugging easier.
  • Direct Control: Building with primitives means writing plain code, without “magic” or steep learning curves associated with frameworks [00:15:35].
  • Accelerated Development: For AI engineers, using pre-built, highly scalable, and composable AI primitives significantly improves the speed of building production-ready AI agents [00:04:48].

By understanding and utilizing these AI primitive patterns, developers can build approximately 80% of the most complex AI agents in existence [00:22:24].