From: aidotengineer

Building AI agents effectively depends on understanding common architectural patterns and composing AI primitives rather than reaching for bloated frameworks [01:16:00]. Many production-ready AI agents are not built on top of AI frameworks [01:16:00], because frameworks tend to be slow, bloated, and filled with unnecessary abstractions [01:21:00]. Instead, the focus should be on building on top of AI primitives [01:31:00].

The Power of AI Primitives

Building, deploying, and scaling AI agents remains a significant challenge [03:20:00]. AI agents represent a new way of writing code, changing how coding projects and SaaS products are built [03:32:00]. Primitives have a “native ability of working really really well in production” [02:29:00], much as Amazon S3 provides a simple, low-level primitive for object storage that scales massively [02:34:00].

Instead of frameworks, the approach is to build small, reusable building blocks (primitives) that are useful across an entire stack [03:56:00].

Key AI Primitives

Several fundamental AI primitives are essential for building robust AI agents:

  • Memory: An “autonomous RAG engine” [10:28:00] that acts as long-term storage for data, potentially terabytes, and often includes a vector store for search [05:24:00]. It enables searching through vast amounts of data when using an agent [12:23:00].
  • Threads: Used to store and manage the context or history of a conversation [04:01:00], acting like a scratchpad for relevant information [11:58:00].
  • Parser: Extracts context from various file types, such as converting PDFs to text [08:14:00].
  • Chunker: Splits extracted text into smaller pieces of context for similarity search [08:20:00].
  • Tools: Allow the agent to automatically call external services or APIs [11:33:00].
  • Workflow Engine: Specifically built for multi-step agent tasks [10:30:00].
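The parser and chunker primitives are concrete enough to sketch. Below is a minimal, hypothetical chunker: it splits extracted text into fixed-size, overlapping pieces of context suitable for similarity search. The size and overlap values are illustrative assumptions, not prescribed by any particular product.

```typescript
// Hypothetical chunker primitive: splits extracted text into fixed-size,
// overlapping pieces of context for similarity search.
function chunk(text: string, size = 200, overlap = 40): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
    start += size - overlap; // step back by `overlap` so context carries over
  }
  return chunks;
}
```

The overlap means each chunk repeats the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.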

Building AI agents with these predefined, highly scalable, and composable AI primitives can lead to serverless AI agents that automatically handle heavy lifting [05:13:00].

Common AI Agent Architectures

Eight different AI agent architectures built purely with AI primitives have been identified [06:06:00].

1. Augmented LLM

This is a common architecture where an agent receives input and generates output using an LLM [11:18:00]. It integrates several primitives:

  • LLM: The core language model for generation [11:30:00].
  • Tools: To connect to external services or call APIs [11:33:00].
  • Threads: To store conversation history and context [11:41:00].
  • Memory: For long-term storage of events or data, enabling searching of terabytes of information [12:19:00].

Almost any type of AI agent can be built using this architecture [12:44:00].
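The four primitives can be wired together in a few lines. This is a minimal sketch with everything stubbed out: `callLLM`, the tool registry, the thread, and the memory store are all placeholders that a real agent would back with an actual model, external APIs, and a vector store.

```typescript
// Sketch of the augmented-LLM pattern with stubbed primitives.
type Message = { role: "user" | "assistant" | "tool"; content: string };

const thread: Message[] = [];              // Threads: conversation history
const memory = new Map<string, string>();  // Memory: long-term storage stub
const tools: Record<string, (arg: string) => string> = {
  // Tools: external services the agent can call automatically
  weather: (city) => `Sunny in ${city}`,
};

function callLLM(prompt: string): string {
  // Stub model: "decides" to call a tool when the prompt mentions weather.
  return prompt.includes("weather") ? "TOOL:weather:Paris" : `Echo: ${prompt}`;
}

function agent(input: string): string {
  thread.push({ role: "user", content: input });
  let reply = callLLM(input);
  if (reply.startsWith("TOOL:")) {
    const [, name, arg] = reply.split(":");
    reply = tools[name](arg);              // automatic tool call
    thread.push({ role: "tool", content: reply });
  }
  thread.push({ role: "assistant", content: reply });
  memory.set(input, reply);                // persist for later retrieval
  return reply;
}
```

The loop shape is the point: input goes through the LLM, may detour through a tool, and every step lands in the thread and memory.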

2. Prompt Chaining and Composition

This architecture involves using multiple agents that work together sequentially [13:08:00]. An initial agent creates an output, and based on that output, a decision is made to proceed with another agent [13:16:00].

  • Example: An agent identifies if an email is spam; if not, another agent drafts a response email [13:22:00]. This can involve agents for summarization, feature extraction, or marketing copy generation [13:35:00].
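The spam-email example above can be sketched as a two-step chain. Both "agents" here are trivial stubs (a regex classifier and a template reply); in practice each would be its own LLM call.

```typescript
// Sketch of prompt chaining: agent 1 classifies, and only on a "not spam"
// verdict does agent 2 draft a response.
const isSpam = (email: string): boolean =>
  /free money|click here/i.test(email);    // agent 1: spam classifier stub

const draftReply = (email: string): string =>
  `Thanks for your message: "${email}"`;   // agent 2: reply drafter stub

function handleEmail(email: string): string | null {
  if (isSpam(email)) return null;          // chain stops on spam
  return draftReply(email);                // otherwise hand off to agent 2
}
```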

3. Agent Router / LLM Router

In this pattern, an LLM router agent decides which specialized agent should be called next [14:02:00]. This involves creating specialized agents for different tasks, each potentially using a different LLM [14:24:00].

  • Example:
    • A Summary Agent (e.g., using Gemini) for summarizing text [14:28:00].
    • A Reasoning Agent (e.g., using DeepSeek LLaMA 70B) for analysis and explanations [14:31:00].
    • A Coding Agent (e.g., using Claude Sonnet) for writing code [14:36:00].

The routing agent determines the appropriate agent based on the input task [15:06:00].
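A minimal sketch of the router, with keyword matching standing in for the routing LLM and simple stubs standing in for the specialized agents (Gemini, DeepSeek, Claude in the example above):

```typescript
// Sketch of an LLM-router: a routing step inspects the task and dispatches
// to one of several specialized agents.
type Agent = (task: string) => string;

const agents: Record<string, Agent> = {
  summary: (t) => `summary of: ${t}`,    // stub for a summary agent
  reasoning: (t) => `analysis of: ${t}`, // stub for a reasoning agent
  coding: (t) => `code for: ${t}`,       // stub for a coding agent
};

function route(task: string): string {
  // A real router would ask an LLM to choose; here we key off simple words.
  if (/summar/i.test(task)) return "summary";
  if (/why|explain/i.test(task)) return "reasoning";
  return "coding";
}

const run = (task: string) => agents[route(task)](task);
```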

4. Parallel Agents

This architecture allows a set of agents to run concurrently, typically using simple asynchronous patterns like Promise.all in JavaScript [16:47:00].

  • Example: Running sentiment analysis, summarization, and decision-making agents in parallel [17:00:00].
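The Promise.all pattern mentioned above, with the three agents as async stubs (each would be an independent LLM call in production):

```typescript
// Sketch of parallel agents: all three start immediately and resolve
// concurrently via Promise.all.
const sentiment = async (t: string) => `sentiment(${t})`;
const summarize = async (t: string) => `summary(${t})`;
const decide = async (t: string) => `decision(${t})`;

async function analyze(text: string): Promise<string[]> {
  return Promise.all([sentiment(text), summarize(text), decide(text)]);
}
```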

5. Agent Orchestrator Worker

This architecture involves an orchestrator agent that plans and creates subtasks for multiple worker agents [17:13:00]. The results from these worker agents are then synthesized by another agent [17:24:00]. This pattern resembles a deep research agent architecture [17:30:00].

  • Example: Writing a blog post on remote work benefits (productivity, work-life balance, environmental impact) [18:10:00].
    1. An orchestrator agent generates subtasks: “write introduction,” “write section on productivity,” “write section on work-life balance,” “write section on environmental impact,” and “write conclusion” [18:21:00].
    2. Multiple worker agents are assigned to complete these subtasks [18:44:00].
    3. Finally, the results from all worker agents are synthesized into a single output [19:02:00].
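The three steps above can be sketched with stubbed agents: a planner that emits the subtasks from the blog-post example, workers that run in parallel, and a join step standing in for the synthesis agent.

```typescript
// Sketch of orchestrator-worker: plan subtasks, fan out to workers,
// synthesize the results.
const plan = (_topic: string): string[] => [
  "write introduction",
  "write section on productivity",
  "write section on work-life balance",
  "write section on environmental impact",
  "write conclusion",
]; // orchestrator stub; a real one would generate this list with an LLM

const worker = async (topic: string, subtask: string) =>
  `[${subtask} for "${topic}"]`; // worker agent stub

async function writePost(topic: string): Promise<string> {
  const parts = await Promise.all(plan(topic).map((s) => worker(topic, s)));
  return parts.join("\n"); // synthesis step (another LLM call in practice)
}
```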

6. Evaluator Optimizer

This architecture uses an LLM as a “judge” to evaluate responses generated by another agent [20:08:00]. The evaluator can accept the response or reject it with feedback [20:25:00], allowing for iterative refinement.

  • Example: Generating an eco-friendly product description [20:48:00].
    1. A generator agent creates a product description.
    2. An evaluator LLM (chosen for its expertise in the domain) provides specific feedback if the description doesn’t meet the criteria (e.g., missing the target audience of “eco-conscious millennials”) [20:56:00].
    3. The generator uses this feedback to produce an improved iteration [21:32:00].
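The accept/reject loop can be sketched as follows. Both the generator and the judge are stubs keyed to the "eco-conscious millennials" criterion from the example; real agents would be two LLM calls, with the feedback string fed back into the generator's prompt.

```typescript
// Sketch of the evaluator-optimizer loop: generate, judge, and retry with
// feedback until the response is accepted or a round limit is hit.
function generate(feedback?: string): string {
  const base = "A durable bottle made from recycled steel.";
  return feedback ? `${base} Perfect for eco-conscious millennials.` : base;
}

function evaluate(text: string): { ok: boolean; feedback?: string } {
  // Judge stub: a real evaluator would be an LLM with domain expertise.
  return text.includes("eco-conscious millennials")
    ? { ok: true }
    : { ok: false, feedback: "Mention the target audience." };
}

function refine(maxRounds = 3): string {
  let feedback: string | undefined;
  for (let i = 0; i < maxRounds; i++) {
    const draft = generate(feedback);
    const verdict = evaluate(draft);
    if (verdict.ok) return draft; // accepted
    feedback = verdict.feedback;  // rejected: loop with feedback
  }
  throw new Error("no acceptable draft");
}
```

The round limit matters in practice: without it, a judge that can never be satisfied would loop (and bill) forever.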

7. Memory-Based Agents

This is a common pattern where data is uploaded into a memory primitive, and an agent then retrieves and answers questions related to that data [21:50:00].

  • Example: “Chat with PDF” [00:26:00].
    1. A memory (with a vector store) is created for storing PDF content [06:16:00].
    2. PDF files are parsed and chunked into small pieces of context [08:14:00].
    3. An AI agent uses this memory to retrieve relevant context and generate answers to user questions [08:57:00].
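The retrieval step can be sketched with a toy memory: parsed-and-chunked text goes into an array, and a word-overlap score picks the most relevant chunk for the agent to answer from. A real system would use embeddings and a vector store rather than this keyword match.

```typescript
// Sketch of memory-based retrieval: store chunks, then return the chunk
// that shares the most words with the question.
const memory: string[] = [];               // stands in for a vector store

const store = (chunks: string[]) => memory.push(...chunks);

function retrieve(question: string): string {
  const words = question.toLowerCase().split(/\W+/);
  const score = (chunk: string) =>
    words.filter((w) => w && chunk.toLowerCase().includes(w)).length;
  return memory.reduce(
    (best, c) => (score(c) > score(best) ? c : best),
    memory[0],
  );
}
```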

8. Complex Agent Architectures

More complex agents can also be built by combining these primitives:

  • Deep Researcher (Perplexity-like): Involves analyzing a query, performing a web search, consolidating results, and then creating a response [22:37:00].
  • Receipt Checker: Uses OCR (Optical Character Recognition) as a primitive to process images and extract information [23:16:00], potentially using multiple OCR models for robustness [23:39:00].
  • Image Chat: Leverages a vision-capable LLM (e.g., GPT-4o Vision) to analyze an image URL and answer questions about its content [24:18:00].
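The deep-researcher shape, for instance, is just the earlier patterns composed. In this sketch `analyzeQuery` and `search` are stubs; a real agent would use an LLM to decompose the query and a search API for the web-search step.

```typescript
// Sketch of a deep-researcher pipeline: analyze the query, fan out
// searches in parallel, consolidate, then respond.
const analyzeQuery = (q: string): string[] =>
  q.split(/\s+and\s+/i); // naive decomposition into sub-questions (stub)

const search = async (sub: string) => [`result about ${sub}`]; // search stub

async function research(query: string): Promise<string> {
  const subs = analyzeQuery(query);
  const results = await Promise.all(subs.map(search)); // parallel searches
  const consolidated = results.flat().join("; ");      // consolidation step
  return `Answer based on: ${consolidated}`;           // response (LLM in practice)
}
```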

Conclusion

By understanding and utilizing these AI primitive patterns, developers can build approximately 80% of the most complex AI agents [22:24:00]. The rapidly evolving nature of AI means that new paradigms and LLMs emerge frequently [25:54:00], making it crucial to build AI agents on top of flexible AI primitives rather than being constrained by pre-built, potentially outdated, framework abstractions [25:58:00]. This approach ensures easier migration and adaptation as agentic workflows and LLM capabilities advance [19:51:00].