From: aidotengineer

The landscape of AI engineering is rapidly evolving, with a significant shift away from traditional AI frameworks towards the use of foundational AI primitives. This transition is redefining how developers approach building and deploying AI agents, emphasizing agility, scalability, and direct control over core components [00:01:13].

Current State of AI Agents

Many production-ready AI agents, such as Perplexity, Cursor, and Chai, are not built on top of conventional AI frameworks [00:01:01]. Frameworks are often criticized for being bloated, slow, and filled with unnecessary abstractions [00:01:21]. Instead, successful agents leverage AI primitives, which offer a more direct and efficient approach to development [00:01:31].

“All these AI agents in production are actually not built on top of any AI frameworks because, well, frameworks do not really add that much value. They’re bloated. They move super slowly and they’re filled with these abstractions that nobody really needs. Instead, you should be building on top of AI primitives.” [00:01:13]

This shift is compared to the utility of Amazon S3, a simple, low-level primitive for data storage that scales massively without requiring an object storage framework [00:02:29].

AI Agents as a New Way of Coding

AI agents represent a fundamental change in how code is written and projects are built [00:03:32]. The complexity of building and deploying scalable AI agents remains a significant challenge [00:03:20]. This new paradigm is too important to be constrained by rigid frameworks; instead, it calls for small, reusable building blocks [00:03:48].

The Rise of the AI Engineer

A prevailing belief is that most engineers will transition into AI engineers [00:04:17]. This trend is already visible with full-stack, web, and front-end developers rapidly adopting AI engineering roles, building products with LLMs and vector stores [00:04:22]. DevOps and ML engineers are also increasingly shipping AI-powered products [00:04:38].

Improving the AI Engineering Experience

The goal in this evolving space is to provide the fastest possible way to build production-ready AI agents [00:04:45]. This involves moving away from the “painful way” of using frameworks with obscure, hard-to-debug abstractions and deployment challenges [00:04:57].

Instead, the focus is on building atop predefined, highly scalable, and composable AI primitives [00:05:13].

Essential AI Primitives

Several core primitives are vital for building robust AI agents:

  • Memory: An autonomous RAG (Retrieval-Augmented Generation) engine that can store terabytes of data and scale via a vector store [00:05:24].
  • Threads: For storing and managing context and conversation history within an agent [00:04:01].
  • Parser: To extract context from various data formats, such as converting PDFs to text [00:08:14].
  • Chunker: To split extracted content into smaller, manageable pieces for similarity search [00:08:20].
  • Tools Infrastructure: Allows agents to connect to external APIs and services [00:11:33].
  • Workflow Engine: Purpose-built for multi-step agent processes [00:10:30].

Building with these primitives allows for the creation of serverless AI agents that automatically handle heavy lifting and scaling [00:05:37].
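As a minimal illustration of how two of these primitives compose, the sketch below implements a naive character-based chunker of the kind that would sit between a parser and a vector store. The size and overlap parameters are invented for the example; real chunkers are typically token-aware rather than character-based.

```javascript
// Naive chunker primitive: split parsed text into overlapping pieces so a
// memory primitive can index them for similarity search. Character-based
// splitting is a simplification; production chunkers usually count tokens.
function chunk(text, size = 200, overlap = 40) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last piece reached the end
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a boundary retrievable from at least one chunk, at the cost of some duplicated storage in the vector store.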

Common AI Agent Architectures Built with Primitives

Augmented LLM

This is a foundational architecture where an agent, powered by an LLM, processes input and generates output. It integrates:

  • Tools: For calling external APIs [00:11:33].
  • Threads: For storing conversational context and scratchpad information [00:11:41].
  • Memory: For long-term data storage and retrieval, often requiring a vector store [00:12:17].
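The loop can be sketched as follows. The tool, the thread shape, and the hard-coded tool choice are all stand-ins: a real augmented LLM would decide for itself which tool to call and would generate the reply, while the thread primitive would persist the context between turns.

```javascript
// Minimal sketch of an augmented LLM: a thread accumulates context, a tool
// map exposes external APIs, and the reply is a stubbed model response.
const tools = {
  weather: (city) => `Sunny in ${city}`, // stub for an external API call
};

function runAgent(thread, input) {
  thread.push({ role: "user", content: input });
  const toolResult = tools.weather(input); // real agents route this via the LLM
  thread.push({ role: "tool", content: toolResult });
  const reply = `Based on the tool result: ${toolResult}`; // stubbed LLM answer
  thread.push({ role: "assistant", content: reply });
  return reply;
}
```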

Prompt Chaining and Composition

This architecture involves multiple AI agents working together in sequence. An input is processed by one agent, and its output determines the next action, potentially triggering another agent [00:13:08]. Examples include an agent identifying spam emails, then another drafting a response [00:13:22].
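The spam-email example can be sketched with two stubbed agents, where the first agent's output gates whether the second runs. Real versions of both functions would call an LLM; the keyword check here only fakes the classifier deterministically.

```javascript
// Prompt chaining: agent 1 classifies, and its label decides whether
// agent 2 (the drafter) is invoked at all.
const classifyEmail = (email) =>
  email.includes("win a prize") ? "spam" : "not-spam"; // stubbed classifier
const draftResponse = (email) => `Thanks for your note: "${email}"`; // stubbed drafter

function handleEmail(email) {
  const label = classifyEmail(email); // first agent's output...
  return label === "spam" ? "(discarded)" : draftResponse(email); // ...picks the next step
}
```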

Agent Router (LLM Router)

An LLM-based router decides which specialized agent should be called next based on the task [00:14:02]. This allows for specialized agents (e.g., summary, reasoning, coding agents), each potentially optimized with a different LLM (e.g., Gemini for summary, Llama 70B for reasoning, Claude for coding) [00:14:24]. The router’s job is to respond with valid JSON indicating the appropriate agent [00:15:14].
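A sketch of the router, where `routeLLM` stands in for a model prompted to answer with strict JSON naming the next agent; the keyword checks only fake that behavior deterministically so the dispatch logic can be shown.

```javascript
// LLM router sketch: the router "model" must emit valid JSON naming the
// specialized agent to call next; dispatch parses it and routes the task.
function routeLLM(task) {
  if (task.includes("summarize")) return '{"agent":"summary"}';
  if (task.includes("code")) return '{"agent":"coding"}';
  return '{"agent":"reasoning"}';
}

function dispatch(task) {
  const route = JSON.parse(routeLLM(task)); // fails loudly on invalid JSON
  return route.agent;
}
```

Requiring valid JSON from the router keeps the dispatch step mechanical: if the model's output fails to parse, the error surfaces immediately instead of silently routing to the wrong agent.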

Parallel Agent Execution

For tasks where multiple agents can operate independently, they can be run in parallel using simple programming constructs like Promise.all in JavaScript [00:16:50]. Examples include sentiment analysis, summarization, and decision-making agents [00:17:00].
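The `Promise.all` pattern from the talk can be sketched directly; each async stub stands in for an independent LLM call, and the destructured results come back in declaration order.

```javascript
// Parallel agent execution: the three agents are independent, so they can
// run concurrently under Promise.all. Each stub replaces a real LLM call.
const sentimentAgent = async (text) => "positive"; // stubbed result
const summaryAgent = async (text) => text.slice(0, 20); // stubbed result
const decisionAgent = async (text) => "approve"; // stubbed result

async function analyze(text) {
  const [sentiment, summary, decision] = await Promise.all([
    sentimentAgent(text),
    summaryAgent(text),
    decisionAgent(text),
  ]);
  return { sentiment, summary, decision };
}
```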

Agent Orchestrator Worker

This more sophisticated architecture involves an orchestrator agent that plans and creates subtasks, which are assigned to multiple worker agents; their results are then synthesized by a final agent [00:17:13]. This mimics a “deep research agent” architecture [00:17:27].
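A deterministic sketch of the pattern, with all three roles stubbed: `orchestrate` stands in for the planning agent, `worker` for the task agents, and `synthesize` for the agent that merges results.

```javascript
// Orchestrator-worker sketch: plan subtasks, fan them out to workers,
// then synthesize the results. Every function here is an LLM stub.
const orchestrate = (task) => [`research: ${task}`, `outline: ${task}`];
const worker = (subtask) => `done(${subtask})`;
const synthesize = (results) => results.join("; ");

function deepResearch(task) {
  const subtasks = orchestrate(task); // orchestrator plans the work
  const results = subtasks.map(worker); // workers could also run in parallel
  return synthesize(results); // final agent merges the outputs
}
```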

Evaluator Optimizer

An LLM acting as a generator produces a response (e.g., marketing copy), which another LLM, acting as a judge, evaluates. The judge either accepts the output or rejects it with specific feedback for improvement [00:20:08]. This feedback loop allows iterative refinement of the generated content [00:21:08].
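The loop can be sketched with deterministic stubs for both roles: `generate` stands in for the generator LLM, `judge` for the evaluator, and the judge's feedback is fed back into the next generation until it accepts or a retry cap is hit.

```javascript
// Evaluator-optimizer sketch: generate, judge, and (on rejection) regenerate
// with the judge's feedback. Both "models" are deterministic stubs.
const generate = (brief, feedback) =>
  feedback ? `${brief} (revised: ${feedback})` : brief;
const judge = (copy) =>
  copy.includes("revised")
    ? { ok: true }
    : { ok: false, feedback: "add detail" };

function refine(brief, maxRounds = 3) {
  let copy = generate(brief);
  for (let i = 0; i < maxRounds; i++) {
    const verdict = judge(copy);
    if (verdict.ok) return copy; // judge accepted the draft
    copy = generate(brief, verdict.feedback); // retry with feedback
  }
  return copy; // give up after maxRounds and return the last attempt
}
```

The retry cap matters in practice: without it, a judge that never accepts would loop forever, burning tokens on every round.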

Implications for the Future of AI in Coding

The reliance on AI primitives over frameworks ensures adaptability in a fast-moving AI landscape [00:25:51]. New paradigms, LLMs, and problem-solving techniques emerge frequently [00:25:54]. Building with primitives avoids being “stuck” with an abstraction layer that might hinder migration when LLMs become more sophisticated in handling agentic workflows [00:19:44]. This approach enables rapid development and deployment of scalable AI agents [00:26:39].

State of AI Engineering Research

Further insights into building agents, the types of primitives used, and LLM requirements across industries can be found through dedicated research efforts [00:10:50].