Current state and future of AI agent frameworks

From: redpointai

The AI landscape, and agent frameworks in particular, is evolving rapidly, driven by the emergence of large language models (LLMs) [03:52:54]. LangChain, founded by Harrison Chase, has become a prominent framework for working with LLMs and building AI applications on top of them [00:09:09].

Current State of AI Agents

Initially, there was an “explosion of interest” in AI agents, spearheaded by concepts like AutoGPT [21:46:00]. However, early autonomous, highly generalizable agents proved to be less practical for immediate use [21:56:00]. The focus has since shifted towards more focused agents that are “more practically ready today” [22:08:00].

Multi-agent frameworks such as AutoGen and CrewAI have emerged; they are often described as controlled flows between specific prompts and tools [22:58:00]. This approach treats an agent as a “state machine,” allowing for more control over transitions and behavior [23:14:00]. The state machine mental model helps developers define distinct states and enforce specific transition probabilities between them, making agents more palatable and reliable for production [23:14:00]. LangChain’s own LangGraph framework, for example, allows constructing agents as graphs or state machines [23:50:00].
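The state machine view can be sketched in plain Python. This is an illustration only, not LangGraph's API: the `Ticket` dataclass, the state names, and the handler functions are hypothetical, and a keyword check stands in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Ticket:
    """Hypothetical shared state passed from one agent step to the next."""
    message: str
    history: List[str] = field(default_factory=list)


def triage(ticket: Ticket) -> str:
    # A real step would prompt an LLM; a keyword check stands in for it here.
    ticket.history.append("triage")
    return "debug" if "error" in ticket.message.lower() else "answer"


def debug(ticket: Ticket) -> str:
    ticket.history.append("debug")
    return "answer"


def answer(ticket: Ticket) -> str:
    ticket.history.append("answer")
    return "done"


# One handler per state, plus an explicit table of permitted transitions.
HANDLERS: Dict[str, Callable[[Ticket], str]] = {
    "triage": triage,
    "debug": debug,
    "answer": answer,
}
ALLOWED: Dict[str, Set[str]] = {
    "triage": {"debug", "answer"},
    "debug": {"answer"},
    "answer": {"done"},
}


def run(ticket: Ticket, start: str = "triage", max_steps: int = 10) -> Ticket:
    """Step through states until 'done', rejecting any unlisted transition."""
    state, steps = start, 0
    while state != "done":
        if steps >= max_steps:
            raise RuntimeError("agent did not finish within max_steps")
        next_state = HANDLERS[state](ticket)
        if next_state not in ALLOWED[state]:
            raise ValueError(f"illegal transition {state} -> {next_state}")
        state, steps = next_state, steps + 1
    return ticket
```

Calling `run(Ticket("I get an error on login"))` visits triage, debug, and answer in that order. The transition table is what gives the developer control: a step can only hand off to states that were listed in advance.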

Future of AI Agents and Applications

AI agents and the applications built on them are expected to develop in several directions:

More Complex Chatbots: Applications will likely evolve into more complex chatbots that operate as state machines, guiding users through different stages, such as a customer support bot with various debugging phases or an AI therapist [41:55:00].
Longer-Running Jobs: There will be an increase in applications designed for longer-running, non-instantaneous tasks, such as generating a first draft of a research report or a newsletter (e.g., GPT Researcher, GPT Newsletter) [42:19:00]. These applications require new user experience (UX) designs that accommodate delayed responses [42:48:00].
Personalization and Memory: A significant area of innovation is in personalization and memory for AI applications [53:24:00]. The goal is to develop applications that remember details about individual users and tailor experiences, potentially through techniques like Retrieval-Augmented Generation (RAG) or fine-tuning [53:51:00]. An example of a future application could be a journal app that remembers personal details and initiates conversations based on entries and past interactions [55:01:00].
Improved Multimodal Capabilities: While currently considered “overhyped” for precise knowledge work, multimodal models are expected to improve, particularly in areas like spatial awareness for tasks such as extraction from images [46:46:40].
Better Structured Extraction: The need for specific phrasing like “write this in JSON” will hopefully diminish as models become better at structured extraction [48:32:00].
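The personalization and memory item above could be approached with retrieval. The sketch below is a deliberately simple assumption: the `UserMemory` class and `build_prompt` helper are hypothetical, and word overlap stands in for the embedding search a real application would more likely use.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; no stop-word removal, so common words count too."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class UserMemory:
    """Hypothetical per-user memory store backed by a plain list of notes."""

    def __init__(self) -> None:
        self.notes: List[str] = []

    def remember(self, note: str) -> None:
        self.notes.append(note)

    def recall(self, query: str, k: int = 2) -> List[str]:
        """Return up to k notes that share the most words with the query."""
        query_words = tokenize(query)
        scored: List[Tuple[int, str]] = []
        for note in self.notes:
            overlap = sum((query_words & tokenize(note)).values())
            if overlap > 0:
                scored.append((overlap, note))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:k]]


def build_prompt(memory: UserMemory, user_message: str) -> str:
    """Prepend recalled facts so the model can tailor its reply to the user."""
    facts = memory.recall(user_message)
    context = "\n".join(f"- {fact}" for fact in facts)
    if not context:
        context = "- (nothing relevant remembered)"
    return f"Known about this user:\n{context}\n\nUser: {user_message}"
```

The same shape would apply to the journal app example: each entry is stored with `remember`, and the most relevant entries are pulled into the prompt before the model responds.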

Despite advancements, some core techniques are likely to remain. Retrieval, for instance, is expected to continue being essential [46:11:00]. The state machine model for agents is also seen as a “really helpful mental model” for developers approaching AI models and agents, suggesting it will persist even with more capable models [48:05:00].

LangChain’s Role and Evolution

LangChain positions itself as an orchestration layer for building LLM applications, connecting LLMs to external data and computation [04:03:00]. Its core focus areas include:

Retrieval: Functionality around chat, streaming, and data retrieval [04:24:00].
Agents: Frameworks for building agents, moving from general autonomous agents to more focused, controlled state machines [06:57:00].
Evaluation: Providing tools for testing and evaluating LLM applications [06:59:00].

LangChain has developed several products:

LangChain (Open Source): The primary framework for building LLM applications [03:58:00]. It has evolved to include more flexible, lower-level components like LangChain Expression Language and LangGraph, which facilitate building complex orchestration layers [30:47:00].
LangSmith: A separate SaaS platform for observability, testing, and evaluation of LLM applications [05:51:00]. It tracks multiple LLM calls, inputs, and outputs, aiding in debugging complex applications [08:06:00].
LangGraph: A framework specifically designed for constructing agents as graphs or state machines, providing more control over agent behavior [23:50:00].
LangServe: Launched to simplify the deployment of LangChain applications, wrapping them in a FastAPI backend and providing playgrounds for testing [36:36:00]. This tool also facilitates cross-functional collaboration by allowing non-technical team members to interact with and provide feedback on applications [38:50:00].
OpenGPTs: A LangChain project that recreates the GPT Store experience as open source, enabling companies to build internal chatbot platforms connected to their own data and APIs [26:31:00].
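The observability idea described for LangSmith, recording the inputs and outputs of each call in a multi-step application, can be illustrated with a small decorator. This is a sketch of the general technique only and does not use LangSmith's API; `traced`, `TRACE`, and the two pipeline steps are hypothetical names.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory trace log; a hosted platform would persist and visualize this.
TRACE: List[Dict[str, Any]] = []


def traced(name: str) -> Callable:
    """Record the inputs, output, and duration of every call to a step."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({
                "step": name,
                "inputs": {"args": args, "kwargs": kwargs},
                "output": result,
                "seconds": time.perf_counter() - start,
            })
            return result
        return wrapper
    return decorator


@traced("retrieve")
def retrieve(query: str) -> List[str]:
    # Stand-in for a retrieval call.
    return ["doc about " + query]


@traced("generate")
def generate(query: str, docs: List[str]) -> str:
    # Stand-in for an LLM call.
    return f"Answer to {query!r} using {len(docs)} document(s)"


def pipeline(query: str) -> str:
    return generate(query, retrieve(query))
```

After `pipeline("refunds")` runs, `TRACE` holds one record per step in call order, which is the kind of per-call view that helps when debugging an application made of several chained calls.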

A key challenge for LangChain is balancing building for the “here and now” with remaining “nimble and flexible” for future advancements, given the rapid pace of AI development [30:03:00]. The organization focuses on providing the most value by investing in areas where users face significant blockers, such as composability and observability [41:01:00].

Evaluation and Testing of AI Agents

Evaluation (eval) is a critical component for ensuring the reliability of LLM applications. Key challenges and practices in eval include:

Data Set Creation: Teams typically start with hand-labeled examples (around 20), then incorporate edge cases identified from production failures [10:52:00]. This process forces developers to clarify what the system should do and how it should handle edge cases [14:52:52].
LLM as a Judge: For complex tasks where traditional machine learning classification is difficult, LLMs are increasingly used to judge the quality of responses, though this method is not perfect and still requires human oversight [11:30:00].
Human-in-the-Loop: A significant manual component remains in evaluation, especially for understanding why models behave unexpectedly [11:41:00]. This manual review offers deeper insights into the system’s workings [13:32:00].
Aggregation and Frequency: Decisions must be made on how to aggregate metrics and how often to perform evaluations, as the process can be expensive and slow [12:07:00]. The goal is to reduce the manual component to enable more frequent testing, ideally in continuous integration (CI) pipelines [12:51:00].
Generalizability: Evaluation tools aim to be low-level, code-first, and framework-agnostic to support diverse use cases [18:04:00].
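The practices above (a small labeled dataset, a judge, an aggregate metric, and failures kept for manual review) can be combined into a minimal harness. Everything here is an assumption made for illustration: the `Example` and `Report` types, the canned `toy_app`, and especially `toy_judge`, which uses substring matching where a real setup might prompt an LLM to grade the answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Example:
    question: str
    reference: str


@dataclass
class Report:
    pass_rate: float
    failures: List[Example] = field(default_factory=list)


def evaluate(
    app: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
    dataset: List[Example],
) -> Report:
    """Run the app on every example and aggregate the judge's verdicts."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    failures = [
        ex for ex in dataset
        if not judge(ex.question, app(ex.question), ex.reference)
    ]
    return Report(pass_rate=1 - len(failures) / len(dataset), failures=failures)


# Hand-labeled examples; production failures would be appended over time.
DATASET = [
    Example("What is the capital of France?", "Paris"),
    Example("What is 2 + 2?", "4"),
    Example("Who wrote Hamlet?", "Shakespeare"),
]

# Canned outputs standing in for a real LLM application (one is wrong).
CANNED = {
    "What is the capital of France?": "The capital is Paris.",
    "What is 2 + 2?": "5",
    "Who wrote Hamlet?": "William Shakespeare wrote it.",
}


def toy_app(question: str) -> str:
    return CANNED.get(question, "")


def toy_judge(question: str, output: str, reference: str) -> bool:
    # Stand-in for an LLM judge: case-insensitive substring match.
    return reference.lower() in output.lower()
```

In a CI pipeline, the result of `evaluate(toy_app, toy_judge, DATASET)` could be compared against a minimum pass rate, while `failures` is the list a person would read through to understand why the system misbehaved.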

Advice for Startups Building in AI

Startups building on AI models and agents should take a pragmatic approach:

Build Now: Despite the rapid changes and perceived “hacks” in the space, it’s crucial to start building applications immediately [30:20:00].
Focus on Product-Market Fit (PMF): Prioritize building with powerful models like GPT-4 to achieve PMF, rather than prematurely optimizing for cost or latency [45:22:00]. The mantra “no GPUs before PMF” emphasizes this [45:42:00].
Leverage Few-Shot Prompting: This technique is currently “underhyped” but powerful for improving application performance, especially for structured output or complex instructions [49:17:00].
Embrace Open Source: Open source models are becoming increasingly ubiquitous and will enable personalized, local applications (e.g., desktop apps that run locally) [51:58:00].
Innovate on UX: A major area for innovation is in designing user experiences for AI applications, as how people want to interact with these systems is still being discovered [43:30:00]. An “AI-native spreadsheet” that spins up agents for each cell is an example of such a novel UX [43:53:00].
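The few-shot prompting advice above can be made concrete for structured output. In this sketch the order-extraction task, the example pairs, and the required keys are all invented for illustration; the model call itself is left out, so only prompt construction and output validation are shown.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical worked examples: (input text, desired structured output).
FEW_SHOT: List[Tuple[str, Dict[str, Any]]] = [
    ("Order 3 lattes for Sam", {"item": "latte", "quantity": 3, "customer": "Sam"}),
    ("One bagel, name is Priya", {"item": "bagel", "quantity": 1, "customer": "Priya"}),
]

REQUIRED_KEYS = {"item", "quantity", "customer"}


def build_few_shot_prompt(
    examples: List[Tuple[str, Dict[str, Any]]], new_input: str
) -> str:
    """Show the model solved examples, then leave the last output open."""
    parts = ["Extract the order as JSON with keys item, quantity, customer."]
    for text, parsed in examples:
        parts.append(f"Input: {text}\nOutput: {json.dumps(parsed)}")
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)


def parse_output(raw: str) -> Dict[str, Any]:
    """Validate the model's reply; raises if it is not the expected JSON."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

The examples carry the format, so the instruction line can stay short; the string returned by `build_few_shot_prompt(FEW_SHOT, "Two muffins for Lee")` would be sent to whichever model the application uses, and its reply passed to `parse_output`.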

Implications of Autonomous AI Agents

The implications of autonomous AI agents are far-reaching. While truly autonomous agents are still developing, more controlled, state-machine-based agents are moving into production [23:40:00]. Companies, especially larger enterprises, are increasingly using AI internally through assistant-like platforms that let employees create chatbots with their own data and APIs, often at lower risk than consumer-facing products [26:05:00].

Overall, the agent space is characterized by rapid change and immense opportunity, with much value yet to be created [34:44:00]. While the concept of AGI is often discussed, the immediate focus is on building practical, reliable applications [22:13:00]. The current period might be more stable than the “super hectic” phase of previous months, allowing for more focus on improving documentation, use cases, and scaling [33:34:00]. The development of AI applications, especially in areas like AI in VR and personal agents, is analogous to the early days of the iPhone, where killer apps took time to emerge [1:03:00].
