From: aidotengineer
The development of AI agents has seen significant evolution from early, rudimentary models to more sophisticated systems. Early challenges primarily stemmed from the limited capabilities of the foundational AI models themselves, necessitating extensive manual engineering to achieve even basic reliability [00:00:58].
Initial Limitations of LLMs
When working with Large Language Models (LLMs) four years prior to the widespread adoption of ChatGPT, models like GPT-2 presented considerable challenges [00:00:20]:
- Lack of Intelligence: Early models were “frustratingly stupid” [00:00:51].
- Small Context Windows: They had limited memory for information within a single interaction [00:00:53].
- Poor Reasoning: Their ability to logically process information and respond intelligently was weak [00:00:55].
- Need for Scaffolding Code: To make these models work “somewhat reliably,” significant amounts of code had to be written around them [00:00:58]. An example of such scaffolding was “Jsonformer,” a structured extraction library created because models were too limited to reliably produce valid JSON and had to be forced into the desired format [00:01:40].
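The core idea behind this kind of scaffolding can be sketched in a few lines: the code emits the braces, quotes, and keys itself, and the model is only asked to fill in the values, so the result is valid JSON by construction. This is a simplified illustration of the approach, not Jsonformer’s actual API; `stub_generate` is a stand-in for a real model call.

```python
import json

def fill_schema(schema: dict, generate_value) -> dict:
    """Walk a JSON schema and let the model generate only the values.
    Keys and structure come from code, so the output is always valid JSON."""
    result = {}
    for key, spec in schema["properties"].items():
        if spec["type"] == "object":
            result[key] = fill_schema(spec, generate_value)
        else:
            result[key] = generate_value(key, spec["type"])
    return result

# Stand-in for a real model call: returns a type-correct dummy value.
def stub_generate(key: str, type_name: str):
    return 0 if type_name == "number" else f"<{key}>"

schema = {"type": "object", "properties": {
    "merchant": {"type": "string"},
    "amount": {"type": "number"},
}}
record = fill_schema(schema, stub_generate)
print(json.dumps(record))  # parses back cleanly, keys fixed by the schema
```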
Evolution of AI Agents and Architectural Shifts
As AI models have grown smarter, the need for extensive scaffolding code has diminished, leading to the deletion of much of that initial engineering effort [00:01:07]. This has revealed patterns for building agents that scale with increasing intelligence [00:01:14].
A core idea driving this advancement is that “systems that scale with compute beat systems that don’t” [00:02:30]. This principle, akin to the “Bitter Lesson,” suggests harnessing rare exponential trends, such as the increasing power of AI models, rather than relying on rigid, fixed, or deterministic systems [00:02:43].
Case Study: Ramp’s CSV Switching Report Agent
Ramp, a finance platform utilizing AI across its product, offers a “switching report” agent designed to parse arbitrary CSV transaction files from various third-party card providers [00:03:59]. The problem involves converting diverse CSV schemas into a consistent internal format [00:05:18].
Three approaches to building this system illustrate the shift in AI architecture:
- Manual/Rigid Code Approach:
- Involves manually writing code for the 50 most common third-party vendors [00:05:34].
- This approach is effective but requires manual updates when schemas change [00:05:53].
- It relies entirely on classical compute [00:08:39].
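The manual approach amounts to a hand-maintained lookup from each vendor’s column names to the internal schema. The vendor names and column headers below are invented for illustration; the real system reportedly covered the ~50 most common providers this way.

```python
import csv
import io

# Hypothetical hand-written schemas; each new vendor (or schema change)
# requires an engineer to edit this table.
VENDOR_SCHEMAS = {
    "vendor_a": {"Transaction Date": "date", "Amount (USD)": "amount", "Merchant": "merchant"},
    "vendor_b": {"posted_at": "date", "total": "amount", "payee": "merchant"},
}

def parse_switching_csv(vendor: str, text: str) -> list:
    """Map one vendor's CSV columns onto the consistent internal format."""
    mapping = VENDOR_SCHEMAS[vendor]  # KeyError: unknown vendors need new code
    rows = csv.DictReader(io.StringIO(text))
    return [{mapping[col]: val for col, val in row.items() if col in mapping}
            for row in rows]
```

Everything here is classical compute: deterministic, cheap, and brittle in exactly the way the talk describes.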
- Constrained Agent (Hybrid) Approach:
- Introduces LLMs to classify columns (e.g., date, amount, merchant name) within a classical scripting flow [00:06:11].
- Most compute is classical, with some “fuzzy LLM land” for specific tasks [00:06:47].
- This is a step towards a more general system [00:06:52].
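In the hybrid version, the pipeline stays classical and only the column-classification step is delegated to a model. The sketch below uses a keyword heuristic as a stand-in for the LLM call; in the real system, `classify_column` would prompt a model with the header and a few sample values.

```python
import csv
import io

def classify_column(header: str, samples: list) -> str:
    # Stub standing in for an LLM prompt such as:
    # "Is this column a date, an amount, or a merchant name?"
    header = header.lower()
    if "date" in header or "posted" in header:
        return "date"
    if "amount" in header or "total" in header:
        return "amount"
    return "merchant"

def normalize(text: str) -> list:
    """Classical flow; one fuzzy step decides the column mapping."""
    rows = list(csv.DictReader(io.StringIO(text)))
    mapping = {col: classify_column(col, [r[col] for r in rows[:5]])
               for col in rows[0]}
    return [{mapping[col]: val for col, val in row.items()} for row in rows]
```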
- General Agent (LLM-Driven) Approach:
- The LLM is given the entire CSV, a code interpreter (like Pandas), access to the CSV’s head and tail, and a verifier/unit test to ensure the output format [00:07:01].
- Running it once may not work, but running it “50 times in parallel” significantly improves its reliability and generalization across different formats [00:07:31].
- This approach uses “10,000 times more compute” than the first but is still cost-effective given the scarcity of engineer time [00:07:40].
- This represents a model where the LLM (fuzzy compute) largely dictates the flow, calling into classical tools when needed [00:08:54].
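The “run it many times in parallel and verify” pattern above can be sketched as follows. `propose_transform` is a stand-in for the LLM that, shown the CSV’s head and tail, writes transformation code; here it alternates between a deliberately broken candidate and a working one so the verifier has something to reject.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

def propose_transform(raw: str, seed: int) -> str:
    # Stub: a real agent would sample code from an LLM (e.g. using pandas).
    if seed % 2 == 0:
        return "rows = None"  # a failing candidate
    return (
        "reader = csv.reader(io.StringIO(raw))\n"
        "header = next(reader)\n"
        "rows = [dict(zip(['date', 'amount', 'merchant'], r)) for r in reader]"
    )

def verifier(rows) -> bool:
    # Unit test on the output: every row must use the internal schema.
    return (isinstance(rows, list) and len(rows) > 0
            and all(set(r) == {"date", "amount", "merchant"} for r in rows))

def attempt(raw: str, seed: int):
    env = {"csv": csv, "io": io, "raw": raw}
    try:
        exec(propose_transform(raw, seed), env)  # the code-interpreter tool
        return env["rows"] if verifier(env.get("rows")) else None
    except Exception:
        return None

def run_agent(raw: str, tries: int = 50):
    # Sample 50 candidates in parallel; keep the first that passes the verifier.
    with ThreadPoolExecutor() as pool:
        for rows in pool.map(lambda s: attempt(raw, s), range(tries)):
            if rows is not None:
                return rows
    raise RuntimeError("no candidate passed the verifier")
```

The design choice is the one the talk emphasizes: spend far more compute per file, and trade engineer time for model sampling plus a cheap deterministic check.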
The trend observed at Ramp, moving towards the third approach, demonstrates how leveraging the exponential improvements made by large AI labs directly benefits companies without much internal effort [00:10:07].
The Future: LLM as the Backend
This shift suggests a future where the LLM itself acts as the backend, rather than merely being a tool for code generation [00:11:40]. In this model, the LLM has access to tools like code interpreters, can make network requests, and interact with databases [00:11:46].
An experimental LLM-driven email client exemplifies this vision [00:12:00]:
- When a user logs in, the Gmail token is sent to an LLM [00:14:03].
- The LLM simulates a Gmail client, having access to emails, the user’s token, and a code interpreter [00:14:14].
- It then renders the UI (e.g., as Markdown) based on what it deems reasonable for a Gmail homepage [00:14:26].
- User interactions, like clicking an email, are fed back to the LLM, which then decides the next page state and appropriate UI (e.g., fetching and displaying the email body) [00:14:45].
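The interaction loop above can be condensed into a minimal sketch: state and events go to the model, Markdown for the next page comes back. `llm_render` below is a stub standing in for the model call; the event names and state shape are invented for illustration.

```python
def llm_render(state: dict, event: str) -> str:
    """Stub for the LLM backend: given app state and a user event,
    return the next page as Markdown."""
    if event == "open_inbox":
        # Render a homepage listing the emails as clickable links.
        return "\n".join(f"- [{m['subject']}](open:{i})"
                         for i, m in enumerate(state["emails"]))
    if event.startswith("open:"):
        # A click is fed back in; fetch and display that email's body.
        mail = state["emails"][int(event.split(":")[1])]
        return f"# {mail['subject']}\n\n{mail['body']}"
    return "# Inbox"

state = {"emails": [{"subject": "Hello", "body": "Hi there"}]}
homepage = llm_render(state, "open_inbox")  # LLM decides the homepage UI
detail = llm_render(state, "open:0")        # clicking is routed back to the LLM
```

In the real experiment the model, not hand-written routing like this, decides both the page state and the UI, which is exactly why it is slow today.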
While such software is currently slow and “barely works” [00:15:34], the exponential trends in AI could lead to it becoming common in the future [00:15:56]. This highlights the ongoing challenges and innovations in developing AI agents, including coding agents and personal AI agents.