From: aidotengineer
This article traces the evolution of AI engineering as seen by a practitioner: how applications built on LLMs have changed as models improved, the challenges of building on early models, and future directions in application architecture.
Early Experiences with LLMs
The speaker has been working with Large Language Models (LLMs) for four years, a significant duration given the rapid advancements in the field, especially since the emergence of ChatGPT [00:00:26]. Initially, the goal was to build what would now be called an AI agent company, focusing on customer support chatbots [00:00:34].
Early LLMs, such as GPT-2 and GPT-3, presented significant challenges in building AI applications [00:00:51]. They were described as “frustratingly stupid,” had small context windows, and lacked sophisticated reasoning capabilities [00:00:51]. This necessitated writing extensive code around these models to achieve even somewhat reliable performance [00:00:58]. As models became smarter, less of this scaffolding code was needed, revealing patterns for building agents that scale with increasing intelligence [00:01:06].
An example of early scaffolding was the JSONFormer library, designed for structured extraction when models were unable to reliably output JSON format [00:01:40].
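As a rough illustration, here is how JSONFormer-style constrained generation is typically wired up. This is a sketch under assumptions: the model choice, schema, and prompt are invented for this example, not details from the talk.

```python
# Minimal sketch of JSONFormer-style structured extraction. The library
# constrains token generation so the output is guaranteed to match the
# schema -- valuable when models could not reliably emit valid JSON.
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM
tokenizer = AutoTokenizer.from_pretrained("gpt2")

schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
    },
}

extract = Jsonformer(
    model, tokenizer, schema,
    "Extract the merchant and amount: 'Coffee at Blue Bottle, $4.50'",
)
print(extract())  # e.g. {"merchant": "Blue Bottle", "amount": 4.5}
```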
The Bitter Lesson: Scaling with Compute
The core idea presented is that “systems that scale with compute beat systems that don’t” [00:02:30]. In other words, a system that can leverage more computational power to improve its performance, without significant additional engineering effort, will outperform a rigid, fixed, or deterministic one [00:02:37]. The speaker frames this as an obvious corollary of Rich Sutton’s “Bitter Lesson” [00:02:57].
Exponential trends, such as those seen in AI model capabilities, are rare, and when encountered, one should capitalize on them [00:03:01]. Historical examples supporting this include:
- Chess and Go: Early attempts involved writing extensive, clever software to encode human reasoning [00:03:17]. These systems were effective at fixed levels of compute, but general methods that scale computational search (e.g., Monte Carlo Tree Search) ultimately won out [00:03:45].
- Computer Vision and Atari Games: Similar patterns where generalized, compute-intensive approaches surpassed rigid, hand-coded systems [00:03:53].
Case Study: Ramp’s Switching Report Agent
Ramp, a finance platform, utilizes AI extensively to automate tasks like expense management, payments, and bookkeeping [00:04:06]. A specific case study involves the “switching report” agent, designed to parse arbitrary CSV transaction formats from third-party card providers [00:04:38]. The goal is to onboard users by helping them migrate transactions from other platforms [00:04:53].
Three approaches to building this agent were explored:
1. Manual/Deterministic Approach
This involved manually writing parsing code for each of the 50 most common third-party card vendors [00:05:32]; a sketch of the pattern follows the pros and cons below.
- Pros: Works reliably for known formats.
- Cons: Requires significant manual effort to identify schemas and write specific parsing logic [00:05:47]. Prone to breaking if vendor formats change [00:05:53]. Does not scale to arbitrary formats.
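A hypothetical sketch of this pattern (vendor names and column headers are invented) makes the brittleness concrete: every new or changed format requires another hand-written parser.

```python
# Hypothetical sketch of the deterministic approach: one hand-written
# parser per known vendor format. Vendor names and columns are invented.
import csv
from typing import Callable

PARSERS: dict[str, Callable[[dict], dict]] = {}

def parser(vendor: str):
    """Register a vendor-specific row parser."""
    def register(fn):
        PARSERS[vendor] = fn
        return fn
    return register

@parser("vendor_a")
def parse_vendor_a(row: dict) -> dict:
    return {"date": row["Txn Date"], "amount": float(row["Amt"]),
            "merchant": row["Merchant Name"]}

@parser("vendor_b")
def parse_vendor_b(row: dict) -> dict:
    return {"date": row["posted_at"], "amount": float(row["value_usd"]),
            "merchant": row["payee"]}

def convert(vendor: str, path: str) -> list[dict]:
    # Fails outright for any vendor not in PARSERS -- the scaling problem.
    with open(path, newline="") as f:
        return [PARSERS[vendor](row) for row in csv.DictReader(f)]
```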
2. Hybrid (Classical + LLM) Approach
This approach combined classical scripting with LLMs: each CSV column is classified (e.g., as date, transaction amount, or merchant name) using an LLM or an embedding model for semantic similarity, then mapped to the desired schema [00:06:11] (see the sketch after the pros and cons below).
- Pros: More general than manual coding [00:06:09].
- Cons: Still primarily classical compute with some “fuzzy LLM land” integration [00:06:47].
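A sketch of the column-classification step, assuming a sentence-transformers embedding model; the target field names are invented for illustration.

```python
# Sketch of the hybrid approach: embed each column's header plus a few
# sample values, then match it to the most similar target-schema field.
# Model and target field names are assumptions for illustration.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
TARGET_FIELDS = ["transaction date", "transaction amount", "merchant name"]
target_embs = model.encode(TARGET_FIELDS, convert_to_tensor=True)

def classify_column(header: str, samples: list[str]) -> str:
    """Map a source CSV column to the most semantically similar field."""
    description = f"{header}: {', '.join(samples[:5])}"
    col_emb = model.encode(description, convert_to_tensor=True)
    scores = util.cos_sim(col_emb, target_embs)[0]
    return TARGET_FIELDS[int(scores.argmax())]

print(classify_column("Posted", ["2024-01-03", "2024-01-07"]))
# -> "transaction date" (real use needs thresholds and fallbacks)
```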
3. LLM-Centric (Scaling with Compute) Approach
This more radical approach gives the entire CSV to an LLM equipped with a code interpreter (e.g., for Python/Pandas) [00:07:01]. The LLM is instructed to output a CSV in a specific format, with unit tests and a verifier to check its work [00:07:20]; a sketch follows the findings below.
- Initial Findings: Running it once often doesn’t work [00:07:28].
- Key Insight: Running it 50 times in parallel significantly increases the likelihood of success and allows it to generalize across a vast range of formats [00:07:31].
- Cost-Benefit: Although this approach uses 10,000 times more compute than the first, it still costs less than a dollar per transaction [00:07:58]. The true scarce resource is engineering time; a reliable, general system is worth more than the raw compute it consumes [00:07:51].
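A minimal sketch of the pattern, where `generate_transform_code` and `run_in_sandbox` are hypothetical stand-ins for the LLM call and its code interpreter, and the target columns are assumptions:

```python
# Sketch of the scaling approach: fire N independent attempts, each letting
# the model write transformation code, and keep the first output that
# passes the verifier. The two helpers below are hypothetical stand-ins.
import io
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd

TARGET_COLUMNS = {"date", "amount", "merchant"}

def verify(csv_text: str) -> bool:
    """Unit-test-style checks: right columns, parseable dates and amounts."""
    df = pd.read_csv(io.StringIO(csv_text))
    return (set(df.columns) == TARGET_COLUMNS
            and pd.to_datetime(df["date"], errors="coerce").notna().all()
            and pd.to_numeric(df["amount"], errors="coerce").notna().all())

def attempt(raw_csv: str, seed: int) -> str | None:
    code = generate_transform_code(raw_csv, seed)  # hypothetical LLM call
    try:
        out = run_in_sandbox(code, raw_csv)  # hypothetical interpreter
        return out if verify(out) else None
    except Exception:
        return None

def normalize(raw_csv: str, n_attempts: int = 50) -> str:
    with ThreadPoolExecutor(max_workers=n_attempts) as pool:
        futures = [pool.submit(attempt, raw_csv, i)
                   for i in range(n_attempts)]
        for future in as_completed(futures):
            if (result := future.result()) is not None:
                return result
    raise RuntimeError("no attempt passed verification")
```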
Generalizing AI Engineering Architectures
The three approaches can be generalized to AI application architectures:
- Classical-only: Software engineered without AI.
- Classical calls AI: Traditional programming languages make calls to AI models (e.g., OpenAI APIs) for “fuzzy compute” [00:09:48].
- AI calls Classical: The LLM itself decides when to invoke classical compute tools (e.g., running Python code), with most of the processing happening in “fuzzy” LLM land [00:09:54] (see the sketch below).
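A sketch of the third architecture using OpenAI-style tool calling; the model name, tool schema, and prompt are illustrative assumptions.

```python
# Sketch of "AI calls classical": the model decides when to drop into
# deterministic compute via a Python-execution tool. Model name, tool
# schema, and prompt are illustrative assumptions.
import json

from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute Python code and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

messages = [{"role": "user",
             "content": "Sum the amount column in this CSV: ..."}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=TOOLS)

# The fuzzy side chose the control flow: it either answered directly or
# requested classical compute through a tool call.
for call in response.choices[0].message.tool_calls or []:
    if call.function.name == "run_python":
        code = json.loads(call.function.arguments)["code"]
        # Execute `code` in a sandbox, append the result as a tool
        # message, and loop back to the model for the next step.
```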
Ramp is increasingly moving toward the third approach because the continuous model improvements shipped by major AI labs (the “fuzzy compute,” drawn as blue arrows in the talk’s diagrams) directly benefit applications without additional internal effort [00:10:05]. This lets companies “hitch the ride” on exponential AI trends [00:10:24].
Future Directions: LLM as the Backend
A bold future direction proposes a shift in web application architecture.
- Traditional Web App: Frontend (HTML/CSS/JS) makes requests to a classical backend, which interacts with a database [00:10:47]. Code generation tools might assist engineers, but the deployed system is purely classical [00:11:20].
- LLM as Backend: The LLM itself is the backend, performing the execution [00:11:40]. It has access to tools like code interpreters, can make network requests, and can interact with databases [00:11:46]. A sketch of the idea follows below.
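A hypothetical sketch of the idea: every request is handed straight to the model, which returns the next page (here as Markdown). The endpoint, system prompt, and model name are invented for illustration.

```python
# Hypothetical sketch of "LLM as the backend": no routes, templates, or
# business logic -- the model itself decides what each response should be.
# Endpoint, system prompt, and model name are invented for illustration.
from flask import Flask, request
from openai import OpenAI

app = Flask(__name__)
client = OpenAI()

SYSTEM = ("You are the backend of a web application. Given the user's "
          "latest action, respond with the next page rendered as Markdown.")

@app.post("/action")
def handle_action():
    event = request.get_json()  # e.g. {"click": "email/42"}
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": str(event)}],
    )
    # The model is the request handler; classical code only shuttles bytes.
    return completion.choices[0].message.content
```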
Experimental LLM-Powered Mail Client
An experimental mail client was developed based on this principle [00:12:00]. When a user logs in, their Gmail token is sent to an LLM, which simulates a Gmail client [00:14:03]. The LLM has access to the user’s emails and a code interpreter, and it renders the UI, for example, in Markdown [00:14:14].
When a user interacts with the UI (e.g., clicking on an email), the LLM receives this information and decides how to render the next page, potentially making further requests to retrieve email content or perform actions like deleting an email [00:14:45].
While the prototype is currently slow and, in the speaker’s words, “barely works” due to its intense computational demands [00:15:34], the concept highlights a potential future direction for software: exponential improvements in AI capabilities could make such systems viable and efficient [00:15:56].