From: aidotengineer

Scaffolding AI agents for scalability means architecting systems that improve with increased computational power, rather than relying on rigid, pre-coded logic [02:30:00]. This approach leverages the exponential improvement of large language models (LLMs) to create more general and adaptable AI solutions [01:16:00].

Speaker Background and Early Challenges [00:00:17]

The speaker, head of AI at Ramp, has been working with LLMs for four years, starting before the widespread adoption of models like ChatGPT [00:00:20]. In the early days, attempts to build AI agents for customer support using models like GPT-2 and BERT were frustrating due to their limited intelligence, small context windows, and poor reasoning abilities [00:00:36]. This necessitated writing extensive “scaffolding” code around these models to achieve even somewhat reliable performance [00:00:58]. As models became smarter, much of this supporting code could be deleted, revealing patterns for building agents that scale with increasing intelligence [01:07:00]. The speaker also developed Jsonformer, an early structured extraction library, which was another form of scaffolding: it forced models to output valid JSON when they could not do so reliably on their own [01:40:00].
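To make that kind of scaffolding concrete, here is a minimal sketch in the style of Jsonformer's published usage, where the schema supplies the braces, keys, and quotes and the model only fills in the values. The model checkpoint and schema fields below are illustrative, not anything from the talk:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer  # pip install jsonformer

# Any Hugging Face causal LM works; this particular checkpoint is illustrative.
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

# The library generates only the value slots; structure comes from the schema,
# so the output is valid JSON even with a weak model.
json_schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
        "is_recurring": {"type": "boolean"},
    },
}

prompt = "Extract the transaction details from: 'Monthly $12.99 charge from Spotify'"
extracted = Jsonformer(model, tokenizer, json_schema, prompt)()
print(extracted)  # e.g. {"merchant": "Spotify", "amount": 12.99, "is_recurring": True}
```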

The Bitter Lesson: Scaling with Compute [02:26:00]

The core idea is that systems designed to scale with compute consistently outperform those that are rigid and fixed [02:30:00]. This is because exponential trends, like the growth in LLM intelligence, are rare and offer a “free pass” for improvement without direct engineering effort [03:01:00].

Historically, in fields like chess, Go, computer vision, and Atari games, extensive human-engineered systems with clever abstractions and synthesized human reasoning were initially dominant [03:17:00]. However, when compute could be scaled, general methods, particularly those involving extensive search, consistently won out [03:45:00].

AI Agents at Ramp: The Switching Report Example [03:58:00]

Ramp, a finance platform, uses AI to automate various financial tasks [04:06:00]. One example is the “switching report” agent, designed to parse arbitrary CSV transaction schemas from third-party card providers [04:38:00]. The goal is to onboard users by helping them transfer existing transactions onto the Ramp platform [04:51:00].

Three approaches to building this agent were discussed:

  1. Manual Code (Rigid System):

    • Manually writing code for the 50 most common third-party card vendors [05:31:00].
    • This approach is simple and works, but requires significant upfront work and maintenance if formats change [05:41:00].
    • Depicted as purely “classical compute” [08:24:00].
  2. Constrained Agent (Classical Calls Fuzzy):

    • Introduces LLMs to classify CSV columns (e.g., date, transaction amount, merchant name) and map them to a desired schema [06:01:00].
    • Most compute still runs in classical code, with some fuzzy LLM compute for classification [06:47:00].
    • This is depicted as a “constrained agent” where classical code calls into the fuzzy LLM [08:44:00] (see the column-classification sketch after this list).
  3. LLM-Driven Agent (Fuzzy Calls Classical):

    • The LLM is given direct access to the CSV, a code interpreter (e.g., Python with Pandas), and the ability to view parts of the CSV (head, tail) [06:59:00].
    • The LLM is tasked with producing a CSV in a specific format and is provided with unit tests/verifiers to check its output [07:17:00].
    • While running once might not work, running this approach 50 times in parallel significantly increases success and generalizes across many formats [07:28:00].
    • Despite using “10,000 times more compute,” the cost is likely less than a dollar, which is negligible compared to the value of a successful transaction switch or the cost of engineer time [07:40:00].
    • This approach flips the control: the LLM decides when to execute classical code [08:52:00]. Most of the compute is “fuzzy” (within the LLM) [09:06:00] (see the parallel-attempts sketch after this list).
    • Ramp is increasingly moving towards this third approach because the “blue arrows” (LLM capabilities) will continue to improve without internal effort, directly benefiting the company [09:58:00].
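To make the first two approaches concrete, here is a minimal sketch of classical code that handles known vendors with hard-coded mappings and falls back to fuzzy LLM compute to classify unfamiliar columns. The vendor mapping, the `classify_column` helper, the target schema, and the model name are illustrative assumptions, not Ramp's actual implementation:

```python
import pandas as pd
from openai import OpenAI  # assumed client; any chat-completion API works

client = OpenAI()
RAMP_SCHEMA = ["date", "amount", "merchant_name"]  # illustrative target schema

# Approach 1: hand-written mappings for known vendors (rigid, purely classical compute).
KNOWN_VENDOR_MAPPINGS = {
    "vendor_a": {"Posted Date": "date", "Amount (USD)": "amount", "Description": "merchant_name"},
}

def classify_column(column_name: str, sample_values: list[str]) -> str:
    """Approach 2: ask the LLM which target field a CSV column corresponds to."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{
            "role": "user",
            "content": f"Column {column_name!r} has sample values {sample_values}. "
                       f"Which of {RAMP_SCHEMA} is it? Answer with one word, or 'none'.",
        }],
    )
    return response.choices[0].message.content.strip()

def convert_csv(path: str, vendor: str | None = None) -> pd.DataFrame:
    df = pd.read_csv(path)
    if vendor in KNOWN_VENDOR_MAPPINGS:   # rigid path: classical compute only
        mapping = KNOWN_VENDOR_MAPPINGS[vendor]
    else:                                 # fuzzy path: classical code calls the LLM per column
        mapping = {col: classify_column(col, df[col].head(3).tolist()) for col in df.columns}
    df = df.rename(columns=mapping)
    return df[[c for c in RAMP_SCHEMA if c in df.columns]]
```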
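And a sketch of the third approach, in which the fuzzy side drives: the LLM writes pandas code, a harness executes it, a verifier checks the output, and many attempts run in parallel. The prompt, the verifier, and the attempt count are guesses at the shape of such a harness, not the production agent:

```python
import concurrent.futures
import io
import pandas as pd
from openai import OpenAI  # assumed client; any chat-completion API works

client = OpenAI()

def verify(df: pd.DataFrame) -> bool:
    """Unit-test-style verifier: does the output match the target schema?"""
    return (
        list(df.columns) == ["date", "amount", "merchant_name"]
        and pd.to_datetime(df["date"], errors="coerce").notna().all()
        and pd.to_numeric(df["amount"], errors="coerce").notna().all()
    )

def one_attempt(csv_text: str) -> pd.DataFrame | None:
    """Ask the LLM to write transformation code, execute it, and verify the result."""
    head = "\n".join(csv_text.splitlines()[:5])  # let the model see the head of the file
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{
            "role": "user",
            "content": (
                "Write Python that parses the pandas-readable string `csv_text` and assigns a "
                "DataFrame with columns ['date', 'amount', 'merchant_name'] to `result`.\n"
                f"First rows of the file:\n{head}\nReturn only code."
            ),
        }],
    )
    code = response.choices[0].message.content.strip()
    if code.startswith("```"):
        code = code.strip("`").removeprefix("python").strip()
    scope = {"csv_text": csv_text, "pd": pd, "io": io}
    try:
        exec(code, scope)  # the "code interpreter": run whatever the LLM wrote
        result = scope.get("result")
        return result if isinstance(result, pd.DataFrame) and verify(result) else None
    except Exception:
        return None

def switching_report(csv_text: str, attempts: int = 50) -> pd.DataFrame:
    """Run many attempts in parallel and keep the first that passes the verifier."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=attempts) as pool:
        for df in pool.map(one_attempt, [csv_text] * attempts):
            if df is not None:
                return df
    raise RuntimeError("no attempt passed the verifier")
```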

The Future of Backend Systems: LLM as the Backend [10:39:00]

Traditional web applications involve a frontend making requests to a backend, which then interacts with a database [10:47:00]. LLMs might assist engineers in writing this code, but the deployed system relies on classical compute [11:20:00].

A proposed future model makes the LLM itself the backend [11:38:00]. This LLM has access to tools such as a code interpreter, can make network requests, and can interact with a database [11:46:00].
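A minimal sketch of what an LLM-as-backend request handler could look like, assuming an OpenAI-style tool-calling API; the `run_sql` tool, the SQLite database, the system prompt, and the model name are all hypothetical:

```python
import json
import sqlite3
from openai import OpenAI  # assumed client; any tool-calling LLM API works

client = OpenAI()
db = sqlite3.connect("app.db")  # illustrative application database

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_sql",
        "description": "Run a SQL query against the application database.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def handle_request(request_text: str) -> str:
    """Instead of routing to handler code, hand the raw request to the LLM."""
    messages = [
        {"role": "system", "content": "You are the backend. Use run_sql when you need data."},
        {"role": "user", "content": request_text},
    ]
    while True:
        response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # final response for the frontend
        messages.append(message)
        for call in message.tool_calls:  # fuzzy compute calling into classical compute
            query = json.loads(call.function.arguments)["query"]
            rows = db.execute(query).fetchall()
            messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(rows)})
```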

An example of this is a mail client where the LLM simulates the Gmail client, renders UI in markdown, and processes user interactions [12:00:00]. When a user clicks an email, the LLM receives this information and decides how to render the next page or perform actions like deleting an email [14:45:00]. While currently slow, this demonstrates the potential for software to evolve dramatically due to exponential trends in LLM capabilities [15:34:00].
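The mail-client demo can be thought of as a loop in which the LLM renders each screen and every user interaction is fed back into the conversation. This sketch is a guess at the shape of that loop under those assumptions (the system prompt and model name are illustrative), not the demo's actual code:

```python
from openai import OpenAI  # assumed client; any chat LLM API works

client = OpenAI()

SYSTEM = (
    "You are simulating a mail client. Render each screen as markdown "
    "(an inbox list, or an open email with actions like delete/reply). "
    "The next user message describes what the user clicked."
)

def mail_client_session():
    # The conversation itself is the application state: every screen the LLM
    # rendered and every click the user made stays in the message history.
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": "Open my inbox."}]
    while True:
        response = client.chat.completions.create(model="gpt-4o", messages=messages)
        screen = response.choices[0].message.content
        print(screen)                      # the frontend just displays the markdown
        messages.append({"role": "assistant", "content": screen})
        click = input("> ")                # e.g. "clicked the email from Alice", "delete it"
        messages.append({"role": "user", "content": click})
```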