From: aidotengineer

The speaker, head of AI at Ramp, has been working with Large Language Models (LLMs) for four years, starting before the widespread adoption of tools like ChatGPT [00:00:20]. Initially, the focus was on building an AI agent company for customer support, using early models like GPT-2 [00:00:36]. These early models were “frustratingly stupid,” had small context windows, and reasoned poorly, so substantial scaffolding code was needed to achieve reliable performance [00:00:51]. As models improved, much of that supporting code became obsolete, revealing patterns for building scalable agents that adapt to increasing intelligence [00:01:04]. The speaker also built Jsonformer, an early structured-extraction library for models that struggled to produce valid JSON output [00:01:40].

The core idea presented is that systems which scale with compute outperform those that don’t [00:02:30]. This concept, drawn from Rich Sutton’s “Bitter Lesson,” holds that a system that can inherently leverage more computation to improve will ultimately win [00:02:51]. Exponential trends are rare, and when you encounter one, it pays to hop on and ride it [00:03:01].

Historical examples illustrating this principle include chess, Go, computer vision, and Atari games [00:03:17]. Rigid, hand-coded systems that encode human reasoning can win at a fixed amount of compute, but general methods that scale out search (i.e., use more compute) consistently end up performing better [00:03:41].

AI Integration at Ramp

Ramp is a finance platform that helps businesses manage expenses, payments, procurement, travel, and bookkeeping [00:03:59]. AI is extensively used across its product to automate routine tasks for finance teams and employees, such as submitting expense reports or booking flights [00:04:06]. This often involves interacting with legacy systems to streamline workflows [00:04:20].

Case Study: The Switching Report Agent

A specific example at Ramp is the “switching report” agent, which helps onboard new users by migrating transaction data from third-party card providers [00:04:38]. The challenge is parsing CSV files with arbitrary schemas into a format Ramp understands [00:04:42].

Three approaches for architecting this system were explored:

  1. Manual Code (Rigid System):

    • This approach involved manually writing code for the 50 most common third-party card vendors [00:05:32].
    • While functional, it requires manual effort to understand each schema and breaks whenever a vendor changes its format [00:05:47]. This is a purely classical compute approach [00:08:39].
  2. Constrained Agent (Hybrid System):

    • To achieve a more general system, LLMs were introduced into the classical scripting flow [00:06:08].
    • An embedding model would classify each column in the incoming CSV (e.g., date, transaction amount, merchant name) before mapping it to a desired schema [00:06:27].
    • Most compute remains in classical land, with some “fuzzy” LLM compute for classification [00:06:47]. This approach amounts to classical code calling into fuzzy LLM land [00:08:44]; a minimal sketch of the column-classification step appears after this list.
  3. Full AI Agent (Fuzzy/Compute-Scalable System):

    • This approach involved giving the raw CSV directly to an LLM with access to a code interpreter (e.g., Python with pandas) [00:06:59].
    • The LLM could examine the head and tail of the CSV and was tasked with producing a specific output format, checked by a unit-test-style verifier [00:07:14].
    • Run once, this approach usually failed, but run 50 times in parallel it worked very well and generalized across many formats [00:07:28].
    • Despite using “10,000 times more compute,” the cost (under a dollar per transaction) was negligible compared to the loss from a failed CSV [00:07:40].
    • The core idea is that engineer time is scarcer than compute [00:07:50]. In this model, the LLM decides when to execute classical code, with most of the compute being “fuzzy” [00:08:54]; a sketch of this verifier-checked, parallel agent also appears after the list.
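For the constrained (hybrid) approach, the column-classification step could look roughly like the sketch below. It assumes an OpenAI embedding endpoint and an illustrative three-field target schema; Ramp’s actual pipeline and field names are not described in the talk.

```python
# Hypothetical sketch of approach 2: classical code owns the flow, and an
# embedding model only classifies each incoming CSV column against target fields.
import numpy as np
import pandas as pd
from openai import OpenAI

client = OpenAI()
TARGET_FIELDS = ["date", "transaction_amount", "merchant_name"]  # assumed schema

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def classify_columns(csv_path: str) -> dict[str, str]:
    df = pd.read_csv(csv_path)
    # Describe each column by its header plus a few sample values.
    descriptions = [f"{col}: {', '.join(df[col].astype(str).head(3))}" for col in df.columns]
    col_vecs, target_vecs = embed(descriptions), embed(TARGET_FIELDS)
    # Cosine similarity, then map every source column to its closest target field.
    col_vecs /= np.linalg.norm(col_vecs, axis=1, keepdims=True)
    target_vecs /= np.linalg.norm(target_vecs, axis=1, keepdims=True)
    best = (col_vecs @ target_vecs.T).argmax(axis=1)
    return {col: TARGET_FIELDS[i] for col, i in zip(df.columns, best)}
```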
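For the full-agent approach, a sketch along these lines captures the trade of compute for engineer time: the model writes the parsing code, a verifier checks the output, and many attempts run in parallel. The prompt, model name, verifier, and target columns are all assumptions; the talk does not specify Ramp’s implementation.

```python
# Hypothetical sketch of approach 3: the LLM writes pandas code, we execute it,
# a unit-test-style verifier checks the result, and many attempts run in parallel.
from concurrent.futures import ThreadPoolExecutor
import pandas as pd
from openai import OpenAI

client = OpenAI()
TARGET_COLUMNS = ["date", "amount", "merchant"]  # assumed Ramp-side schema

def attempt(csv_path: str) -> pd.DataFrame | None:
    df = pd.read_csv(csv_path)
    prompt = (
        "Write Python code using pandas that reads the CSV at the path in the variable "
        f"`csv_path` and builds a DataFrame named `result` with columns {TARGET_COLUMNS}. "
        "Respond with raw Python code only, no markdown fences.\n"
        f"First rows:\n{df.head().to_csv(index=False)}\n"
        f"Last rows:\n{df.tail().to_csv(index=False)}"
    )
    code = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content
    scope = {"csv_path": csv_path, "pd": pd}
    try:
        exec(code, scope)  # NOTE: use a real sandboxed interpreter in practice
        result = scope["result"]
        assert list(result.columns) == TARGET_COLUMNS and len(result) > 0  # verifier
        return result
    except Exception:
        return None

def parse_with_retries(csv_path: str, n: int = 50) -> pd.DataFrame:
    # Spend compute instead of engineer time: many parallel attempts, keep the first that passes.
    with ThreadPoolExecutor(max_workers=n) as pool:
        for result in pool.map(attempt, [csv_path] * n):
            if result is not None:
                return result
    raise RuntimeError("no attempt passed the verifier")
```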

Generalizing AI Backend Architectures

The speaker illustrates three common architecture patterns for backends:

  1. Pure Classical: All compute is rigid, deterministic, and coded [00:08:39].
  2. Hybrid (Classical-to-Fuzzy): Classical programming languages call into “fuzzy” LLM servers for some compute [00:09:48].
  3. AI-Driven (Fuzzy-to-Classical): The LLM is the orchestrator, and it decides when to break into classical code execution (e.g., using a code interpreter) [00:08:54].
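As a rough illustration (function names and prompts are invented for this summary, not taken from Ramp’s code), the difference between the second and third patterns is mainly who owns the control flow:

```python
# Who owns control flow in patterns 2 and 3 (illustrative only).
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    """One fuzzy call: a thin wrapper around a chat completion."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Pattern 2 (classical-to-fuzzy): deterministic code drives; the model fills one gap.
def normalize_row(row: dict) -> dict:
    merchant = llm(f"Return only a cleaned merchant name for: {row['raw_merchant']}")
    return {"date": row["date"], "amount": row["amount"], "merchant": merchant.strip()}

# Pattern 3 (fuzzy-to-classical): the model drives; classical code runs only when the
# model emits a tool call (a full loop appears in the mail-client sketch below).
def agent_turn(messages: list[dict], tools: list[dict]):
    msg = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    ).choices[0].message
    return msg.tool_calls or msg.content  # a request for classical compute, or a final answer
```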

The speaker argues that Ramp’s codebase is increasingly moving towards the third approach because the “blue arrows” (fuzzy LLM compute) will continuously improve without direct effort from internal teams, as major labs invest billions into making models better [00:09:58]. This allows companies to “hitch a ride” on exponential AI trends [00:10:24].

Future Vision: The LLM as the Backend

A more radical future model is proposed where the LLM is the backend itself, not just a code generation tool for engineers [00:11:38]. In this model, the LLM has access to tools like code interpreters, network requests, and databases [00:11:46].

As a demonstration, the speaker showcases a mail client built on this principle [00:12:00]. When a user logs in, the Gmail token is sent to an LLM, which is instructed to simulate a Gmail client [00:14:03]. The LLM has access to emails and a code interpreter and renders the UI (e.g., in Markdown) [00:14:16]. When a user clicks on an email, the LLM receives this interaction and decides how to render the next page, potentially making a GET request to fetch the email body [00:14:45]. The LLM also determines what actions are available, like marking an email unread or deleting it [00:15:17].
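A minimal sketch of that kind of loop might look like the following, assuming an OpenAI-style tool-calling API and a generic authenticated http_get tool; the demo’s actual prompts, model, and tool set are not specified in the talk.

```python
# Hypothetical "LLM as the backend" mail client: every user interaction is sent to
# the model, which renders the next screen as Markdown and may call an HTTP tool.
import json
import requests
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "http_get",
        "description": "Make an authenticated GET request and return the response body.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

SYSTEM = ("You are simulating a Gmail client. Render every screen as Markdown. "
          "Use the http_get tool against the Gmail REST API whenever you need data, "
          "and list the actions available on each screen (open, mark unread, delete).")

def render_screen(gmail_token: str, event: str, messages: list[dict]) -> str:
    """Feed one user interaction (login, click on an email, ...) to the LLM backend."""
    if not messages:
        messages.append({"role": "system", "content": SYSTEM})
    messages.append({"role": "user", "content": event})
    while True:
        msg = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=TOOLS
        ).choices[0].message
        if not msg.tool_calls:                       # the model produced the next screen
            messages.append({"role": "assistant", "content": msg.content})
            return msg.content                       # Markdown to display
        messages.append(msg)
        for call in msg.tool_calls:                  # the model asked for real data
            url = json.loads(call.function.arguments)["url"]
            body = requests.get(url, headers={"Authorization": f"Bearer {gmail_token}"}).text
            messages.append({"role": "tool", "tool_call_id": call.id, "content": body[:4000]})
```

In use, each click or keystroke becomes another call such as render_screen(token, "user clicked the email titled 'Invoice'", history), so the LLM, not hand-written view code, decides what to fetch and what to show next.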

While this type of software “barely works today” and is slow because of the amount of compute involved [00:15:34], the speaker encourages thinking in this direction given the potential of exponential AI trends [00:15:56]. The open question is whether more software will adopt this AI-driven backend architecture [00:16:07].