From: aidotengineer
The application of Artificial Intelligence (AI), particularly Large Language Models (LLMs), is transforming business operations by enabling systems that can scale with computational power. This approach contrasts with traditional, rigid software development, offering significant advantages in adaptability and efficiency [02:30:16].
Evolution of AI Agents and LLMs
The speaker, head of AI at Ramp, has been working with LLMs for four years, noting that the field significantly accelerated with the release of ChatGPT [00:20:41]. Early attempts at building AI agents for customer support faced challenges with LLMs like GPT-2, which were “frustratingly stupid,” had small context windows, and lacked strong reasoning capabilities [00:48:07]. Developers had to write extensive code to make these models work reliably [00:58:12].
As models became smarter, much of this “scaffolding” code could be deleted, revealing patterns for building agents that scale with increased intelligence [01:06:05]. An example of such scaffolding was “Jsonformer,” a structured extraction library developed to force earlier, less capable models to output JSON in a desired format [01:40:07].
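For intuition, below is a minimal usage sketch in the style of Jsonformer’s public examples; the model, schema, and prompt are illustrative rather than anything Ramp ships. The library walks the JSON schema and only lets the model generate the value tokens, so the output always matches the requested structure.

```python
# Sketch of schema-constrained generation with Jsonformer (illustrative
# model and schema). The schema drives decoding: the model fills in values
# and cannot break the JSON structure.
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
        "is_recurring": {"type": "boolean"},
    },
}

prompt = "Extract the transaction: 'Paid $12.50 to Blue Bottle Coffee (monthly plan)'"
result = Jsonformer(model, tokenizer, schema, prompt)()
print(result)  # a dict matching the schema, e.g. {"merchant": ..., "amount": ...}
```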
The “Bitter Lesson”: Scaling with Compute
The core idea presented is that “systems that scale with compute beat systems that don’t” [02:30:16]. This means building systems that inherently improve with more computational resources [02:51:00]. Exponential trends, like the improvement in AI models, are rare and should be leveraged [03:01:08].
Historically, in domains like chess, Go, computer vision, and Atari games, people initially built “rigid systems” by writing clever, highly abstracted software that tried to synthesize human reasoning [03:17:09]. While this approach might win with a fixed amount of compute, scaling the amount of search or computational power (the “general method”) consistently leads to superior performance [03:41:00].
AI in Ramp’s Business Platform
Ramp is a finance platform that helps businesses manage expenses, payments, procurement, travel, and bookkeeping [03:59:03]. The platform extensively uses AI to automate the “boring stuff” for finance teams and employees, such as submitting expense reports, booking flights, and handling reimbursements [04:06:04]. Much of Ramp’s backend work involves interacting with other systems and helping employees complete these tasks more quickly [04:18:00].
Case Study: The “Switching Report” Agent
A key application of AI at Ramp is the “switching report” agent [04:37:07]. This simple agent’s task is to process CSV files of arbitrary format, typically from third-party card providers, and convert them into a standardized format for Ramp’s platform [04:40:07]. This allows Ramp to help businesses migrate their transactions smoothly onto the platform [04:53:07].
Three approaches for developing this agent illustrate the concept of scaling with compute:
Approach 1: Manual Code (Rigid System)
- This involved manually writing code for the 50 most common third-party card vendors [05:32:00].
- While functional, it required significant effort to understand each vendor’s schema and was brittle, breaking if a vendor changed their format [05:41:00].
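To make the contrast concrete, here is a rough sketch of what the rigid approach looks like; the vendor names, column names, and target fields are invented for illustration:

```python
import csv

# Hand-written parser per third-party card vendor. Each one hard-codes that
# vendor's column names and formats, and breaks if the vendor changes them.
def parse_vendor_a(row: dict) -> dict:
    return {
        "date": row["Transaction Date"],
        "amount": float(row["Amount (USD)"]),
        "merchant": row["Merchant Name"],
    }

def parse_vendor_b(row: dict) -> dict:
    return {
        "date": row["posted_at"],
        "amount": float(row["amt"].replace("$", "")),
        "merchant": row["payee"],
    }

# ...in practice, roughly 50 of these, one per common vendor...
PARSERS = {"vendor_a": parse_vendor_a, "vendor_b": parse_vendor_b}

def convert(path: str, vendor: str) -> list[dict]:
    with open(path, newline="") as f:
        return [PARSERS[vendor](row) for row in csv.DictReader(f)]
```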
Approach 2: Hybrid LLM + Classical Scripting (Constrained Agent)
- This approach introduced LLMs to classify columns (e.g., date, transaction amount, merchant name) within the incoming CSVs [06:01:00].
- Most of the compute still ran in traditional scripting, with a portion running in “fuzzy LLM land” for classification [06:47:00]. This resulted in a more general system [06:09:00].
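A hedged sketch of this hybrid shape: the LLM is only asked to label columns, while reading, remapping, and writing the CSV stay in ordinary pandas code. The OpenAI client usage follows the current Python SDK, but the model name and target schema here are illustrative:

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; model choice is illustrative
TARGET_FIELDS = ["date", "amount", "merchant"]  # hypothetical standard schema

def classify_column(name: str, samples: list[str]) -> str:
    """Ask the LLM which target field (if any) a CSV column corresponds to."""
    prompt = (
        f"A CSV column is named {name!r} with sample values {samples!r}. "
        f"Which of {TARGET_FIELDS} does it represent? "
        f"Answer with one word, or 'none'."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

def convert(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)
    mapping = {}
    for col in df.columns:
        label = classify_column(col, df[col].head(3).astype(str).tolist())
        if label in TARGET_FIELDS:
            mapping[col] = label
    # Everything after classification is deterministic, classical scripting.
    return df[list(mapping)].rename(columns=mapping)
```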
Approach 3: LLM-Driven Execution (Fuzzy Land Centric)
- This radical approach involved giving the LLM the entire CSV and instructing it to produce a CSV in a specific target format [07:01:00].
- The LLM was provided with a code interpreter (e.g., Python with Pandas), allowing it to inspect the CSV data and write its own code for transformation [07:03:00].
- It also had a unit test or verifier to confirm if its output was correct [07:23:00].
- Initially, a single run of this approach didn’t work well [07:28:00]. However, running it 50 times in parallel made it “very likely that it works really well and generalizes across a ton of different formats” [07:31:00].
- This approach required significantly more compute (estimated 10,000 times more) than the first [07:40:00]. However, the speaker argued that engineer time is the truly scarce resource [07:50:00]. The cost of compute (less than a dollar per transaction) is negligible compared to the cost of failed transactions or engineer time spent on manual coding [07:58:00]. This illustrates a system that scales with compute [07:56:00].
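A sketch of that loop under stated assumptions: ask_llm_for_transform_code, run_in_sandbox, and passes_unit_tests are hypothetical placeholders for the LLM call, the code interpreter, and the verifier the talk describes; the parallel fan-out is the part that trades compute for reliability.

```python
import concurrent.futures

def attempt_conversion(csv_text: str) -> str | None:
    """One attempt: have the LLM write pandas code for this CSV, run it in a
    sandbox, and keep the output only if the verifier accepts it."""
    code = ask_llm_for_transform_code(csv_text)   # hypothetical LLM call
    output_csv = run_in_sandbox(code, csv_text)   # hypothetical code interpreter
    return output_csv if passes_unit_tests(output_csv) else None  # verifier

def convert_with_retries(csv_text: str, n: int = 50) -> str:
    # Spend roughly n single runs' worth of compute; accept the first attempt
    # that passes verification.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(attempt_conversion, csv_text) for _ in range(n)]
        for fut in concurrent.futures.as_completed(futures):
            result = fut.result()
            if result is not None:
                return result
    raise RuntimeError("no attempt passed verification")
```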
Generalizing AI Agent Architectures
The three approaches for the switching report agent can be generalized into broader patterns for enterprise AI implementations:
Type 1: Classical Compute Only
- Software relies entirely on manually written code, with no AI involvement [08:39:00].
Type 2: Classical Calling Fuzzy LLM (Constrained Agent)
- Regular programming languages call into LLM services (e.g., OpenAI APIs) for specific, “fuzzy” computations like classification or embeddings [09:51:00]. This is common today [09:42:00].
Type 3: Fuzzy LLM Calling Classical Tools (LLM as Backend)
- The LLM is the central orchestrator, with most of the compute being “fuzzy” [09:08:00]. The LLM decides when to call into classical tools (e.g., writing Python code to interact with a database or perform complex logic) [08:54:00].
- Ramp is increasingly moving towards this third approach because the continuous improvements from major AI labs directly enhance the company’s codebase without significant internal effort [09:58:00]. This exemplifies leveraging exponential trends [10:24:00].
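The general shape of this third type, sketched with a hypothetical call_llm helper standing in for any chat API that supports tool calls (the two tools and the reply fields are likewise illustrative):

```python
import json

def query_database(sql: str) -> list[dict]:
    ...  # ordinary, deterministic code

def run_python(code: str) -> str:
    ...  # sandboxed code interpreter

TOOLS = {"query_database": query_database, "run_python": run_python}

def agent_loop(user_request: str) -> str:
    messages = [{"role": "user", "content": user_request}]
    while True:
        reply = call_llm(messages, tools=list(TOOLS))  # hypothetical LLM call
        if reply.tool_call is None:          # the model chose to answer directly
            return reply.content
        tool = TOOLS[reply.tool_call.name]   # the model chose a classical tool
        result = tool(**json.loads(reply.tool_call.arguments))
        messages.append({"role": "tool", "content": json.dumps(result, default=str)})
```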
The Future of Software: LLM as the Backend
The speaker proposed a radical future model where the LLM is the backend of a web application [11:38:00].
Traditional Web App Model
In a traditional web application (e.g., Gmail), a static file server sends JavaScript, HTML, and CSS to the browser [10:47:00]. The frontend renders the UI, and user interactions trigger requests to a backend, which then interacts with a database [11:06:00]. While AI tools might be used during code generation, the deployed application runs on classical compute [11:20:00].
Proposed Model: LLM as the Backend
In the proposed model, the LLM itself acts as the backend [11:40:00]. This LLM has access to tools like a code interpreter and can make network requests or access databases [11:46:00].
Demo: LLM-Powered Mail Client
A demonstration of a mail client built on this principle was provided [12:00:00]. When the user logs in, their Gmail token is sent to an LLM, initiating a chat session [14:03:00]. The LLM is instructed to simulate a Gmail client, is given access to the user’s emails and a code interpreter, and renders the UI in markdown [14:07:00].
User interactions (e.g., clicking on an email) are sent back to the LLM, which then uses this information to render the next page state, similar to a web framework [14:45:00]. The LLM determines the appropriate UI and available actions (e.g., mark as unread, delete) [15:10:00].
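A minimal sketch of that loop, assuming a hypothetical llm_chat helper with access to a code interpreter and the mailbox; the system prompt and event strings are invented for illustration. Every click becomes another turn in the chat session, and the model’s reply is the next page.

```python
SYSTEM_PROMPT = (
    "You are the backend of a Gmail-like mail client. Use your tools to read "
    "the user's mailbox. After every user event, respond with the next page "
    "to display, rendered as markdown."
)

def start_session(gmail_token: str) -> list[dict]:
    # The token lets the model's tools reach the mailbox.
    return [{"role": "system", "content": f"{SYSTEM_PROMPT}\nToken: {gmail_token}"}]

def handle_event(messages: list[dict], event: str) -> str:
    """event is something like 'open inbox' or 'clicked email #3'."""
    messages.append({"role": "user", "content": event})
    page_markdown = llm_chat(messages, tools=["code_interpreter", "gmail_api"])  # hypothetical
    messages.append({"role": "assistant", "content": page_markdown})
    return page_markdown  # the client renders this markdown as the next UI state

# Usage: each interaction is a round trip through the model.
# session = start_session(token)
# print(handle_event(session, "open inbox"))
# print(handle_event(session, "clicked email #2"))
```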
Although the demo was “very slow” [15:34:00] and such software “barely works today,” the speaker emphasized the potential impact of exponential trends, suggesting that this type of AI-driven software could become prevalent in the future [15:54:00]. This vision suggests a future where complex enterprise applications could be orchestrated primarily by intelligent models.