From: aidotengineer
The speaker, head of AI at Ramp, has been working with LLMs for four years, starting before the widespread adoption of models like ChatGPT [00:00:23]. Early efforts involved building what are now called AI agents for customer support, aiming to make chatbots smarter [00:00:36]. Initial experiences with models like GPT-2 were “frustratingly stupid” due to small context windows and poor reasoning capabilities, requiring extensive custom code around the models to get systems working even somewhat reliably [00:00:48]. Over time, as models improved, much of this scaffolding code could be removed, revealing patterns in how to build scalable agents [00:01:07]. The speaker also developed Jsonformer, a structured extraction library for JSON, because early models struggled with structured output [00:01:40].
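To make “structured extraction” concrete, here is a minimal usage sketch in the spirit of Jsonformer’s documented pattern: a JSON schema constrains generation so the output is always valid JSON. The model choice and schema are placeholders, and the exact call signature should be checked against the library’s README rather than taken from this sketch.

```python
# Minimal sketch of schema-constrained generation in the spirit of Jsonformer.
# Model name and schema are illustrative; consult the library's README for the
# authoritative API.
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
        "is_refund": {"type": "boolean"},
    },
}

prompt = "Extract the transaction details: 'Refund of $12.50 from Blue Bottle Coffee'"
extracted = Jsonformer(model, tokenizer, schema, prompt)()
print(extracted)  # e.g. {"merchant": "Blue Bottle Coffee", "amount": 12.5, "is_refund": True}
```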
The Bitter Lesson: Scaling with Compute
A core philosophy for AI implementation is that “systems that scale with compute beat systems that don’t” [00:02:30]. This idea is rooted in the rarity of exponential trends; when one is found, it should be leveraged [00:03:01]. Historically, in fields like chess, Go, computer vision, and Atari games, efforts to build rigid, deterministic systems with extensive, clever human-written code were eventually surpassed by general methods that scaled with increased computational search [00:03:17].
Ramp’s AI Strategies
Ramp is a finance platform that helps businesses manage expenses, payments, procurement, travel, and bookkeeping more efficiently [00:03:59]. The company extensively uses AI across its product to automate “boring stuff” for finance teams and employees, such as submitting expense reports, booking flights, and handling reimbursements [00:04:06]. This often involves interacting with other legacy systems [00:04:20].
Case Study: The “Switching Report” Agent
A practical example of AI integration at Ramp is the “Switching Report” agent, designed to parse transaction data from third-party card providers [00:04:38]. The challenge is that CSV files from different providers have arbitrary and varying schemas [00:04:42]. The goal is to ingest these CSVs into a format Ramp understands, facilitating customer onboarding [00:04:51].
Three approaches to building this system were discussed:
- Classical/Rigid Approach (first sketch below):
  - Involves manually writing code for the 50 most common third-party card vendors [00:05:32].
  - This approach is effective but requires manual effort to understand each schema and maintenance if formats change [00:05:41].
  - Predominantly uses “classical compute” (traditional programming) [00:08:24].
- Hybrid Approach (Constrained Agent; second sketch below):
  - Introduces LLMs to enhance the system [00:06:01].
  - For each column in the CSV, an LLM classifies its type (e.g., date, transaction amount, merchant name) using semantic similarity [00:06:27].
  - The classified columns are then mapped to Ramp’s desired schema [00:06:38].
  - Most compute remains classical, with some “fuzzy LLM land” computation [00:06:47].
  - Represents a transition where classical code calls into fuzzy LLM operations [00:08:44].
- LLM-Centric Approach (General Agent; third sketch below):
  - The CSV is directly given to an LLM with a “code interpreter” (e.g., Python/Pandas) [00:07:01].
  - The LLM is instructed to output a CSV in a specific format and is given a unit test/verifier to check its work [00:07:17].
  - Running this approach once typically fails [00:07:28].
  - However, running it 50 times in parallel drastically increases the likelihood of success and generalization across various formats [00:07:31].
  - This method uses significantly more compute (10,000 times more than the first approach) but is still cost-effective given the value of engineer time and successful transactions [00:07:43].
  - In this setup, the LLM decides when to execute classical code (e.g., Python scripts) and most of the compute is “fuzzy” [00:08:54].
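To make the three approaches concrete, here are minimal Python/pandas sketches. They are illustrative only: vendor names, column mappings, schemas, and helper functions are assumptions, not Ramp’s actual code. First, the classical approach, where every supported provider gets a hand-written parser:

```python
import pandas as pd

# One hand-written parser per known provider; every mapping below is made up
# for illustration and would need manual maintenance as vendor formats change.
VENDOR_PARSERS = {
    "vendor_a": lambda df: df.rename(
        columns={"Txn Date": "date", "Amt": "amount", "Description": "merchant_name"}
    ),
    "vendor_b": lambda df: df.rename(
        columns={"posted_at": "date", "value": "amount", "payee": "merchant_name"}
    ),
    # ...one entry per supported provider (the talk mentions ~50).
}

def convert_classical(vendor: str, csv_path: str) -> pd.DataFrame:
    """Purely classical compute: look up the vendor's parser and apply it."""
    df = pd.read_csv(csv_path)
    return VENDOR_PARSERS[vendor](df)[["date", "amount", "merchant_name"]]
```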
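Second, a sketch of the hybrid approach: classical code drives the loop over columns, an LLM call (stubbed here as `classify_column`) labels each one, and classical code finishes the mapping onto an assumed target schema.

```python
import pandas as pd

RAMP_COLUMNS = ["date", "amount", "merchant_name", "memo"]  # assumed target schema

def classify_column(name: str, samples: list[str]) -> str:
    """Ask an LLM which target field this column most likely represents, given
    its header and a few sample values. Stubbed here; in practice this would be
    a single completion call per column."""
    raise NotImplementedError

def convert_hybrid(csv_path: str) -> pd.DataFrame:
    raw = pd.read_csv(csv_path)
    mapping: dict[str, str] = {}
    for col in raw.columns:  # classical compute drives the loop
        label = classify_column(col, raw[col].astype(str).head(5).tolist())
        if label in RAMP_COLUMNS and label not in mapping.values():
            mapping[col] = label  # fuzzy LLM output decides the mapping
    # Classical compute takes over again: rename and select the mapped columns.
    return raw.rename(columns=mapping)[list(mapping.values())]
```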
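Third, a sketch of the LLM-centric approach: the raw CSV goes to an LLM with a code interpreter (stubbed as `generate_conversion`), a simple verifier checks each candidate against the expected schema, and many attempts run in parallel because any single attempt often fails.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

EXPECTED_COLUMNS = {"date", "amount", "merchant_name"}  # assumed output schema

def generate_conversion(raw_csv: str) -> str:
    """Ask an LLM with a code interpreter to write and run pandas code that
    re-emits the input as a CSV in the expected format. Stubbed here."""
    raise NotImplementedError

def verify(candidate_csv: str) -> bool:
    # Unit-test-style check: does the candidate parse and contain the expected columns?
    try:
        df = pd.read_csv(io.StringIO(candidate_csv))
    except Exception:
        return False
    return EXPECTED_COLUMNS.issubset(df.columns)

def convert_llm_centric(raw_csv: str, attempts: int = 50) -> str | None:
    # A single attempt often fails; running ~50 in parallel makes success far more likely.
    with ThreadPoolExecutor(max_workers=attempts) as pool:
        candidates = pool.map(lambda _: generate_conversion(raw_csv), range(attempts))
        for candidate in candidates:
            if verify(candidate):
                return candidate
    return None
```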
The speaker notes that Ramp is increasingly moving towards the third approach because the “blue arrows” (LLM capabilities) are continuously improved by the major labs, providing direct benefits to the company without significant internal effort [00:09:58]. This aligns with leveraging existing infrastructure for AI integration and with integrating MCP into AI applications.
Future Vision: The LLM as the Backend
The traditional web application model involves a frontend sending requests to a backend, which interacts with a database and returns data, with AI typically used during code generation or by engineers [00:10:47].
The proposed future model suggests the backend is the LLM [00:11:40]. In this model, the LLM has direct access to tools like code interpreters, can make network requests, and interact with databases [00:11:46].
The speaker demonstrated a proof-of-concept email client built on this principle [00:12:00]. When logging in, the Gmail token is sent to an LLM, which simulates a Gmail client, accessing emails via the token and a code interpreter [00:14:01]. The LLM then renders the UI (e.g., markdown for the homepage) [00:14:26]. User interactions, like clicking on an email, are also passed directly to the LLM, which then decides how to render the next page or perform actions (e.g., delete an email) [00:14:45].
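A minimal sketch of that loop might look like the following; the function names and event shapes are assumptions standing in for the demo’s actual plumbing. Every UI event, including the initial login carrying the Gmail token, is forwarded to the LLM, which uses its tools to fetch data and returns the next page as markdown.

```python
# Sketch of the "LLM as backend" loop; names and event shapes are illustrative.

SYSTEM_PROMPT = (
    "You are the backend of an email client. Use the provided Gmail token and "
    "your code interpreter to fetch data, then render the requested page as markdown."
)

def llm_call(system_prompt: str, session: dict, event: dict) -> str:
    """Send session state (e.g. the Gmail token) and the latest UI event to an
    LLM with tool access (code interpreter, network requests); returns markdown
    for the next page. Stubbed here."""
    raise NotImplementedError

def handle_event(session: dict, event: dict) -> str:
    # Every interaction goes straight to the model, which decides what data to
    # fetch and what UI to render next; there is no hand-written route handler.
    return llm_call(SYSTEM_PROMPT, session, event)

# Example flow: login renders the inbox; clicking an email renders that message.
# handle_event({"gmail_token": "<token>"}, {"type": "login"})
# handle_event({"gmail_token": "<token>"}, {"type": "click", "target": "email:123"})
```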
While this kind of software is currently slow and “barely works,” the speaker believes that given exponential trends in AI, it could take off in the future, prompting a shift in how software and backends are conceived [00:15:34].