From: aidotengineer
The central idea for future software architecture, particularly with the advent of AI, is that systems designed to scale with compute will outperform those that do not [00:02:30]. This principle suggests that rigid, fixed, deterministic systems are at a disadvantage compared to those that can leverage increasing computational power [00:02:43].
The “Bitter Lesson” Principle
Rich Sutton’s “Bitter Lesson” emphasizes that exponentials are rare, and when encountered, they should be embraced to gain a significant advantage [00:03:01]. Historically, in domains like chess, Go, computer vision, and Atari games, human-engineered systems built from clever software and distilled human reasoning could win as long as compute was fixed [00:03:20]. However, when the amount of search or compute is allowed to scale, the more general, less rigid methods consistently win [00:03:45].
Evolution of AI Agents at Ramp
The speaker, having worked on large language models (LLMs) for four years, initially focused on building what are now called AI agents for customer support, aiming to make chatbots smarter [00:00:20]. Early models like GPT-2 were frustratingly limited by small context windows and poor reasoning, requiring extensive scaffolding code to achieve even minimal, reliable functionality [00:00:51]. As models became more intelligent, much of this scaffolding could be deleted, revealing patterns for building agents that scale with intelligence [00:01:07]. One example is the JSONFormer library, built to force models to output valid JSON despite those early limitations [00:01:40].
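For reference, this is roughly how the open-source jsonformer package is used per its documentation: the schema constrains decoding so that only valid JSON can come out. The model choice and schema below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

# Illustrative model choice; any Hugging Face causal LM works.
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

# The schema constrains generation: the model only fills in the value
# slots, so the output is valid JSON by construction.
json_schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
    },
}

prompt = "Extract the merchant and amount from: 'Starbucks $4.50'"
generated = Jsonformer(model, tokenizer, json_schema, prompt)()
print(generated)  # e.g. {"merchant": "Starbucks", "amount": 4.5}
```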
Ramp, a finance platform, extensively uses AI to automate tasks for finance teams and employees, such as expense reports, flight bookings, and reimbursements [00:04:08]. This involves interacting with existing and legacy systems [00:04:20].
Case Study: The Switching Report Agent
A “switching report” agent at Ramp processes CSV files from third-party card providers, which have arbitrary schemas, to help users onboard transactions to Ramp [00:04:38]. The problem is parsing these diverse CSV formats into a standardized internal format [00:05:18].
Version 1: Manual/Deterministic Approach (No AI)
This approach involved manually writing parsing code for the 50 most common third-party card vendors [00:05:32]. While functional, it required significant manual effort to understand each schema and to maintain the code whenever a format changed [00:05:47].
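A minimal sketch of what one such hand-written parser might look like; the vendor column names and the normalized schema here are illustrative assumptions, not Ramp’s actual code.

```python
import csv
from datetime import datetime

# Hypothetical hand-written parser for one vendor's export format; the
# talk's Version 1 maintained roughly 50 of these, one per card vendor.
def parse_vendor_a(path: str) -> list[dict]:
    rows = []
    with open(path, newline="") as f:
        for r in csv.DictReader(f):
            rows.append({
                "date": datetime.strptime(r["Post Date"], "%m/%d/%Y").date(),
                "amount_cents": round(float(r["Amount"]) * 100),
                "merchant": r["Description"].strip(),
            })
    return rows

# Registry dispatching each known vendor to its bespoke parser.
PARSERS = {
    "vendor_a": parse_vendor_a,
    # ... ~49 more entries, each needing upkeep when a format changes
}
```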
Version 2: Hybrid Approach (Classical with AI Components)
To build a more general system, LLMs were introduced [00:06:01]. This version used classical scripting with calls out to LLMs (e.g., OpenAI) or embedding models for semantic similarity [00:06:17]. Each CSV column was classified by the LLM (e.g., date, transaction amount, merchant name) and then mapped to the desired schema [00:06:29]. In this architecture, most compute still ran in classical code, with some fuzzy LLM compute mixed in [00:06:47]. This approach represents enhancing an existing system with AI capabilities.
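A minimal sketch of the column-classification step, assuming the OpenAI Python SDK; the model name, target fields, and prompt wording are illustrative.

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()
TARGET_FIELDS = ["date", "transaction_amount", "merchant_name", "unknown"]

def classify_column(name: str, samples: list) -> str:
    """Ask the model which target field a CSV column most likely represents."""
    prompt = (
        f"A CSV column named {name!r} has sample values {samples!r}. "
        f"Which of these fields is it: {', '.join(TARGET_FIELDS)}? "
        "Answer with the field name only."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip()
    return answer if answer in TARGET_FIELDS else "unknown"

def map_schema(df: pd.DataFrame) -> dict:
    # Classical code drives the loop; the LLM supplies only the fuzzy judgment.
    return {col: classify_column(col, df[col].head(3).tolist()) for col in df.columns}
```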
Version 3: AI-First Approach (Mostly Fuzzy with Classical Tools)
This approach hands the CSV directly to the LLM, treating the LLM as a code interpreter [00:06:58]. The LLM has access to tools like Pandas and other Python packages, can inspect the CSV data, and is tasked with returning a CSV in a specific format, checked by a provided unit test/verifier [00:07:03]. While a single run might not succeed, running 50 attempts in parallel drastically increases the success rate and generalizes across diverse formats [00:07:28].
Though this version uses potentially 10,000 times more compute than the first approach, the cost (less than a dollar per transaction) is negligible compared to the value of successful transactions and the scarcity of engineer time [00:07:42]. This highlights the strategic advantage of leveraging scalable compute even at higher raw cost, as it frees up valuable human resources.
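A minimal sketch of the parallel fan-out with a verifier; run_llm_with_tools and verify are hypothetical stand-ins for the tool-equipped model call and the unit test described above.

```python
import concurrent.futures

def run_llm_with_tools(prompt: str, tools: list[str]) -> str:
    """Hypothetical stand-in: an LLM call with code-interpreter tools
    that returns a candidate CSV in the target schema."""
    raise NotImplementedError

def verify(candidate_csv: str) -> bool:
    """Hypothetical stand-in for the unit test/verifier from the talk."""
    raise NotImplementedError

def attempt_conversion(csv_text: str) -> str | None:
    # One fuzzy attempt: let the model inspect the CSV and write Pandas code.
    candidate = run_llm_with_tools(
        prompt="Convert this CSV to columns date, transaction_amount, "
               "merchant_name:\n\n" + csv_text,
        tools=["python", "pandas"],
    )
    return candidate if verify(candidate) else None

def convert(csv_text: str, attempts: int = 50) -> str:
    # Fan out independent attempts in parallel; the first verified result wins.
    with concurrent.futures.ThreadPoolExecutor(max_workers=attempts) as ex:
        futures = [ex.submit(attempt_conversion, csv_text) for _ in range(attempts)]
        for fut in concurrent.futures.as_completed(futures):
            if (result := fut.result()) is not None:
                return result
    raise ValueError("no attempt passed the verifier")
```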
Generalizing Software Architectures
These three approaches illustrate a general pattern for software architecture:
- Approach 1 (Classical): No AI; code is entirely classical and deterministic [00:08:39]. This reflects how almost all systems were built historically [00:09:37].
- Approach 2 (Classical calls AI): Regular programming languages call into AI services (e.g., OpenAI) for “fuzzy” compute [00:09:44]. This is common today as companies integrate AI [00:09:48].
- Approach 3 (AI calls Classical): The LLM is primary, deciding when to utilize classical tools (like writing Python/Pandas code) to interact with classical environments [00:08:54]. Much of the compute is “fuzzy” [00:09:08].
Ramp is increasingly adopting the third approach because it leverages the constant improvement of LLMs by major labs [00:09:58]. This means a codebase utilizing more “fuzzy” compute automatically benefits from external advancements in AI without additional internal effort, aligning with the power of exponential trends [00:10:05].
Future of Web Application Architecture
A radical proposition for future web application architecture is to have the LLM be the backend, rather than merely a code generation tool [00:11:38].
Traditional Web App Model
In a traditional web app (e.g., Gmail), a static file server sends HTML, CSS, and JavaScript to the browser [00:10:47]. The frontend then makes requests to a backend, which hits a database to retrieve data [00:11:07]. Any LLM use in this model is typically limited to code generation during development [00:11:20].
AI-as-Backend Model
The proposed model shifts the backend entirely to an LLM [00:11:40]. This LLM has direct access to tools like code interpreters, can make network requests, and interact with databases [00:11:46].
Demo: LLM-powered Gmail Client
A proof-of-concept email client demonstrates this architecture (a minimal sketch of the event loop appears after the list below) [00:12:00].
- Upon logging in, the Gmail token is sent to an LLM [00:14:03].
- The LLM is prompted to simulate a Gmail client, having access to emails, the user’s token, and a code interpreter [00:14:07].
- It renders the UI (in this case, Markdown) for the Gmail homepage, displaying emails [00:14:31].
- When a user clicks on an email, the LLM is informed of the click and the email ID [00:14:45].
- The LLM then decides how to render the page, potentially making a GET request to fetch the email body [00:14:56].
- The LLM also determines other appropriate UI features, such as “mark as unread” or “delete email” options [00:15:17].
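A minimal sketch of that event loop, assuming the OpenAI Python SDK; the system prompt, event strings, and model name are illustrative, and the real tool wiring (Gmail API requests, code interpreter) is omitted.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are simulating a Gmail client. You have the user's emails and token "
    "available, and a code interpreter. Render every page as Markdown."
)

def render(history: list[dict], event: str) -> str:
    """Send a UI event (login, click) to the LLM backend; get a Markdown page back."""
    history.append({"role": "user", "content": event})
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "system", "content": SYSTEM}, *history],
    )
    page = resp.choices[0].message.content
    history.append({"role": "assistant", "content": page})
    return page

history: list[dict] = []
print(render(history, "Event: user logged in. Gmail token: <token>. Render the inbox."))
print(render(history, "Event: user clicked email id=1234. Render that email."))
```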
Though such software is slow, experimental, and barely works today, exponential trends in AI capabilities may make it viable in the future [00:15:34]. The speaker encourages exploring this direction for future software architecture [00:16:04].