From: aidotengineer

The future of software architecture and AI engineering is poised for a significant shift, driven by the exponential growth in artificial intelligence capabilities. A core idea behind this shift is that systems designed to scale with computational power will inherently outperform those with rigid, fixed designs [02:30:00].

Speaker’s Background

The speaker, head of AI at Ramp, has been working with Large Language Models (LLMs) for four years [00:00:20]. His experience dates back to before the widespread awareness spurred by ChatGPT, when he was building what would now be called an AI agent company for customer support [00:00:36]. Early models like GPT-2 and GPT-3 were often “frustratingly stupid,” with small context windows and poor reasoning abilities, requiring extensive custom code to function somewhat reliably [00:00:51]. As models became smarter, much of this scaffolding code could be deleted, revealing patterns for building agents that scale with increasing intelligence [01:04:00]. The speaker also built Jsonformer, an early structured extraction library, which served as scaffolding for models that were too “stupid” to produce valid JSON outputs [01:40:00].
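
Jsonformer worked by constraining decoding so that only valid JSON could come out of the model. A minimal usage sketch adapted from the library’s README (the model choice is illustrative, and the API may have changed since):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from jsonformer import Jsonformer

# Any Hugging Face causal LM works; this model name is illustrative.
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

# The schema constrains decoding: Jsonformer only lets the model fill in
# the value slots, so the output is valid JSON by construction, even from
# a weak model that could never reliably emit well-formed JSON on its own.
json_schema = {
    "type": "object",
    "properties": {
        "merchant": {"type": "string"},
        "amount": {"type": "number"},
    },
}

prompt = "Extract the merchant and amount: 'Coffee at Blue Bottle, $4.50'"
generated = Jsonformer(model, tokenizer, json_schema, prompt)()
print(generated)  # e.g. {"merchant": "Blue Bottle", "amount": 4.5}
```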

The Bitter Lesson: Scaling with Compute

The fundamental principle guiding this perspective is that “systems that scale with compute beat systems that don’t” [02:30:00]. This is because exponential trends, like the growth in AI model intelligence, are rare and offer a “free pass” for rapid improvement [03:01:00].

Historically, in fields like chess, Go, computer vision, and Atari games, attempts to build rigid, deterministic systems by synthesizing human reasoning through extensive hand-written code eventually yielded to general methods that scaled with increased search or computational power [03:17:00]. While a fixed amount of compute might favor a rigid, optimized approach, scaling compute invariably leads to the general method winning [03:43:00].

AI at Ramp: The Switching Report Agent Example

Ramp is a finance platform that leverages AI to automate tasks like expense management, payments, and bookkeeping [04:00:00]. One specific example is the “switching report” agent, designed to parse arbitrary CSV files from third-party card providers to help users onboard transactions to Ramp [04:38:00].

Three approaches to architecting this agent illustrate the core idea:

  1. Deterministic, Manually Coded Approach: Hand-write a parser for each of the 50 most common third-party card vendors. This works but is rigid, requires significant manual effort, and breaks whenever a vendor changes its format [05:32:00].
  2. Constrained Agent (Hybrid Approach): Introduces LLMs to classify CSV columns (e.g., date, transaction amount, merchant name) and map them to a desired schema [06:01:00]. Most of the compute remains “classical,” with some “fuzzy LLM land” integration [06:47:00]. This is more general but still limited.
  3. Fuzzy/LLM-Centric Approach: The most radical option gives the LLM direct access to a code interpreter (e.g., one backed by Pandas, or a Rust-based one) and the ability to view parts of the CSV [06:59:00]. The LLM is tasked with producing a CSV in a specific format and is given a unit test/verifier to check its own output [07:17:00]. A single run might not work, but running it many times in parallel (e.g., 50 times) makes it far more reliable and general across formats [07:31:00]; a minimal sketch of this pattern follows the list. This approach uses vastly more compute (e.g., 10,000 times more) but is still cost-effective (less than a dollar per transaction) compared to the cost of engineer time or failed CSVs [07:43:00].
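
A minimal sketch of the third approach, assuming a hypothetical llm_generate_transform call that asks a model to write Pandas code, plus a verifier acting as the unit test; the names and the parallel best-of-50 loop are illustrative, not Ramp’s actual implementation:

```python
import concurrent.futures

import pandas as pd

TARGET_COLUMNS = ["date", "amount", "merchant"]  # desired schema (illustrative)

def llm_generate_transform(csv_preview: str) -> str:
    """Hypothetical LLM call that writes Pandas code mapping an arbitrary
    vendor CSV to TARGET_COLUMNS. Sampling with temperature > 0 makes each
    of the parallel attempts different."""
    raise NotImplementedError  # provider-specific

def verify(df: pd.DataFrame) -> bool:
    """The 'unit test' the agent must pass: right columns, parseable values."""
    return (
        list(df.columns) == TARGET_COLUMNS
        and pd.to_datetime(df["date"], errors="coerce").notna().all()
        and pd.to_numeric(df["amount"], errors="coerce").notna().all()
    )

def attempt(raw: pd.DataFrame) -> pd.DataFrame | None:
    code = llm_generate_transform(raw.head(20).to_csv())
    scope: dict = {"pd": pd, "df": raw.copy()}
    try:
        exec(code, scope)      # run the model-written code in the interpreter
        out = scope["result"]  # convention: the generated code assigns `result`
        return out if verify(out) else None
    except Exception:
        return None

def parse_switching_report(raw: pd.DataFrame, n: int = 50) -> pd.DataFrame:
    # Spend ~n times the compute of a single run; accept the first verified output.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        for result in pool.map(lambda _: attempt(raw), range(n)):
            if result is not None:
                return result
    raise ValueError("no attempt passed verification")
```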

Generalizing Software Architectures

These three approaches can be generalized to broader software architectures:

  • Approach 1 (Classical): Purely classical compute, where code is hand-written and deterministic [08:39:00].
  • Approach 2 (Hybrid/Fuzzy-Called-By-Classical): Classical programming languages call into “fuzzy” LLM services (e.g., OpenAI APIs) for specific tasks like similarity scoring [09:48:00]. This is common today.
  • Approach 3 (Fuzzy-Calls-Classical): The LLM itself decides when to “break into classical land” by writing and executing code (e.g., Python/Pandas) [08:54:00]. Most of the compute occurs in the “fuzzy” LLM space [09:06:00]. Ramp’s codebase is increasingly moving towards this architecture [09:58:00]. A sketch contrasting the two call directions follows this list.
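
The difference between the two hybrid architectures is the direction of the call. In the sketch below, call_llm and run_python are hypothetical stand-ins for a model API and a sandboxed interpreter:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API (e.g., a chat completions call)."""
    raise NotImplementedError

def run_python(code: str) -> str:
    """Hypothetical sandboxed code interpreter."""
    raise NotImplementedError

# Approach 2: classical code calls into fuzzy LLM land for one narrow task;
# control flow stays in deterministic code.
def similarity(a: str, b: str) -> float:
    answer = call_llm(f"Rate the similarity of {a!r} and {b!r} from 0 to 1.")
    return float(answer)

# Approach 3: the LLM drives, and breaks into classical land when it chooses
# to by emitting code for the interpreter.
def agent_step(task: str) -> str:
    plan = call_llm(f"Task: {task}\nReply with Python code to run, or FINAL: <answer>.")
    if plan.startswith("FINAL:"):
        return plan.removeprefix("FINAL:").strip()
    return agent_step(task + "\nObservation: " + run_python(plan))
```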

The advantage of Approach 3 is that the “blue arrows” (representing LLM capabilities) improve constantly due to the billions of dollars invested by large labs, effectively enhancing a company’s product without internal effort [10:05:00]. This allows companies to “hitch a ride” on the exponential advancements in AI [10:24:24].

The Future: LLM as the Backend

The speaker proposes a radical shift where the LLM itself acts as the backend for web applications [11:40:00].

  • Traditional Web App: Frontend (JavaScript, HTML, CSS) makes requests to a classical backend, which interacts with a database [10:47:00]. LLMs might be used for code generation during development, but not at runtime [11:27:00].
  • LLM-as-Backend Model: The LLM is the execution engine, having access to tools like code interpreters, network request capabilities, and database access [11:46:00]. The frontend communicates directly with the LLM, which then renders the UI and handles logic [14:11:00].

The speaker demonstrated a mail client built on this principle. When a user logs in, their Gmail token is sent to an LLM [14:03:00]. The LLM is prompted to simulate a Gmail client, given access to the user’s emails and a code interpreter, and then renders the homepage UI in markdown [14:11:00]. When a user clicks on an email, the LLM is informed of the action and re-renders the page, making further API calls (e.g., to fetch email content) as needed [14:45:00].
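
A hypothetical reconstruction of that request loop (the prompt, tool wiring, and llm_with_tools client are assumptions, not the speaker’s code): the frontend forwards each user event to the LLM, which keeps the session state, calls Gmail through its tools, and replies with the next page as markdown.

```python
SYSTEM_PROMPT = """You are the backend of a Gmail client.
You have tools: gmail_api (make Gmail API calls) and run_python.
On every user event, respond with the full page to display, in markdown."""

def handle_event(session: list[dict], event: str, llm_with_tools) -> str:
    """One request/response cycle. `llm_with_tools` is a hypothetical client
    that executes the model's tool calls (Gmail API, code interpreter)
    before returning its final answer."""
    session.append({"role": "user", "content": event})
    page_markdown = llm_with_tools(SYSTEM_PROMPT, session)
    session.append({"role": "assistant", "content": page_markdown})
    return page_markdown

session: list[dict] = []
# Login: hand the model the user's Gmail token and ask for the homepage.
# home = handle_event(session, "User logged in. Gmail token: <token>. Render the inbox.", llm)
# Click: tell the model what happened; it fetches the message and re-renders.
# page = handle_event(session, "User clicked email with id 42.", llm)
```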

While this kind of software is currently slow and “barely works” [15:34:00], the speaker suggests that exponential trends in AI capabilities could make it proliferate rapidly [15:56:00]. The vision is a future in which multi-agentic systems handle not just narrow tasks but entire application backends.