From: aidotengineer

Method specializes in collecting and centralizing liability data from a wide range of sources, including credit bureaus, card networks like Visa and MasterCard, direct connections with financial institutions, and other third-party providers [00:00:33]. This raw data is then aggregated and enhanced before being served to customers [00:00:50].
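
As an illustration only, the aggregated and enhanced data can be thought of as a normalized liability record. The sketch below is a hypothetical schema, not Method's actual data model; every field name is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical normalized liability record assembled from multiple sources.
# Field names are illustrative only, not Method's actual schema.
@dataclass
class LiabilityRecord:
    account_id: str
    source: str                                  # e.g. "credit_bureau", "card_network", "bank_direct"
    liability_type: str                          # e.g. "auto_loan", "mortgage", "credit_card"
    balance_cents: int
    apr: Optional[float] = None
    payoff_amount_cents: Optional[int] = None    # "enhanced" field, not present in raw feeds
    escrow_balance_cents: Optional[int] = None   # "enhanced" field, not present in raw feeds
```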

Customer Use Cases

Method’s customers are typically other fintechs, banks, or lenders [00:00:54]. They utilize this enhanced data for various debt management purposes, such as refinancing, loan consolidation, liability payments, and personal finance management [00:00:58].

Challenges in Data Acquisition

An early challenge Method faced was fulfilling customer requests for liability-specific data points, such as the payoff amount on an auto loan or the escrow balance for a mortgage [00:01:50]. Research showed there was no central API to access these specific data points [00:02:07]. While direct integration with banks was an option, it was estimated to take at least two years to produce tangible results, which was not viable for an early-stage company that needed to deliver solutions to customers quickly [00:02:14].

Traditional Inefficient Methods

Prior to Method’s solution, many companies relied on highly inefficient, manual processes to obtain this financial data [00:03:23]:

  • They would hire offshore teams of contractors [00:02:55].
  • These teams would call banks on behalf of the company and the end consumer [00:03:02].
  • They would authenticate with the banks and gather the necessary information [00:03:06].
  • The collected data then required a proof-checking stage before integration into financial platforms for uses like underwriting [00:03:10].

This manual approach had several significant drawbacks:

  • Costly: It required hiring more people to scale, since one person can only perform one task at a time [00:03:33].
  • Slow: The synchronous nature of the process made it very slow [00:03:41].
  • Prone to Human Error: The high risk of human error meant inaccurate financial information could be surfaced, necessitating additional teams for fact- and proof-checking [00:03:48].

Leveraging AI for Data Enhancement

Method identified that the core problem was making sense of unstructured data, akin to an API with request, authentication, and response validation components [00:04:13]. The emergence of advanced LLMs, particularly after GPT-4’s announcement, presented a potential solution, given their proficiency in parsing unstructured data and tasks like summarization or classification [00:04:31].
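
The sketch below illustrates that framing, assuming hypothetical stand-in functions for each step; it is a sketch of the idea, not Method's implementation.

```python
# The "unstructured data as an API" framing: a request step, an authentication step,
# and response validation. Every function body here is a hypothetical stand-in.

def authenticate(consumer_id: str) -> dict:
    # Stand-in for authenticating with a financial institution on the consumer's behalf.
    return {"consumer_id": consumer_id, "token": "stub-token"}

def request_raw_data(session: dict) -> str:
    # Stand-in for retrieving unstructured liability data (statements, call notes, etc.).
    return "Auto loan ... payoff amount: $14,203.55 good through 06/30 ..."

def parse(raw_text: str, field: str) -> str:
    # Stand-in for the LLM-based extraction described in the next paragraph.
    return "14203.55" if field == "payoff_amount" else ""

def validate(value: str, field: str) -> bool:
    # Response validation: e.g. a payoff amount must parse as a positive dollar figure.
    try:
        return field == "payoff_amount" and float(value) > 0
    except ValueError:
        return False

def get_datapoint(consumer_id: str, field: str) -> str:
    session = authenticate(consumer_id)
    raw = request_raw_data(session)
    value = parse(raw, field)
    if not validate(value, field):
        raise ValueError(f"extracted value failed validation for {field}")
    return value

print(get_datapoint("consumer-123", "payoff_amount"))  # -> 14203.55
```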

Method developed an agentic workflow using GPT-4 for data extraction [00:05:16]. Initial tests showed promising results, allowing for expansion of use cases and more information extraction from a single API call [00:05:22].
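
As a rough illustration of such an extraction step, the sketch below uses the OpenAI Python SDK to pull several data points from one piece of raw text; the prompt, field names, and JSON convention are assumptions, not Method's actual workflow.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_fields(document_text: str, fields: list[str]) -> dict:
    """Ask the model to return only a JSON object with the requested keys."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You extract liability data points from raw financial text. "
                        "Respond with only a JSON object containing exactly the requested "
                        "keys; use null when a value is not present."},
            {"role": "user",
             "content": f"Fields: {fields}\n\nText:\n{document_text}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Multiple data points from a single call, as described above (illustrative usage):
# extract_fields(statement_text, ["payoff_amount", "escrow_balance"])
```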

Challenges with Initial LLM Adoption

While effective, using GPT-4 at scale quickly revealed significant issues:

  • High Cost: The first month in production with GPT-4 incurred a cost of $70,000 [00:05:53]. Although the value was immense, this cost was a concern [00:06:07].
  • Prompt Engineering Limitations: Scaling use cases quickly ran into walls with prompt engineering [00:06:25]. GPT-4, despite its intelligence, lacked financial expertise, requiring detailed, lengthy, and convoluted instructions and examples for various use cases [00:06:31]. Prompts were hard to generalize, leading to a “cat and mouse chase” of fixes that broke other scenarios, and there was no prompt versioning [00:06:44].
  • Performance Issues:
    • Caching Limitations: Variability in responses and frequent prompt tweaks made optimization for caching difficult [00:07:18].
    • Slow Latency: The baseline latency was slow, hindering concurrent scaling [00:07:25].
    • AI Errors: AI hallucinations, while different from human errors, were hard to catch (see the plausibility-check sketch after this list) [00:07:36].
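
A plausibility check of the kind referenced above can catch some of these hallucinations automatically. The sketch below is illustrative; the rule, tolerance, and field names are assumptions, not Method's actual checks.

```python
def plausible_payoff(extracted: dict, known_balance: float) -> bool:
    """Reject extractions that are well-formed but financially implausible."""
    try:
        payoff = float(extracted["payoff_amount"])
    except (KeyError, TypeError, ValueError):
        return False
    # A payoff quote should be positive and close to the known outstanding balance
    # (here, within 10%); a wildly different number is likely a hallucination.
    return payoff > 0 and abs(payoff - known_balance) / known_balance <= 0.10

print(plausible_payoff({"payoff_amount": "14203.55"}, known_balance=14100.0))   # True
print(plausible_payoff({"payoff_amount": "142035.50"}, known_balance=14100.0))  # False
```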

These issues prevented scaling the system reliably, despite its effectiveness for specific use cases [00:07:41].

Scaling with Open-Source Models and Fine-Tuning

The problem shifted from merely parsing unstructured data to building a robust, agentic workflow capable of handling high volume reliably, with targets of 16 million requests per day, 100K concurrent load, and sub-200 millisecond latency [00:07:57].
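
For scale, some back-of-envelope arithmetic on those targets (simple math on the stated figures, not Method's capacity plan):

```python
requests_per_day = 16_000_000
avg_rps = requests_per_day / 86_400      # ~185 requests/second on average
latency_s = 0.200                        # the sub-200 ms target

# Little's law: average concurrency = throughput x latency.
avg_concurrency = avg_rps * latency_s    # ~37 in-flight requests on average

print(f"average throughput ~ {avg_rps:.0f} req/s")
print(f"average concurrency ~ {avg_concurrency:.0f}")
# The 100K-concurrent target is therefore sized for bursts far above the daily average.
```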

OpenPipe partnered with Method to address the design considerations common to financial AI tools: quality, cost, and latency [00:08:35].

Benchmarking and Goals

Initial benchmarks showed:

  • Error Rate: GPT-4 had an 11% error rate, while o3-mini had a 4% error rate [00:09:24]. Method’s agentic workflow made this relatively easy to measure by comparing agent outputs to human-verified data (a minimal sketch of that comparison follows this list) [00:09:41].
  • Latency: GPT-4 responded in around 1 second, while o3-mini took about 5 seconds for the specific task [00:10:05].
  • Cost: While o3-mini had a lower per-token cost, it generated many more “reasoning tokens,” making it slightly more expensive for Method’s use case [00:10:41].
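
A minimal sketch of that error-rate measurement, assuming illustrative data and a hypothetical tolerance (not Method's evaluation harness):

```python
def error_rate(agent_outputs: dict[str, float], verified: dict[str, float],
               tolerance: float = 0.01) -> float:
    """Fraction of requests where the agent's value differs from the human-verified value."""
    errors = 0
    for request_id, truth in verified.items():
        predicted = agent_outputs.get(request_id)
        if predicted is None or abs(predicted - truth) > tolerance:
            errors += 1
    return errors / len(verified)

verified = {"r1": 14203.55, "r2": 310.00, "r3": 8894.12}   # human-verified values
agent    = {"r1": 14203.55, "r2": 310.00, "r3": 9894.12}   # one wrong extraction
print(f"{error_rate(agent, verified):.0%}")                # -> 33%
```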

Method’s target requirements, considering their follow-up plausibility checks, were:

  • Error Rate: Around 9% [00:11:58].
  • Latency: A hard latency cutoff due to the real-time nature of their agent system [00:12:04].
  • Cost: Very important due to high volume [00:12:40].

Neither GPT-4 nor o3-mini could meet all three requirements simultaneously [00:13:05].

Fine-Tuning as a Solution

Fine-tuning was identified as a powerful tool to bridge the gap, despite requiring more engineering investment than prompt engineering [00:13:46]. OpenPipe worked on building custom models tailored to Method’s specific use case [00:13:39].

By fine-tuning a much smaller, 8-billion-parameter Llama 3.1 model, they achieved significant improvements:

  • Improved Accuracy (Lower Error Rate): The fine-tuned model performed significantly better than GPT-4 and surpassed the required error-rate threshold [00:14:28]. This was made easier by using existing production data from GPT-4 to generate training examples (see the dataset sketch after this list) [00:14:40].
  • Reduced Latency: Moving to a much smaller model greatly reduced latency due to fewer calculations. It also opened the possibility of deploying the model within Method’s own infrastructure to eliminate network latency [00:15:34].
  • Lower Cost: The smaller model cost significantly less, coming in below the cost threshold Method needed to make its operations viable [00:16:02]. This removed the unit-economics concerns associated with larger models [00:16:25].
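
A hedged sketch of how verified production traffic can be turned into fine-tuning data in chat-format JSONL; the record structure is illustrative, not Method's or OpenPipe's actual pipeline.

```python
import json

def to_training_example(raw_text: str, field: str, verified_value: str) -> dict:
    # One prompt/response pair drawn from production logs, with the verified value
    # as the target completion.
    return {
        "messages": [
            {"role": "system", "content": "Extract the requested liability data point."},
            {"role": "user", "content": f"Field: {field}\n\nText:\n{raw_text}"},
            {"role": "assistant", "content": verified_value},
        ]
    }

with open("finetune_data.jsonl", "w") as f:
    example = to_training_example(
        "Auto loan ... payoff amount: $14,203.55 good through 06/30 ...",
        "payoff_amount",
        "14203.55",
    )
    f.write(json.dumps(example) + "\n")
```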

This approach demonstrates how fine-tuning can “bend the price-performance curve” [00:16:00], delivering AI-driven efficiency gains in financial data work and allowing the system to scale to a very large production volume with only a small engineering team [00:17:19]. Productionizing AI agents requires openness and patience: unlike traditional code, AI agents evolve and need time to reach production readiness and consistent responses [00:17:47].