From: aidotengineer

Method specializes in collecting and centralizing liability data from numerous sources, including credit bureaus, card networks (Visa, MasterCard), direct financial institution connections, and other third-party providers [00:00:31]. This aggregated and enhanced data is then served to fintechs, banks, and lenders, who utilize it for debt management, refinancing, loan consolidation, liability payments, and personal finance management [00:00:50].

The Challenge of Granular Financial Data

An early challenge for Method was the demand from customers for more specific liability data points, such as the payoff amount for an auto loan or the escrow balance for a mortgage [00:01:46]. Research revealed no central API existed to access these particular data points [00:02:05]. Directly integrating with banks was not feasible for an early-stage company, as it could take years to establish a solid connection [00:02:14].

Traditional, Manual Solutions and Their Drawbacks

Existing companies often address this by hiring offshore teams of contractors [00:02:55]. These teams call banks, authenticate, gather information, proof-check it, and then integrate it into financial platforms for uses like underwriting [00:03:00].

However, this manual process presents significant challenges [00:03:21]:

  • Inefficiency and Lack of Scalability: One person can only perform one task at a time, requiring more hires for scaling [00:03:33].
  • Slowness: The synchronous nature of calls makes the process very slow [00:03:41].
  • High Cost: The need for more personnel increases operational expenses [00:03:33].
  • Human Error: A high potential for human error necessitates additional teams for fact-checking and proof-checking, risking inaccurate financial information being surfaced [00:03:48].

Conceptually, this manual process resembles an API, involving a request, authentication, and response validation [00:04:04]. The core problem boils down to making sense of unstructured data [00:04:13].
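To make the analogy concrete, here is a minimal sketch of the manual workflow framed as an API. All type, field, and function names are illustrative assumptions, not Method’s actual schema:

```python
from dataclasses import dataclass

# Illustrative types only; field names are assumptions, not Method's schema.
@dataclass
class LiabilityRequest:
    institution: str   # the bank to call
    account_id: str
    data_point: str    # e.g. "payoff_amount" or "escrow_balance"

@dataclass
class LiabilityResponse:
    value: float
    verified: bool     # set by the proof-checking pass

def call_and_transcribe(req: LiabilityRequest) -> str:
    """Stands in for the offshore agent's phone call: authenticate,
    ask the question, and write the answer down as free-form text."""
    return "Rep said the payoff amount is $8,123.45 as of today."

def parse_and_verify(raw: str) -> LiabilityResponse:
    """Stands in for reading the notes and proof-checking the figure --
    the 'making sense of unstructured data' step."""
    amount = float(raw.split("$")[1].split()[0].replace(",", ""))
    return LiabilityResponse(value=amount, verified=amount > 0)

def fetch_liability_data(req: LiabilityRequest) -> LiabilityResponse:
    # Request -> authentication -> response validation, just like an API call.
    return parse_and_verify(call_and_transcribe(req))
```

Everything inside `parse_and_verify` is where the real difficulty lives: the input is free-form human language, not a structured payload.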

Embracing AI for Unstructured Data

Method sought a tool adept at parsing unstructured data [00:04:19]. The emergence of GPT-4 and the “Cambrian explosion” of LLM-enabled applications presented a perfect opportunity, as advanced LLMs are highly effective at tasks like summarization and classification of unstructured data [00:04:31].

Method developed an agentic workflow using GPT-4, which initially performed exceptionally well, even as use cases expanded [00:05:13].
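The talk doesn’t show Method’s actual implementation, but a minimal sketch of such an extraction step might look like the following. The prompt, output schema, and field names are assumptions for illustration:

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_data_point(transcript: str, data_point: str) -> dict:
    """Ask the model to pull one liability data point out of unstructured text."""
    response = client.chat.completions.create(
        model="gpt-4",  # the model Method started with, per the talk
        temperature=0,  # keep extraction as deterministic as possible
        messages=[
            {"role": "system",
             "content": "You extract financial data points from call notes. "
                        'Reply with JSON only: {"value": <number or null>}.'},
            {"role": "user",
             "content": f"Data point: {data_point}\nNotes:\n{transcript}"},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(extract_data_point(
    "The rep confirmed the payoff amount is $8,123.45, good through Friday.",
    "payoff_amount"))
```

Each such call costs tokens and takes a network round trip, which is exactly where the scaling problems described next come from.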

Initial AI Implementation: New Challenges

Despite initial success, scaling the GPT-4 solution introduced new challenges [00:05:45]:

  • Exorbitant Cost: The first month in production with GPT-4 incurred a cost of $70,000 [00:05:50]. While leadership recognized the immense value, this cost was unsustainable long-term [00:06:07].
  • Prompt Engineering Limitations: As use cases scaled, prompt engineering proved insufficient. GPT-4, while smart, lacked financial expertise, requiring excessively detailed and convoluted instructions and examples for various scenarios [00:06:23]. This led to a “cat and mouse chase” where fixes for one scenario broke others, compounded by a lack of prompt versioning [00:06:44].
  • Scaling Inefficiencies:
    • Cost: Caching was ineffective due to variability in responses and constant prompt tweaks [00:07:18].
    • Latency: Baseline latency was slow, hindering concurrent scaling [00:07:24].
    • AI Errors: Unlike human errors, AI hallucinations were difficult to catch [00:07:32].

Although GPT-4 solved the core problem of parsing unstructured data for specific use cases, it presented significant scaling challenges that prevented robust, high-volume deployment [00:07:41].

Scaling AI for Financial Data

The problem shifted from merely understanding unstructured data to building a robust agentic workflow that could reliably handle high volumes [00:07:57]. Method set explicit targets for error rate, latency, and cost for a scalable system, detailed under “Benchmarking Existing Models” below.

OpenPipe and Fine-tuning as the Solution

OpenPipe collaborated with Method to address the issues of quality, cost, and latency, which are common concerns for companies using AI [00:08:34].

Benchmarking Existing Models

OpenPipe measured the performance of GPT-4 and o3-mini against Method’s needs [00:09:04]; a sketch of this measurement approach follows the list:

  • Error Rates: GPT-4 had an 11% error rate, while o3-mini had a 4% error rate [00:09:24]. Error rates were measured by comparing agent outputs to human-verified correct data [00:09:51].
  • Latency: GPT-4 responded in about 1 second, while o3-mini took about 5 seconds for Method’s specific tasks [00:10:05].
  • Cost: Surprisingly, o3-mini was slightly more expensive than GPT-4 for Method’s use case, despite its lower per-token price, because it generated many more “reasoning tokens” (longer outputs) [00:10:41].
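A hedged sketch of how such a benchmark can be run, following the measurement approach the talk describes (outputs compared against human-verified answers, latency timed per call; per-token cost accounting is omitted for brevity):

```python
import time

def benchmark(model_fn, cases):
    """Score a model against human-verified answers and time each call.
    `model_fn` is any callable mapping (transcript, data_point) -> value;
    `cases` is a list of (transcript, data_point, verified_answer) triples."""
    errors, latencies = 0, []
    for transcript, data_point, verified_answer in cases:
        start = time.perf_counter()
        predicted = model_fn(transcript, data_point)
        latencies.append(time.perf_counter() - start)
        if predicted != verified_answer:  # mismatch vs human-verified data
            errors += 1
    return {"error_rate": errors / len(cases),
            "avg_latency_s": sum(latencies) / len(latencies)}

# Usage: run the same labeled cases through each candidate, e.g.
#   benchmark(gpt4_extract, labeled_cases)
#   benchmark(o3_mini_extract, labeled_cases)
```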

Method’s target requirements were:

  • Error Rate: An error rate of around 9% was acceptable, because downstream plausibility checks catch implausible outputs (a sketch of such a check appears below) [00:11:58].
  • Latency: A hard latency cut-off was necessary for the real-time agent system [00:12:15].
  • Cost: High volume made cost a critical factor [00:12:38].

Neither GPT-4 nor o3-mini met all three requirements [00:13:03].
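The talk doesn’t spell out what the downstream plausibility checks look like; as an illustrative assumption, such a check could be a simple consistency guard that catches hallucinated digits before they are surfaced:

```python
def plausible_payoff(payoff: float, current_balance: float) -> bool:
    """Illustrative plausibility check (not Method's actual rules): a payoff
    amount should be positive and close to the known current balance, since
    it is roughly the balance plus accrued interest and fees."""
    return 0 < payoff and current_balance <= payoff <= current_balance * 1.10

assert plausible_payoff(8123.45, 8000.00)       # within 10% above balance: ok
assert not plausible_payoff(81234.50, 8000.00)  # likely a hallucinated digit
```

Checks like this are why a roughly 9% raw error rate was tolerable: implausible outputs get filtered before reaching the customer.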

The Power of Fine-tuning

OpenPipe’s solution was to use fine-tuning to build custom models tailored to Method’s specific use case [00:13:37]. Fine-tuning is a “power tool” that requires more engineering investment than simple prompt engineering but can significantly “bend the price-performance curve” [00:13:47].

Fine-tuning leveraged Method’s existing production data from GPT-4 to train a custom model, such as an 8-billion-parameter Llama 3.1 model [00:14:45].
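The exact pipeline isn’t shown in the talk, but the standard recipe is to replay logged production prompts with their verified GPT-4 outputs as training pairs. A minimal sketch, using the common chat-format JSONL layout for fine-tuning jobs (the log record field names are assumptions):

```python
import json

def to_finetune_example(logged_call: dict) -> dict:
    """Turn one logged GPT-4 production call into a training example:
    the original prompt is the input, and GPT-4's verified output is the
    target the smaller model learns to imitate."""
    return {"messages": logged_call["prompt_messages"] +
                        [{"role": "assistant",
                          "content": logged_call["verified_output"]}]}

# Assumed shape of a production log record, for illustration.
logs = [{"prompt_messages": [{"role": "user",
                              "content": "Extract payoff_amount from: ..."}],
         "verified_output": '{"value": 8123.45}'}]

# JSONL: one training example per line, the usual fine-tuning input format.
with open("train.jsonl", "w") as f:
    for rec in logs:
        f.write(json.dumps(to_finetune_example(rec)) + "\n")
```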

The results of fine-tuning were transformative [00:14:12]:

  • Quality (Error Rate): The fine-tuned model achieved significantly better error rates than GPT-4 and surpassed Method’s target threshold [00:14:24].
  • Latency: Moving to a much smaller (8-billion-parameter) model drastically reduced latency, since each request requires far less computation. This also opens the possibility of co-locating the model with the application code to eliminate network latency entirely [00:15:30].
  • Cost: The smaller, fine-tuned model was much cheaper to run, coming in well under Method’s cost threshold and making the business viable from a unit-economics perspective [00:16:02].

Key Takeaways for AI Integration in Financial Technology

  • Simplicity and Cost-Effectiveness: Cheaper models can be fine-tuned on existing production data, without any investment in GPUs [00:17:21].
  • Patience and Openness: Productionizing AI agents requires openness and patience from engineering and leadership teams. Unlike traditional deterministic code, AI agents take time to become production-ready and deliver consistent results due to their probabilistic nature [00:17:46].
  • Fine-tuning as a Strategic Tool: When prompt engineering with off-the-shelf models doesn’t meet reliability targets, fine-tuning is a viable strategy to significantly improve the price-performance curve and achieve large-scale production deployment [00:16:48].