From: aidotengineer
Method specializes in collecting and centralizing liability data from a wide range of sources, including credit bureaus, card networks like Visa and MasterCard, direct connections with financial institutions, and other third-party providers [00:00:33]. This raw data is then aggregated and enhanced before being served to customers [00:00:50].
Customer Use Cases
Method’s customers are typically other fintechs, banks, or lenders [00:00:54]. They utilize this enhanced data for various debt management purposes, such as refinancing, loan consolidation, liability payments, and personal finance management [00:00:58].
Challenges in Data Acquisition
An early challenge Method faced was fulfilling customer requests for liability-specific data points, such as the payoff amount on an auto loan or the escrow balance for a mortgage [00:01:50]. Research showed there was no central API to access these specific data points [00:02:07]. While direct integration with banks was an option, it was estimated to take at least two years to achieve tangible results, which was not viable for an early-stage company aiming for rapid customer solutions [00:02:14].
Traditional Inefficient Methods
Prior to Method’s solution, many companies relied on highly inefficient, manual processes to obtain this financial data [00:03:23]:
- They would hire offshore teams of contractors [00:02:55].
- These teams would call banks on behalf of the company and the end consumer [00:03:02].
- They would authenticate with the banks and gather the necessary information [00:03:06].
- The collected data then required a proof-checking stage before integration into financial platforms for uses like underwriting [00:03:10].
This manual approach had several significant drawbacks:
- Costly: Scaling required hiring more people, since one person can only perform one task at a time [00:03:33].
- Slow: The synchronous nature of the process made it very slow [00:03:41].
- Prone to Human Error: The high risk of human error necessitated additional teams for fact- and proof-checking, and inaccurate financial information could still be surfaced [00:03:48].
Leveraging AI for Data Enhancement
Method identified that the core problem was making sense of unstructured data, akin to an API with request, authentication, and response validation components [00:04:13]. The emergence of advanced LLMs, particularly after GPT-4’s announcement, presented a potential solution, given their proficiency in parsing unstructured data and tasks like summarization or classification [00:04:31].
Method developed an agentic workflow using GPT-4 for data extraction [00:05:16]. Initial tests showed promising results, allowing for expansion of use cases and more information extraction from a single API call [00:05:22].
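As a rough illustration (not Method’s actual code), the extraction step can be sketched with the OpenAI Python SDK. The prompt, field names, and JSON-only convention here are assumptions for the sketch:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical extraction prompt; the real prompts were long and
# use-case specific.
SYSTEM_PROMPT = (
    "You are a financial data parser. Extract the requested liability "
    "fields from the raw text and reply with JSON only."
)

def extract_liability_fields(raw_text: str) -> dict:
    """Parse unstructured bank output into structured liability data."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduce run-to-run variability
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": (
                    "Fields: payoff_amount, escrow_balance, as_of_date\n"
                    f"Raw text:\n{raw_text}"
                ),
            },
        ],
    )
    return json.loads(response.choices[0].message.content)
```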
Challenges with Initial LLM Adoption
While effective, using GPT-4 at scale quickly revealed significant issues:
- High Cost: The first month in production with GPT-4 incurred a cost of $70,000 [00:05:53]. Although the value was immense, this cost was a concern [00:06:07].
- Prompt Engineering Limitations: Scaling use cases quickly ran into walls with prompt engineering [00:06:25]. GPT-4, despite its intelligence, lacked financial expertise, requiring detailed, lengthy, and convoluted instructions and examples for each use case [00:06:31]. Prompts were hard to generalize, leading to a “cat and mouse chase” of fixes that broke other scenarios, and there was no prompt versioning (one minimal versioning scheme is sketched below) [00:06:44].
- Performance Issues:
- Caching Limitations: Variability in responses and frequent prompt tweaks made optimization for caching difficult [00:07:18].
- High Latency: Baseline latency was high, which hindered concurrent scaling [00:07:25].
- AI Errors: AI hallucinations, while different from human errors, were hard to catch [00:07:36].
These issues prevented scaling the system reliably, despite its effectiveness for specific use cases [00:07:41].
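One minimal way to address the missing prompt versioning, sketched here as an assumption rather than Method’s actual fix, is to pin each use case to an immutable prompt version so a fix for one scenario cannot silently change another:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    version: str
    template: str

# Illustrative registry: use cases and templates are hypothetical.
PROMPTS = {
    ("auto_loan_payoff", "v3"): PromptVersion(
        "v3", "Extract the payoff amount from the text below.\n{raw_text}"
    ),
    ("mortgage_escrow", "v1"): PromptVersion(
        "v1", "Extract the escrow balance from the text below.\n{raw_text}"
    ),
}

def render(use_case: str, version: str, raw_text: str) -> str:
    # Callers pin an explicit version, so prompt edits ship as new
    # versions instead of in-place mutations.
    return PROMPTS[(use_case, version)].template.format(raw_text=raw_text)
```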
Scaling with Open-Source Models and Fine-Tuning
The problem shifted from merely parsing unstructured data to building a robust, agentic workflow capable of handling high volume reliably, with targets of 16 million requests per day, 100K concurrent load, and sub-200 millisecond latency [00:07:57].
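To put those targets in perspective, a back-of-the-envelope calculation (my arithmetic, not from the talk):

```python
# 16M requests/day averages out to roughly 185 requests/second.
requests_per_day = 16_000_000
avg_rps = requests_per_day / 86_400

# Little's law (throughput = concurrency / latency): sustaining 100K
# in-flight requests at 200 ms each implies ~500K requests/second at peak.
concurrency = 100_000
latency_s = 0.200
peak_rps = concurrency / latency_s

print(f"average: {avg_rps:.0f} rps, implied peak: {peak_rps:,.0f} rps")
```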
OpenPipe partnered with Method to address these common design considerations for financial AI tools around quality, cost, and latency [00:08:35].
Benchmarking and Goals
Initial benchmarks showed:
- Error Rate: GPT-4 had an 11% error rate, while o3-mini had a 4% error rate [00:09:24]. Method’s agentic workflow allowed for relatively easy measurement by comparing agent outputs to human-verified data (a sketch of this comparison follows the list) [00:09:41].
- Latency: GPT-4 took around 1 second, while o3-mini took about 5 seconds for the specific task [00:10:05].
- Cost: While o3-mini had a lower per-token cost, it generated many more “reasoning tokens,” making it slightly more expensive than GPT-4 for Method’s use case [00:10:41].
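The error-rate measurement described above can be sketched as a simple comparison against human-verified records; the exact-match check is an assumption, as the real validation may be field-level and more forgiving:

```python
def error_rate(predictions: list[dict], ground_truth: list[dict]) -> float:
    """Fraction of cases where the agent's structured output disagrees
    with the human-verified record."""
    assert len(predictions) == len(ground_truth)
    errors = sum(pred != truth for pred, truth in zip(predictions, ground_truth))
    return errors / len(predictions)

# Per the benchmarks above, this would yield ~0.11 for GPT-4 outputs
# and ~0.04 for o3-mini outputs on Method's task.
```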
Method’s target requirements, considering their follow-up plausibility checks, were:
- Error Rate: Around 9% [00:11:58].
- Latency: A hard latency cutoff due to the real-time nature of their agent system [00:12:04].
- Cost: Very important due to high volume [00:12:40].
Neither GPT-4 nor o3-mini could meet all three requirements simultaneously [00:13:05].
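That trade-off can be made concrete with a small screening sketch. The error rates and latencies come from the benchmarks above; the cost figures and exact cutoffs are placeholders, since the talk gives directions rather than precise numbers:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    error_rate: float     # fraction of wrong extractions
    latency_s: float      # typical latency for the task
    cost_per_call: float  # USD, placeholder values

MAX_ERROR_RATE = 0.09  # from the talk
MAX_LATENCY_S = 1.0    # assumed hard cutoff
MAX_COST = 0.001       # assumed threshold for unit economics

def meets_all(c: Candidate) -> bool:
    return (
        c.error_rate <= MAX_ERROR_RATE
        and c.latency_s <= MAX_LATENCY_S
        and c.cost_per_call <= MAX_COST
    )

for c in [
    Candidate("gpt-4", 0.11, 1.0, 0.02),     # fails quality and cost
    Candidate("o3-mini", 0.04, 5.0, 0.025),  # fails latency and cost
]:
    print(c.name, "meets all three:", meets_all(c))
```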
Fine-Tuning as a Solution
Fine-tuning was identified as a powerful tool to bridge the gap, despite requiring more engineering investment than prompt engineering [00:13:46]. OpenPipe worked on building custom models tailored to Method’s specific use case [00:13:39].
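The general shape of that work, reusing logged production GPT-4 calls as training data (as the list below notes), can be sketched as converting logs into chat-format JSONL; the log schema here is hypothetical:

```python
import json

def build_finetune_dataset(logged_calls: list[dict], out_path: str) -> None:
    """Convert logged production calls into chat-format JSONL rows.

    Assumes each log entry has 'messages' (the original request) and
    'output' (the GPT-4 completion, ideally human-verified first).
    """
    with open(out_path, "w") as f:
        for call in logged_calls:
            row = {
                "messages": call["messages"]
                + [{"role": "assistant", "content": call["output"]}]
            }
            f.write(json.dumps(row) + "\n")
```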
Through fine-tuning a smaller, 8-billion-parameter Llama 3.1 model, they achieved significant improvements:
- Improved Accuracy (Lower Error Rate): The fine-tuned model performed significantly better than GPT-4 and came in under the required error-rate threshold [00:14:28]. This was made easier by using existing production data from GPT-4 requests and outputs as training data [00:14:40].
- Reduced Latency: Moving to a much smaller model greatly reduced latency, since far fewer calculations are needed per request. It also opened the possibility of deploying the model within Method’s own infrastructure to eliminate network latency (see the deployment sketch after this list) [00:15:34].
- Lower Cost: The much smaller model was significantly cheaper to run, coming in well under the cost threshold Method needed to make the operation viable [00:16:02]. This removed the unit-economics concerns associated with larger models [00:16:25].
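The in-house deployment mentioned above could look like pointing the same client at an OpenAI-compatible server running inside Method’s network, for example one started with vLLM. The host and model names below are placeholders:

```python
from openai import OpenAI

# An in-VPC, OpenAI-compatible endpoint (e.g. `vllm serve <fine-tuned
# Llama 3.1 8B>`) avoids the round trip to an external API.
client = OpenAI(base_url="http://inference.internal:8000/v1", api_key="unused")

response = client.chat.completions.create(
    model="liability-extractor-8b",  # hypothetical fine-tuned model name
    temperature=0,
    messages=[{"role": "user", "content": "Extract payoff_amount from: ..."}],
)
print(response.choices[0].message.content)
```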
This approach demonstrates how fine-tuning can “bend the price-performance curve” [00:16:00], allowing AI-driven financial analysis to scale to very large production volumes with only a small engineering team [00:17:19]. Productionizing AI agents requires openness and patience: unlike traditional code, AI agents evolve and need time to reach production readiness and consistent responses [00:17:47].