From: aidotengineer
Analyzing vast amounts of unstructured data, such as thousands of sales calls, can be an impossible task for human teams due to the sheer volume and time constraints [00:01:36]. This challenge highlights a significant bottleneck in gleaning insights from critical business interactions [00:02:25]. Historically, teams had to choose between high-quality but unscalable manual analysis and fast but context-blind keyword analysis [00:02:34]. However, the advent of modern large language models (LLMs) has transformed this landscape, making complex data analysis achievable for a single AI engineer in a matter of weeks [00:02:50].
The Challenge of Unstructured Data
At Pulley, a key objective was to analyze 10,000 sales calls within two weeks to refine their ideal customer profile (ICP), moving beyond a broad “venture-backed startups” profile to more specific segments like “CTO of an early-stage venture-backed crypto startup” [00:00:35] [00:01:02]. Manual analysis of such a dataset would involve:
- Downloading and reading each transcript [00:01:47].
- Classifying conversations based on target personas [00:01:53].
- Scanning for key insights and compiling notes with citations [00:01:58].
Analyzing 10,000 30-minute calls manually would take approximately 625 days of continuous work, the equivalent of nearly two years [00:02:14] [00:02:19]. This highlights the human brain’s inability to process such vast amounts of information efficiently [00:02:22].
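The arithmetic behind that estimate can be reproduced directly; the 8-hour workday is an assumption chosen to match the talk's ~625-day figure:

```python
# Back-of-the-envelope workload estimate: listening to every call once,
# at a pace of one full 8-hour workday of audio per day.
CALLS = 10_000
MINUTES_PER_CALL = 30
HOURS_PER_DAY = 8  # assumed workday length

total_hours = CALLS * MINUTES_PER_CALL / 60   # 5,000 hours of audio
work_days = total_hours / HOURS_PER_DAY       # 625 eight-hour days
calendar_years = work_days / 365              # ~1.7 years without a day off

print(f"{total_hours:.0f} h -> {work_days:.0f} days -> {calendar_years:.1f} years")
```

And that is before any classification or note-taking, which would stretch the timeline further.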
Leveraging Large Language Models
The intersection of unstructured data and pattern recognition is a “sweet spot” for AI projects [00:02:55]. While seemingly simple, using AI for sales call analysis required addressing several technical challenges [00:03:02].
Model Selection and Performance
For this project, early experiments with different LLMs revealed that model choice was crucial [00:03:12]. Although smaller, cheaper models were tempting, they quickly showed limitations, producing an alarming number of false positives [00:03:26]. For example, they might mistakenly classify a company as crypto-related due to a fleeting mention of blockchain features, or incorrectly identify a prospect as a founder without supporting evidence [00:03:37].
The project prioritized accuracy, recognizing that a bad analysis would render the entire effort pointless [00:03:54]. Therefore, the decision was made to use more expensive yet intelligent options like GPT-4o and Claude 3.5 Sonnet, which offered an acceptable hallucination rate [00:03:14] [00:04:03]. Claude 3.5 Sonnet was ultimately chosen [00:04:10].
Reducing Hallucinations and Ensuring Reliability
To combat hallucinations and enhance reliability, a multi-layered approach was developed [00:04:20]:
- Data Enrichment: Raw transcript data was enriched using Retrieval Augmented Generation (RAG) from both third-party and internal sources [00:04:27].
- Prompt Engineering: Techniques like chain-of-thought prompting were employed to guide the model towards more reliable results [00:04:38].
- Structured Outputs: Generating structured JSON outputs allowed for automatic citation generation, creating a verifiable trail back to the original transcripts [00:04:46].
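The prompting and structured-output steps above can be sketched as follows. This is a minimal illustration, not Pulley's actual implementation: the schema, field names, and validation rule (reject any insight that lacks a supporting quote) are assumptions.

```python
import json

# Illustrative prompt: ask for step-by-step reasoning, then a strict JSON
# shape in which every extracted claim carries a transcript citation.
EXTRACTION_PROMPT = """\
Think step by step about the transcript, then answer ONLY with JSON:
{
  "company": {"name": str, "industry": str},
  "persona": {"title": str, "is_founder": bool},
  "insights": [{"claim": str, "citation": {"line": int, "quote": str}}]
}
Every insight MUST quote the transcript line that supports it.
"""

def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and reject any insight without a citation."""
    data = json.loads(raw)
    for insight in data.get("insights", []):
        cite = insight.get("citation") or {}
        if not cite.get("quote"):
            raise ValueError(f"uncited claim: {insight.get('claim')!r}")
    return data

# Hand-written example reply (not a real API response) to exercise the parser:
reply = json.dumps({
    "company": {"name": "Acme Chain", "industry": "crypto"},
    "persona": {"title": "CTO", "is_founder": False},
    "insights": [{"claim": "Evaluating cap-table tooling",
                  "citation": {"line": 42, "quote": "we're comparing vendors"}}],
})
analysis = parse_analysis(reply)
```

Requiring a verbatim quote per claim is what makes the citation trail checkable: a reviewer can grep the original transcript for the quoted line.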
This robust system reliably extracted accurate company details and meaningful insights with high confidence in the final results [00:04:55].
Cost Optimization
Costs were initially high: meeting the low-error-rate requirement demanded extensive analysis, and responses often hit Claude 3.5 Sonnet’s 4,000-token output limit, necessitating multiple requests per transcript [00:05:10]. Two experimental features significantly reduced these costs:
- Prompt Caching: Reusing the same transcript content for repeated analysis (metadata and insights extraction) reduced costs by up to 90% and latency by up to 85% [00:05:33].
- Extended Outputs: An experimental feature flag provided double the original output context, allowing for complete summaries in single passes instead of multiple turns [00:05:53].
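The caching pattern can be sketched as request payloads built as plain dicts (no API call is made here). The field names follow Anthropic's Messages API prompt-caching convention of marking the large, repeated block with `cache_control`; the exact model string and setup used in the talk are assumptions.

```python
# Two analysis passes over one transcript. The transcript is placed in a
# shared system block marked for caching, so the second request's read of
# that prefix is billed at the (much cheaper) cached rate.
transcript = "SPEAKER 1: Thanks for joining...\n" * 100  # stand-in transcript

def build_request(task_prompt: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 4000,
        "system": [
            {"type": "text", "text": "You analyze sales call transcripts."},
            {   # the large, repeated block is the one worth caching
                "type": "text",
                "text": transcript,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": task_prompt}],
    }

metadata_req = build_request("Extract company metadata as JSON.")
insights_req = build_request("List key insights with citations as JSON.")

# Both requests share an identical prefix, which is what makes caching apply.
assert metadata_req["system"] == insights_req["system"]
```

Because metadata extraction and insight extraction reuse the same transcript prefix, only the short task prompt differs between requests, which is where the cost and latency savings come from.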
These optimizations transformed a costly analysis into roughly a $500 one, delivering results in days instead of weeks [00:06:14].
Impact and Future Possibilities
What began as a project for the executive team yielded wider organizational benefits [00:06:30]:
- Marketing Team: Able to quickly identify customers for branding and positioning exercises [00:06:47].
- Sales Team: Automated transcript downloads, saving dozens of hours weekly [00:06:54].
- New Questions: Teams began asking questions previously considered too daunting for manual analysis [00:07:03].
The project transformed unstructured data from a liability into an asset [00:07:13].
Key Takeaways
Models Matter [00:07:22]
Despite the push for open-source and cheaper models, high-performing models like Claude 3.5 Sonnet and GPT-4o could handle tasks others could not [00:07:25]. The right tool is the one that best fits specific needs [00:07:41].
Good Engineering Still Matters [00:07:48]
Significant wins came from traditional software engineering principles: leveraging JSON structured output, good database schemas, and proper system architecture [00:07:51]. AI engineering involves building effective systems around LLMs, ensuring they are thoughtfully integrated, not merely bolted on [00:08:03].
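One concrete payoff of a good schema is that follow-up questions become queries instead of model calls. The sketch below uses an in-memory SQLite database; the tables and columns are illustrative assumptions, not Pulley's actual design.

```python
import sqlite3

# Store the extracted, structured results so later questions are answered
# with SQL rather than by re-running the LLM over 10,000 transcripts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    id INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    persona_title TEXT,
    is_crypto INTEGER NOT NULL DEFAULT 0   -- classification from the LLM
);
CREATE TABLE insights (
    id INTEGER PRIMARY KEY,
    call_id INTEGER NOT NULL REFERENCES calls(id),
    claim TEXT NOT NULL,
    citation TEXT NOT NULL                 -- quoted transcript evidence
);
""")
conn.execute("INSERT INTO calls VALUES (1, 'Acme Chain', 'CTO', 1)")
conn.execute(
    "INSERT INTO insights VALUES (1, 1, 'Evaluating cap-table tools', "
    "'we are comparing vendors')"
)
conn.commit()

# Example ICP question: what did crypto-company CTOs talk about?
rows = conn.execute("""
    SELECT c.company, i.claim
    FROM insights i JOIN calls c ON c.id = i.call_id
    WHERE c.is_crypto = 1 AND c.persona_title = 'CTO'
""").fetchall()
```

Keeping the citation column alongside each claim preserves the verifiable trail described earlier, all the way into the database.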
Consider Additional Use Cases [00:08:21]
Building a flexible tool with features like search filters and exports transformed a one-off report into a company-wide resource [00:08:26].
This project demonstrated how AI can transform seemingly impossible tasks into routine operations [00:08:42]. LLMs augment human analysis and remove bottlenecks, not just doing things faster, but unlocking entirely new possibilities [00:08:50].
Valuable sources of insight such as sales calls, support tickets, product reviews, user feedback, and social media interactions often go untouched [00:09:09]. However, large language models now make these data accessible, enabling companies to turn them into valuable assets [00:09:24].