From: aidotengineer
Analyzing vast amounts of unstructured data, such as thousands of sales calls, can be an impossible task for human teams due to the sheer volume and time constraints [00:01:36]. This challenge highlights a significant bottleneck in gleaning insights from critical business interactions [00:02:25]. Historically, teams had to choose between high-quality but unscalable manual analysis and fast but context-blind keyword analysis [00:02:34]. However, the advent of modern large language models (LLMs) has transformed this landscape, making complex data analysis achievable for a single AI engineer in a matter of weeks [00:02:50].
The Challenge of Unstructured Data
At Pulley, a key objective was to analyze 10,000 sales calls within two weeks to refine their ideal customer profile (ICP), moving beyond a broad “venture-backed startups” profile to more specific segments like “CTO of an early-stage venture-backed crypto startup” [00:00:35] [00:01:02]. Manual analysis of such a dataset would involve:
- Downloading and reading each transcript [00:01:47].
- Classifying conversations based on target personas [00:01:53].
- Scanning for key insights and compiling notes with citations [00:01:58].
Analyzing 10,000 30-minute calls manually would take approximately 625 days of continuous work, the equivalent of nearly two years [00:02:14] [00:02:19]. This highlights the human brain’s inability to process such vast amounts of information efficiently [00:02:22].
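The arithmetic behind that estimate can be reproduced directly; the 8-hour workday is an assumption chosen to match the talk's ~625-day figure:

```python
# Back-of-the-envelope workload estimate: listening to every call once,
# at a pace of one full 8-hour workday of audio per day.
CALLS = 10_000
MINUTES_PER_CALL = 30
HOURS_PER_DAY = 8  # assumed workday length

total_hours = CALLS * MINUTES_PER_CALL / 60   # 5,000 hours of audio
work_days = total_hours / HOURS_PER_DAY       # 625 eight-hour days
calendar_years = work_days / 365              # ~1.7 years without a day off

print(f"{total_hours:.0f} h -> {work_days:.0f} days -> {calendar_years:.1f} years")
```

And that is before any classification or note-taking, which would stretch the timeline further.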
Leveraging Large Language Models
The intersection of unstructured data and pattern recognition is a “sweet spot” for AI projects [00:02:55]. While seemingly simple, using AI for sales call analysis required addressing several technical challenges [00:03:02].
Model Selection and Performance
For this project, early experiments with different LLMs revealed that model choice was crucial [00:03:12]. Although smaller, cheaper models were tempting, they quickly showed limitations, producing an alarming number of false positives [00:03:26]. For example, they might mistakenly classify a company as crypto-related due to a fleeting mention of blockchain features, or incorrectly identify a prospect as a founder without supporting evidence [00:03:37].
The project prioritized accuracy, recognizing that a bad analysis would render the entire effort pointless [00:03:54]. Therefore, the decision was made to use more expensive yet intelligent options like GPT-4o and Claude 3.5 Sonnet, which offered an acceptable hallucination rate [00:03:14] [00:04:03]. Claude 3.5 Sonnet was ultimately chosen [00:04:10].
Reducing Hallucinations and Ensuring Reliability
To combat hallucinations and enhance reliability, a multi-layered approach was developed [00:04:20]:
- Data Enrichment: Raw transcript data was enriched using Retrieval Augmented Generation (RAG) from both third-party and internal sources [00:04:27].
- Prompt Engineering: Techniques like chain-of-thought prompting were employed to guide the model towards more reliable results [00:04:38].
- Structured Outputs: Generating structured JSON outputs allowed for automatic citation generation, creating a verifiable trail back to the original transcripts [00:04:46].
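The prompting and structured-output steps above can be sketched as follows. This is a minimal illustration, not Pulley's actual implementation: the schema, field names, and validation rule (reject any insight that lacks a supporting quote) are assumptions.

```python
import json

# Illustrative prompt: ask for step-by-step reasoning, then a strict JSON
# shape in which every extracted claim carries a transcript citation.
EXTRACTION_PROMPT = """\
Think step by step about the transcript, then answer ONLY with JSON:
{
  "company": {"name": str, "industry": str},
  "persona": {"title": str, "is_founder": bool},
  "insights": [{"claim": str, "citation": {"line": int, "quote": str}}]
}
Every insight MUST quote the transcript line that supports it.
"""

def parse_analysis(raw: str) -> dict:
    """Parse the model's JSON reply and reject any insight without a citation."""
    data = json.loads(raw)
    for insight in data.get("insights", []):
        cite = insight.get("citation") or {}
        if not cite.get("quote"):
            raise ValueError(f"uncited claim: {insight.get('claim')!r}")
    return data

# Hand-written example reply (not a real API response) to exercise the parser:
reply = json.dumps({
    "company": {"name": "Acme Chain", "industry": "crypto"},
    "persona": {"title": "CTO", "is_founder": False},
    "insights": [{"claim": "Evaluating cap-table tooling",
                  "citation": {"line": 42, "quote": "we're comparing vendors"}}],
})
analysis = parse_analysis(reply)
```

Requiring a verbatim quote per claim is what makes the citation trail checkable: a reviewer can grep the original transcript for the quoted line.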
This robust system reliably extracted accurate company details and meaningful insights with high confidence in the final results [00:04:55].
Cost Optimization
Costs were initially high: meeting the low-error-rate requirement demanded extensive analysis, and responses often hit Claude 3.5 Sonnet’s 4,000-token output limit, necessitating multiple requests per transcript [00:05:10]. Two experimental features significantly reduced these costs:
- Prompt Caching: Reusing the same transcript content for repeated analysis (metadata and insights extraction) reduced costs by up to 90% and latency by up to 85% [00:05:33].
- Extended Outputs: An experimental feature flag provided double the original output context, allowing for complete summaries in single passes instead of multiple turns [00:05:53].
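The caching pattern can be sketched as request payloads built as plain dicts (no API call is made here). The field names follow Anthropic's Messages API prompt-caching convention of marking the large, repeated block with `cache_control`; the exact model string and setup used in the talk are assumptions.

```python
# Two analysis passes over one transcript. The transcript is placed in a
# shared system block marked for caching, so the second request's read of
# that prefix is billed at the (much cheaper) cached rate.
transcript = "SPEAKER 1: Thanks for joining...\n" * 100  # stand-in transcript

def build_request(task_prompt: str) -> dict:
    return {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 4000,
        "system": [
            {"type": "text", "text": "You analyze sales call transcripts."},
            {   # the large, repeated block is the one worth caching
                "type": "text",
                "text": transcript,
                "cache_control": {"type": "ephemeral"},
            },
        ],
        "messages": [{"role": "user", "content": task_prompt}],
    }

metadata_req = build_request("Extract company metadata as JSON.")
insights_req = build_request("List key insights with citations as JSON.")

# Both requests share an identical prefix, which is what makes caching apply.
assert metadata_req["system"] == insights_req["system"]
```

Because metadata extraction and insight extraction reuse the same transcript prefix, only the short task prompt differs between requests, which is where the cost and latency savings come from.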
These optimizations transformed a costly analysis into roughly a $500 one, delivering results in days instead of weeks [00:06:14].
Impact and Future Possibilities
What began as a project for the executive team yielded wider organizational benefits [00:06:30]:
- Marketing Team: Able to quickly identify customers for branding and positioning exercises [00:06:47].
- Sales Team: Automated transcript downloads, saving dozens of hours weekly [00:06:54].
- New Questions: Teams began asking questions previously considered too daunting for manual analysis [00:07:03].
The project transformed unstructured data from a liability into an asset [00:07:13].
Key Takeaways
Models Matter [00:07:22]
Despite the push for open-source and cheaper models, high-performing models like Claude 3.5 Sonnet and GPT-4o could handle tasks others could not [00:07:25]. The right tool is the one that best fits specific needs [00:07:41].
Good Engineering Still Matters [00:07:48]
Significant wins came from traditional software engineering principles: leveraging JSON structured output, good database schemas, and proper system architecture [00:07:51]. AI engineering involves building effective systems around LLMs, ensuring they are thoughtfully integrated, not merely bolted on [00:08:03].
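One concrete payoff of a good schema is that follow-up questions become queries instead of model calls. The sketch below uses an in-memory SQLite database; the tables and columns are illustrative assumptions, not Pulley's actual design.

```python
import sqlite3

# Store the extracted, structured results so later questions are answered
# with SQL rather than by re-running the LLM over 10,000 transcripts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    id INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    persona_title TEXT,
    is_crypto INTEGER NOT NULL DEFAULT 0   -- classification from the LLM
);
CREATE TABLE insights (
    id INTEGER PRIMARY KEY,
    call_id INTEGER NOT NULL REFERENCES calls(id),
    claim TEXT NOT NULL,
    citation TEXT NOT NULL                 -- quoted transcript evidence
);
""")
conn.execute("INSERT INTO calls VALUES (1, 'Acme Chain', 'CTO', 1)")
conn.execute(
    "INSERT INTO insights VALUES (1, 1, 'Evaluating cap-table tools', "
    "'we are comparing vendors')"
)
conn.commit()

# Example ICP question: what did crypto-company CTOs talk about?
rows = conn.execute("""
    SELECT c.company, i.claim
    FROM insights i JOIN calls c ON c.id = i.call_id
    WHERE c.is_crypto = 1 AND c.persona_title = 'CTO'
""").fetchall()
```

Keeping the citation column alongside each claim preserves the verifiable trail described earlier, all the way into the database.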
Consider Additional Use Cases [00:08:21]
Building a flexible tool with features like search filters and exports transformed a one-off report into a company-wide resource [00:08:26].
This project demonstrated how AI can transform seemingly impossible tasks into routine operations [00:08:42]. LLMs augment human analysis and remove bottlenecks, not just doing things faster, but unlocking entirely new possibilities [00:08:50].
Valuable sources of insight such as sales calls, support tickets, product reviews, user feedback, and social media interactions often go untouched [00:09:09]. However, large language models now make these data accessible, enabling companies to turn them into valuable assets [00:09:24].