From: aidotengineer
This article explores approaches to AI implementation and highlights several practical examples of how Anthropic’s large language models are being used to solve real business problems and enhance customer experiences [00:00:44]. The insights shared are based on hundreds of customer interactions [00:01:14].
About Anthropic
Anthropic is an AI safety and research company dedicated to building safe and effective large language models (LLMs) [00:01:26]. Founded by leading experts in AI a few years ago [00:01:34], they have released multiple iterations of their Frontier models, emphasizing safety techniques, research, and policy [00:01:43].
Their most recent model, Claude 3.5 Sonnet (new), launched in late October of the previous year [00:01:55]. Sonnet is a leading model in the code space, sitting at the top of leaderboards for agentic coding evaluations such as SWE-bench [00:02:04].
Interpretability Research
Anthropic focuses on various research directions, including model capabilities, product research, and AI safety [00:02:25]. A distinguishing area of their research is interpretability, which involves reverse-engineering models to understand how and why they “think” [00:02:36]. This research is still in its early stages [00:02:53] and approaches interpretability in stages that build on one another:
- Understanding: Grasping how the AI makes decisions [00:03:07].
- Detection: Understanding specific behaviors and labeling them [00:03:10].
- Steering: Influencing the AI’s behavior [00:03:15].
- Explainability: Unlocking business value from interpretability methods [00:03:22].
Interpretability aims to significantly improve AI safety, reliability, and usability [00:03:31]. For example, understanding feature activations at the model level can help identify recognized patterns, such as a group of neurons activating when famous NBA players are mentioned [00:04:24].
An example of model steering is “Golden Gate Claude,” where the model’s activation in the “Golden Gate” direction was amplified. This resulted in Claude suggesting painting a bedroom “red like the Golden Gate Bridge” when asked about bedroom paint colors [00:04:41].
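Anthropic has not published the code behind this intervention, but the underlying idea, adding a scaled "feature direction" to a hidden activation during generation, can be illustrated with a toy sketch. Everything below (the dimensions, the random vectors, the gain value) is an illustrative assumption, not Anthropic's implementation.

```python
import numpy as np

# Toy illustration of activation steering (not Anthropic's actual code).
# Assume a hidden activation from some transformer layer and a unit-length
# "feature direction" (e.g., the Golden Gate Bridge feature) recovered by
# interpretability methods such as sparse autoencoders.

hidden_dim = 4096                                   # hypothetical model width
hidden_state = np.random.randn(hidden_dim)          # stand-in for a real activation
feature_direction = np.random.randn(hidden_dim)     # stand-in for a learned feature
feature_direction /= np.linalg.norm(feature_direction)

def steer(activation: np.ndarray, direction: np.ndarray, gain: float) -> np.ndarray:
    """Amplify the component of `activation` along `direction` by `gain`."""
    return activation + gain * direction

# A large positive gain pushes generation toward the feature ("Golden Gate Claude");
# a negative gain would suppress it.
steered = steer(hidden_state, feature_direction, gain=10.0)
```

In practice this intervention happens inside the model's forward pass at a chosen layer, and the gain controls how strongly the behavior shifts.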
AI Applications and Customer Engagement
Anthropic encourages customers to focus on using AI to solve core product problems, moving beyond basic applications like chatbots and summarization [00:05:22]. They work closely with AI-native companies and startups [00:05:29].
Examples of AI Applications:
- Onboarding and Upskilling Platforms: Instead of just summarizing course content or providing Q&A chatbots, AI can hyper-personalize course content based on an individual employee’s context [00:06:17]. It can dynamically adapt content to be more challenging for fast learners [00:06:26] or update material to match individual learning styles (e.g., creating visual content for visual learners) [00:06:33]. A minimal sketch of this kind of personalization follows this list.
- Diverse Industries: Anthropic’s models are impacting various industries, including taxes, legal, and project management [00:07:14]. These companies are using AI to drastically enhance their customer experience, making products easier to use and more trustworthy [00:07:22]. This includes achieving high-quality outputs where hallucination is unacceptable, such as in tax preparation [00:07:35].
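As a rough illustration of the onboarding example above, learner context can simply be folded into the system prompt before calling the model. This is a minimal sketch assuming the Anthropic Python SDK; the profile fields, helper name, and model alias are hypothetical, not a specific customer's implementation.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def personalize_lesson(lesson_text: str, learner: dict) -> str:
    """Rewrite course content for one learner's role, pace, and learning style."""
    system_prompt = (
        "You are a corporate onboarding assistant. Rewrite the lesson for this "
        f"learner: role={learner['role']}, pace={learner['pace']}, "
        f"style={learner['style']}. Keep all factual content intact; for visual "
        "learners, describe diagrams or tables they could use."
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",   # assumed model alias; pick per your account
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": lesson_text}],
    )
    return response.content[0].text

# Example: a fast, visual learner in a sales role gets a denser, diagram-oriented version.
# personalize_lesson(course_module, {"role": "sales", "pace": "fast", "style": "visual"})
```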
Customer Success Story: Intercom
Intercom, an AI customer service platform, has an AI agent called Fin, which is considered a market leader [00:10:58]. Anthropic partnered with Intercom to enhance Fin’s capabilities [00:11:19].
The collaboration began with a two-week sprint in which Anthropic’s applied AI team worked with Intercom’s data science team [00:11:27], comparing Intercom’s hardest prompt for Fin against a prompt developed with Claude [00:11:32]. Following positive initial results, a two-month optimization sprint was undertaken to fine-tune and optimize all of Intercom’s prompts for best performance with Claude [00:11:43].
This effort led to Anthropic’s model outperforming the previous LLM in Intercom’s benchmarks [00:11:59]. Intercom then launched “Fin 2,” which demonstrated impressive metrics:
- Can solve up to 86% of customer support volume [00:12:22].
- 51% resolution rate “out of the box” [00:12:27].
- Anthropic’s own support team adopted Fin and saw similar resolution rates [00:12:30].
- Enhanced “human element” features, including tone adjustment and answer length [00:12:37].
- Improved policy awareness, such as refund policies [00:12:45].
Intercom uses a resolution-based pricing model, creating an incentive for the model to be truly helpful in solving customer problems rather than just deflecting them [00:12:02].
Anthropic’s Product Offerings and Support
Anthropic offers its models via its API for businesses that want to embed AI in their products and services, and Claude for Work to empower entire organizations to use AI in day-to-day tasks [00:08:06]. They also partner with AWS (Amazon Bedrock) and GCP (Vertex AI), allowing customers to access Frontier models and deploy applications within their existing cloud environments without managing new infrastructure [00:08:22].
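A minimal sketch of these access paths, assuming the Anthropic Python SDK (the cloud clients require the SDK's bedrock/vertex extras and existing AWS/GCP credentials; the region, project, and model IDs below are placeholders):

```python
# pip install "anthropic[bedrock,vertex]" for the cloud-hosted clients
from anthropic import Anthropic, AnthropicBedrock, AnthropicVertex

# 1) Direct API access (uses ANTHROPIC_API_KEY from the environment).
direct = Anthropic()

# 2) The same models served through Amazon Bedrock, inside an existing AWS environment.
bedrock = AnthropicBedrock(aws_region="us-east-1")

# 3) The same models served through GCP Vertex AI.
vertex = AnthropicVertex(region="us-east5", project_id="my-gcp-project")

prompt = [{"role": "user", "content": "Summarize our refund policy in two sentences."}]

resp = direct.messages.create(
    model="claude-3-5-sonnet-latest",   # alias on the first-party API
    max_tokens=256,
    messages=prompt,
)
print(resp.content[0].text)

# Bedrock and Vertex use provider-specific model IDs, e.g.
# "anthropic.claude-3-5-sonnet-20241022-v2:0" (Bedrock) or
# "claude-3-5-sonnet-v2@20241022" (Vertex); the call shape is otherwise the same.
```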
The applied AI team provides technical support, helping customers design architectures, build evaluations, and tweak Claude prompts [00:09:05]. They support customers facing niche challenges in specific use-case domains, applying the latest research to maximize model output [00:10:04]. Their process follows a sprint approach: define metrics, run iterative deployment loops, and transition to A/B test environments and production [00:10:17].
Best Practices and Common Mistakes in AI Implementation
Based on their extensive customer interactions, Anthropic has identified common pitfalls and best practices for AI implementation:
1. Testing and Evaluation
- Common Mistake: Building a robust workflow first and only then trying to build evaluations [00:13:28]. Evals should direct you toward the desired outcome [00:13:38]. Other mistakes include struggling to source data for eval design, or “trusting the vibes” by running only a handful of queries rather than a sufficient, representative sample [00:13:50].
- Best Practice: Understand that evaluations are crucial for empirically navigating the “latent space” of a use case and finding an optimized point [00:14:26]. Evals are your “intellectual property” and enable a competitive advantage [00:15:14].
- Set up telemetry for back-testing the architecture [00:15:35].
- Design representative test cases, including “silly” examples that may occur in real-world use (e.g., asking a customer support agent how to kill a zombie in Minecraft), to ensure the model responds appropriately or reroutes the request [00:15:43]. A minimal harness along these lines is sketched after this list.
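The sketch below assumes the Anthropic Python SDK; the test cases, system prompt, keyword grading heuristic, and model alias are all illustrative assumptions rather than a recommended rubric.

```python
import anthropic

client = anthropic.Anthropic()

# Representative test cases, including a "silly" one the agent should reroute
# rather than answer (all cases and expectations are made up for illustration).
TEST_CASES = [
    {"query": "How do I update my billing address?", "expect": "billing"},
    {"query": "My order arrived damaged, what now?",  "expect": "refund"},
    {"query": "How do I kill a zombie in Minecraft?", "expect": "out_of_scope"},
]

SYSTEM = (
    "You are a customer support agent for an online store. "
    "If a question is unrelated to the store, reply with exactly: out_of_scope."
)

def run_eval() -> float:
    passed = 0
    for case in TEST_CASES:
        resp = client.messages.create(
            model="claude-3-5-sonnet-latest",   # assumed model alias
            max_tokens=300,
            system=SYSTEM,
            messages=[{"role": "user", "content": case["query"]}],
        )
        answer = resp.content[0].text.lower()
        # Crude keyword grading; production evals typically use exact-match
        # rubrics or an LLM judge, and log every result for back-testing.
        if case["expect"] in answer:
            passed += 1
    return passed / len(TEST_CASES)

print(f"pass rate: {run_eval():.0%}")
```

In practice, the pass/fail results would feed the telemetry and back-testing loop described above, so prompt or architecture changes can be compared empirically.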
2. Identifying Metrics
- Common Mistake: Not clearly defining the balance between intelligence, cost, and latency for a specific use case [00:16:16]. Most organizations can optimize for one or two of these, but rarely all three simultaneously [00:16:26]. (A rough way to measure the cost and latency side of this trade-off is sketched after this list.)
- Best Practice: The stakes and time sensitivity of the decision should drive optimization choices [00:17:10]. For example:
- A customer support use case might prioritize a response within 10 seconds due to customer behavior [00:16:40].
- A financial research analyst agent can take 10 minutes to respond because the subsequent capital allocation decision is highly important [00:16:55].
- Consider UX approaches to manage latency, such as “thinking boxes” or redirecting users to other pages while the model processes [00:17:21].
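One rough way to quantify the cost and latency side of that trade-off for a candidate model is to time real requests and price the reported token usage. This is a minimal sketch assuming the Anthropic Python SDK; the per-token prices are placeholders to be replaced with current published rates.

```python
import time
import anthropic

client = anthropic.Anthropic()

# Placeholder prices in USD per million tokens -- check current pricing before use.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}

def measure(model: str, prompt: str) -> dict:
    """Time one request and estimate its cost from reported token usage."""
    start = time.perf_counter()
    resp = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    latency_s = time.perf_counter() - start
    cost = (resp.usage.input_tokens * PRICE_PER_MTOK["input"]
            + resp.usage.output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
    return {"latency_s": round(latency_s, 2), "est_cost_usd": round(cost, 6)}

# Compare candidate models on the same representative prompt, then weigh these
# numbers against the intelligence your evals measure.
print(measure("claude-3-5-sonnet-latest",
              "Draft a two-sentence reply to a late-delivery complaint."))
```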
3. Fine-tuning
- Common Mistake: Viewing fine-tuning as a “silver bullet” without understanding its costs or limitations [00:17:58]. Fine-tuning can limit a model’s reasoning in fields outside the specific fine-tuned domain [00:18:06]. Many attempt fine-tuning without a clear eval set or success criteria [00:18:19].
- Best Practice: Try other approaches first [00:18:16]. Fine-tuning should only be considered if intelligence requirements cannot be met otherwise [00:18:26]. Justify the cost and effort of fine-tuning, and avoid letting it slow down product development iteration [00:18:41].
Other Methods for AI Performance Improvement
Beyond basic prompt engineering, various other methods and architectural decisions can drastically improve the success of an AI use case [00:19:32]:
- Prompt Caching: Can significantly reduce cost and increase speed without sacrificing intelligence [00:19:47].
- Contextual Retrieval: Improves the effectiveness of retrieval mechanisms, feeding information to the model more efficiently [00:19:54].
- Citations: An out-of-the-box API feature that grounds responses in the source documents provided to the model [00:20:09] (combined with prompt caching in the sketch after this list).
- Agentic Architectures: A key architectural decision for advanced AI applications [00:20:11].
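A minimal sketch combining the first and third items, assuming the Messages API's cache_control and citations fields on content blocks (the document text, title, and model alias are placeholders):

```python
import anthropic

client = anthropic.Anthropic()

POLICY_TEXT = "...large, stable reference document, e.g. a refund policy manual..."

response = client.messages.create(
    model="claude-3-5-sonnet-latest",   # assumed model alias
    max_tokens=512,
    system="Answer using only the attached policy document.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "text", "media_type": "text/plain", "data": POLICY_TEXT},
                    "title": "Refund policy",
                    # Citations: let the model point back to specific passages
                    # in the supplied document.
                    "citations": {"enabled": True},
                    # Prompt caching: mark this large, reused block as cacheable so
                    # repeat requests skip reprocessing it, cutting cost and latency.
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "What is the refund window for damaged items?"},
            ],
        }
    ],
)

# The reply is a list of content blocks: answer text plus citation references
# pointing into the source document.
print(response.content)
```

Contextual retrieval and agentic architectures build on the same call shape: retrieval decides what goes into the document blocks, and an agent loop decides when to call the model and with which tools.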