From: redpointai
Oscar Health, a $3 billion public health insurance company, has been at the forefront of innovating with technology in healthcare for the past decade [00:00:23]. As both an insurer and a provider of care, Oscar Health is actively experimenting with models like GPT-4 across various healthcare use cases [00:00:31]. Mario Schlosser, the CTO and co-founder of Oscar Health, offers insights into implementing these models for real use cases and the broader future of AI in healthcare [00:00:44].
Core Applications of AI in Healthcare
Large Language Models (LLMs) excel at translating between informal and formal language, which is highly relevant in healthcare [00:01:40]. Healthcare involves highly formalized language (e.g., ICD-10, CPT codes, utilization management guidelines) alongside a wealth of human language (e.g., patient-provider conversations, electronic medical record notes) [00:02:02]. While healthcare has seen less algorithmic coverage compared to other industries in the past 15 years, LLMs are now incredibly adept at bridging this gap [00:02:26].
Administrative Use Cases
Initially, many applications are found on the administrative side [00:03:11]. For instance, LLMs can translate complex claim processing traces, which are full of formal rules, into understandable informal language for laypersons [00:03:30]. The goal for the next five years is to make administrative processes real-time, bidirectional, and more transparent, allowing patients to understand costs and alternatives instantly [00:03:55].
Clinical Use Cases
The long-term vision is to supplement or replace clinical intelligence with machine intelligence, ultimately cutting the cost of a doctor's visit by a factor of 10 or 100 and potentially standing in for specialists [00:04:30]. However, this runs into issues of hallucinations, safety, and bias in training data [00:04:50]. Oscar Health’s current focus is therefore primarily on administrative use cases, with one clinical project underway but lower short-term expectations for breakthroughs [00:05:11].
Oscar Health’s AI Strategy and Use Cases
Oscar Health leverages AI across three main financial levers: growth and retention, operational efficiency (administrative), and medical cost reduction/outcome improvement (clinical) [00:10:24].
- Growth and Retention:
- Personalized Outreach: AI is used to send personalized outbound campaigns to members, reminding them of positive experiences or preventative care actions taken (e.g., colon cancer screenings) [00:13:39].
- Persona-Based Messaging: Messaging is tailored based on member personas; chronically ill members respond better to empathy, while generally healthy members prefer convenience [00:14:41].
- Demographic Inference: LLMs can help fill missing demographic information (e.g., ethnicity from names or conversation language) to better match members with appropriate care or communication [00:16:07].
- Administrative Efficiency:
- Call Summarization: LLMs are increasingly replacing manual note-taking by care guides during customer service calls [00:17:26].
- Lab Test Summarization: Used within Oscar’s medical group [00:17:41].
- Secure Messaging Medical Records Generation: Also within the medical group [00:17:46].
- Claims Explainers: Providing internal care guides with clear explanations for claim statuses [00:17:52]. These administrative automations yield savings measured in cents per member per month (PMPM) [00:18:01].
- Clinical Applications:
- Medical Record Interaction: A key current use case is enabling doctors, customer service agents, or other Oscar personnel to “talk to” medical records [00:18:51].
Challenges of AI Implementation in Healthcare
Regulatory Hurdles
Healthcare is a highly regulated industry. HIPAA (Health Insurance Portability and Accountability Act) is the biggest constraint, requiring strict protection of patient-specific information [00:20:56]. This means AI vendors must sign Business Associate Agreements (BAAs) [00:21:05]. Oscar Health was reportedly the first organization to sign a BAA directly with OpenAI [00:21:18]. Newly released models from providers like Google (e.g., Gemini Ultra) do not automatically fall under existing HIPAA agreements, requiring a waiting period (typically 3-4 months) before real medical data can be used [00:22:30]. In the interim, synthetic or anonymized test data is used [00:22:51].
For companies looking to sell AI solutions to healthcare, strict security and policy reviews are standard [00:23:58]. Certifications like HITRUST can ease the process, but ultimately, building trust with hospitals is paramount [00:24:23].
Data Cleanliness and Context
A significant challenge in clinical use cases is the lack of clean, complete inputs and outputs compared to administrative tasks [00:06:43]. Human-written summaries often contain subtle contextual knowledge (e.g., a provider remembering a previous unrecorded conversation with a patient) that LLMs cannot access [00:07:11]. To improve LLM performance in these “wide open” clinical scenarios, it’s crucial to improve the “horizon of knowledge” by feeding the LLM more context [00:08:02].
LLM Limitations and Prompting Strategies
- Counting/Categorization: LLMs like GPT-4 struggle with tasks that require both processing input tokens and simultaneously counting or categorizing them within the same prompt, especially with large datasets [00:27:48]. This limitation, described as the model “running out of layers,” can be addressed by breaking the task into multiple steps or using a “Chain of Thought” approach, effectively chaining LLM calls together to expand the processing capacity [00:29:52] (a minimal sketch of this decomposition follows this list).
- False Positives for Specific Concepts: In medical record extraction, LLMs can produce a high rate of false positives for specific concepts such as “post-traumatic injury” [00:31:49]. This happens because the LLM’s general training data includes broader, layperson associations of the term that differ from its strict medical definition within utilization management [00:32:40]. A solution involves using “self-consistency questionnaires,” where the LLM first generates multiple ways a concept might appear in records, then evaluates them independently before synthesizing a final answer [00:33:41] (see the second sketch after this list).
- Context Window Issues: Early on, LLMs had context windows too small to fit large claim traces (e.g., 1,000 lines of logic) in a single prompt [00:37:09]. Even with larger windows, LLMs struggled with the complexity of long decision trees [00:37:32]. The solution was to provide hierarchical prompts, giving a high-level trace and allowing the LLM to “double-click” on specific functions for more detail [00:38:25] (see the third sketch after this list).
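The step-wise decomposition in the first item can be sketched as follows. This is an illustrative outline rather than Oscar's actual pipeline: llm stands for any text-in, text-out model call, and the prompt wording is invented for the example. The model only ever answers one small yes/no question at a time, while the tallying happens deterministically in ordinary code.

```python
from typing import Callable, Iterable

def classify_record(llm: Callable[[str], str], record: str, concept: str) -> bool:
    """Ask the model one small yes/no question about one record."""
    prompt = (
        f"Does the following note mention {concept}? Answer YES or NO only.\n\n{record}"
    )
    return llm(prompt).strip().upper().startswith("YES")

def count_mentions(llm: Callable[[str], str], records: Iterable[str], concept: str) -> int:
    """Chain many small calls; do the actual counting outside the model."""
    return sum(classify_record(llm, r, concept) for r in records)
```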
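The self-consistency questionnaire in the second item might look roughly like this, again using the generic llm callable. The prompts, the five-phrasing default, and the majority-vote synthesis step are assumptions for illustration; the discussion only describes the overall generate, evaluate, then synthesize shape.

```python
from typing import Callable, List

def build_questionnaire(llm: Callable[[str], str], concept: str, n: int = 5) -> List[str]:
    """Ask the model for distinct ways the concept might be documented."""
    prompt = (
        f"List {n} distinct ways the clinical concept '{concept}', in the strict "
        "utilization-management sense, might appear in a medical record. "
        "Return one phrasing per line."
    )
    return [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]

def record_shows_concept(llm: Callable[[str], str], record: str, concept: str) -> bool:
    """Evaluate each phrasing independently, then synthesize (here: majority vote)."""
    votes = []
    for phrasing in build_questionnaire(llm, concept):
        answer = llm(
            f"Record:\n{record}\n\n"
            f"Does this record show evidence of: {phrasing}? Answer YES or NO."
        )
        votes.append(answer.strip().upper().startswith("YES"))
    return sum(votes) > len(votes) / 2
```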
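The hierarchical “double-click” prompting in the third item can be mocked up as a simple request-and-expand loop. Everything here is hypothetical scaffolding: high_level_trace is a condensed version of the claim logic, step_details maps step names to their detailed sub-traces, and the EXPAND convention is invented for the sketch, not a published Oscar protocol.

```python
from typing import Callable, Dict

def explain_claim_trace(llm: Callable[[str], str],
                        high_level_trace: str,
                        step_details: Dict[str, str],
                        max_rounds: int = 5) -> str:
    """Start from the high-level trace; let the model request step details on demand."""
    prompt = (
        "You are explaining a claim adjudication trace in plain language.\n"
        f"High-level trace:\n{high_level_trace}\n\n"
        "If you need the detail of a step, reply exactly 'EXPAND <step name>'. "
        "Otherwise reply with the final explanation."
    )
    reply = ""
    for _ in range(max_rounds):
        reply = llm(prompt).strip()
        if not reply.upper().startswith("EXPAND "):
            return reply
        step = reply.split(" ", 1)[1].strip()
        # Append the requested detail and ask again.
        prompt += (
            f"\n\nDetail for '{step}':\n"
            + step_details.get(step, "No further detail recorded for this step.")
        )
    return reply  # stop expanding after max_rounds and return the last reply
```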
Trust and Enterprise Sales
Health systems and insurance companies are generally slow at rapid prototyping and follow-up [00:25:01]. In health tech, the “best products” do not always win; rather, the “best enterprise sales processes” often prevail [00:25:29]. Founders are advised to spend more time engaging with potential clients to build trust rather than solely focusing on model tweaking [00:25:39].
Future of AI in Healthcare
General Purpose vs. Specialized Models
Oscar Health has consistently found that specializing a general-purpose model in a particular area often leads to a loss of “alignment,” meaning the model struggles to follow instructions (e.g., outputting JSON) [00:43:40]. The preference is to use the largest general-purpose models for better reasoning capabilities, potentially combined with techniques like Retrieval-Augmented Generation (RAG) and fine-tuning [00:45:09]. Recent research suggests RAG and fine-tuning provide independent improvements in performance, making it beneficial to use both [00:45:16].
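For concreteness, a bare-bones RAG wrapper around a large general-purpose model might look like the sketch below. llm and retrieve are placeholders for whatever model endpoint and search index are in use; nothing here is specific to Oscar's stack, and a fine-tuned model could be swapped in as the llm callable without changing the shape of the code.

```python
from typing import Callable, List

def answer_with_rag(llm: Callable[[str], str],
                    retrieve: Callable[[str, int], List[str]],
                    question: str,
                    k: int = 4) -> str:
    """Ground the (optionally fine-tuned) general model in retrieved passages."""
    passages = retrieve(question, k)  # e.g. top-k guideline or policy snippets
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using only the numbered context passages and cite them like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```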
AI Doctor/Clinical Chatbots
While there’s no inherent reason why AI shouldn’t eventually take on the role of a “doctor” given the algorithmic nature of medical knowledge [00:56:59], several practical issues remain:
- Safety: Direct LLM interaction with end-users in clinical settings is currently very difficult due to safety concerns [00:57:44]. A human-in-the-loop is currently required for sensitive use cases [00:50:58].
- Physical Interaction: A significant portion of medical care (e.g., foot exams for diabetics, lab tests) requires in-person interaction, leading to “leakage” from virtual care systems [00:57:52].
- Business Model: Health systems currently have little incentive to switch to lower-cost virtual care channels, as this could reduce reimbursement [00:59:50]. Insurers are better positioned to deploy automated virtual primary care but lack member engagement [01:00:15].
Overhyped/Underhyped in Healthcare AI
- Overhyped: Clinical chatbots generally [01:00:49].
- Underhyped: Voice outputs, especially for non-clinical applications [01:00:56].
Commercial Opportunities
- Niche Regulatory Filings: Generating regulatory documentation and reports using LLMs could be a significant opportunity [00:53:30].
- Fraud, Waste, and Abuse (FWA): This industry is still dominated by older players, presenting an opportunity for AI-driven disruption [01:00:52].
- Prior Authorization: While many companies are focusing on AI for prior authorization, it’s a “neuralgic” function very close to an insurer’s core competency [00:55:00].
Structuring AI Teams at Oscar Health
Oscar Health has adopted a decentralized yet supported approach to AI development [00:46:20]:
- Hackathons: Quarterly hackathons serve as a catalyst for experimentation and sharing, with the first AI-focused hackathon seeing the highest participation [00:46:37].
- AI Pod: A dedicated seven-person “AI Pod” consisting of product managers, data scientists, and engineers [00:47:10]. This pod has:
- Weekly office hours for any employee to get help with prompts or AI-related questions [00:47:22].
- Three core projects that the pod itself needs to complete [00:47:34].
- Weekly “hacking sessions” to encourage open sharing of ideas and prototypes, whether successful or not [00:47:54].
- Transparency: Oscar Health actively publishes its research and insights on its ai.hioscar.com website to foster open learning and sharing within the industry [01:01:50].
Desired Tooling and Vision
Key tooling needs include:
- Safety Layer: A mechanism to verify if LLM outputs are insulting or inappropriate before reaching end-users [00:50:44]. Currently, this relies on a human-in-the-loop [00:50:58].
- Faster Inference Times: Enabling rapid parallel generation of multiple outputs so the best one can be selected [00:51:27] (a toy sketch combining both items follows this list).
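The two items above can be combined into one small pattern, sketched here with invented helpers: llm is any model call, is_safe is whatever moderation or safety classifier is available, and score ranks the surviving candidates. This is a toy illustration of the desired tooling, not something Oscar has described building.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

def best_safe_reply(llm: Callable[[str], str],
                    is_safe: Callable[[str], bool],
                    score: Callable[[str], float],
                    prompt: str,
                    n: int = 4) -> Optional[str]:
    """Draft several candidates in parallel, filter unsafe ones, keep the best."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates: List[str] = list(pool.map(llm, [prompt] * n))
    safe = [c for c in candidates if is_safe(c)]
    # Returning None signals that no candidate passed the check and a human
    # should review the request instead (today's human-in-the-loop fallback).
    return max(safe, key=score) if safe else None
```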
Mario Schlosser envisions AI democratizing analytics within organizations, enabling more people to interact with models like GPT-4 [00:26:39]. Beyond healthcare, he is interested in using LLMs for creative applications, such as generating regulatory reports, creating internal company RPGs (role-playing games) based on company documents, or developing dynamic game mechanics that balance economies [01:02:19].