From: redpointai
The “Unsupervised Learning” podcast, hosted by Jacob Effron, featured Mario Schlosser, CTO and co-founder of Oscar Health, a $3 billion public health insurance company recognized for its innovation in technology and healthcare over the past decade [00:00:24]. Schlosser shared insights into implementing AI models for real-world use cases and broader perspectives on the future of AI in healthcare [00:00:44]. Oscar Health operates as both an insurer and a care provider and actively experiments with GPT-4 in various healthcare contexts [00:00:31]. The company was among the first to gain regulatory approval from OpenAI to work in healthcare [00:00:38].
AI’s Impact in Healthcare
Mario Schlosser believes that large language models (LLMs) excel at transforming informal language into formal language and vice versa [00:01:40]. Healthcare, with its mix of highly formalized codes (e.g., ICD-10, CPT, utilization management guidelines) and human language (e.g., patient-provider conversations, medical record notes), is an industry where LLMs can have a significant impact [00:02:00].
Historically, healthcare has seen relatively less algorithmic coverage compared to other industries, lacking an equivalent of PageRank [00:02:26]. Existing predictive models, such as logistic regressions, often halted at the interface between formal and informal language [00:02:46]. LLMs, however, are adept at bridging this gap [00:02:55].
In the next five years, AI is expected to revolutionize the administrative side of healthcare, making processes like claims payment, authorizations, and cost transparency real-time and bidirectional [00:03:09]. This could lead to patients knowing costs upfront and understanding care alternatives [00:04:04]. Beyond this, the long-term goal is to replace caregivers and clinical intelligence with machine intelligence, potentially reducing the cost of doctor visits significantly [00:04:30]. This clinical application, however, faces challenges related to hallucinations, safety, and bias [00:04:49].
Oscar Health’s AI Applications
Oscar Health prioritizes AI adoption based on three financial levers:
- Growth and Retention: Enhancing member acquisition and retention [00:10:36].
- Operational Efficiency: Streamlining administrative processes [00:10:46].
- Medical Cost Reduction and Outcomes Improvement: Directly impacting the majority of healthcare costs [00:10:52].
Retention Campaigns
Oscar leverages AI to run outbound campaigns to retain members during the six-week annual enrollment period [00:13:08]. These campaigns personalize messages to remind members of positive experiences with Oscar in the preceding 12 months [00:13:42].
AI helps tailor messaging based on member personas:
- Reminding specific demographics (e.g., Asian-Americans) about preventative care like colorectal cancer screenings proved effective [00:14:07].
- Chronically ill members respond better to messages emphasizing convenience, while generally healthy members resonate more with empathetic messages [00:14:51].
LLMs, like GPT-4, have been effective “out of the box” for extracting information from customer service conversations to identify patient issues and persona characteristics [00:15:48]. They can also assist in tasks like inferring ethnicity from names or detecting conversation languages to fill in missing member data [00:16:33].
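The extraction step described above can be sketched as a structured prompt. The template wording and field names below are illustrative assumptions, not Oscar's actual prompts; the filled prompt would be sent to a model such as GPT-4.

```python
# Hypothetical sketch of a structured-extraction prompt for a customer
# service transcript. Field names and wording are assumptions.
EXTRACTION_PROMPT = """\
You are reviewing a health-insurance customer service transcript.
Return ONLY a JSON object with these fields:
  "issue":    one-sentence summary of the member's problem
  "language": the language the member spoke
  "persona":  "chronically_ill" or "generally_healthy" if inferable, else null

Transcript:
{transcript}
"""

def build_extraction_prompt(transcript: str) -> str:
    """Fill the template; the result would be passed to an LLM."""
    return EXTRACTION_PROMPT.format(transcript=transcript)

prompt = build_extraction_prompt("Member asked about colorectal screening...")
print(prompt)
```

The key design choice is asking for a fixed JSON schema up front, so downstream systems can parse the model's answer without free-text cleanup.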
Administrative Use Cases
Oscar is phasing out manual note-taking for customer service calls, relying entirely on LLMs for call summarization [00:17:28]. Other launched administrative applications include:
- Lab test summarization in the Oscar Medical Group [00:17:41].
- Secure messaging medical records generation [00:17:46].
- Claims explainers for internal care guides [00:17:52].
These administrative improvements aim to save “cents PMPM” (per member per month), which can accumulate into significant savings for an insurance company [00:18:01].
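To see why "cents PMPM" adds up, a quick back-of-the-envelope calculation helps; the member count below is an assumed figure for illustration, not Oscar's actual membership.

```python
def annual_savings(cents_pmpm: float, members: int) -> float:
    """Convert a per-member-per-month saving in cents to annual dollars."""
    return cents_pmpm / 100 * members * 12

# Assumed example: a 5-cent PMPM saving across 1,000,000 members
print(annual_savings(5, 1_000_000))  # → 600000.0
```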
Clinical Use Cases
One of the earliest and biggest clinical use cases involves enabling doctors and other Oscar personnel to “talk to the medical records” [00:18:52].
Challenges in AI Adoption and Deployment in Healthcare
Integrating AI into healthcare presents several unique challenges.
Regulatory Hurdles
Healthcare is a highly regulated industry, with HIPAA (Health Insurance Portability and Accountability Act) being a major constraint [00:20:56]. HIPAA restricts the sharing of patient-specific information [00:21:01]. AI vendors, like OpenAI, must sign Business Associate Agreements (BAAs) to handle protected health information [00:21:05]. Oscar Health was the first organization to sign a BAA directly with OpenAI [00:21:17].
New AI models, such as Google’s Gemini Ultra, are not immediately covered under existing HIPAA agreements; a waiting period (typically 3-4 months) passes before real medical data can be used with them [00:22:30]. During this period, synthetic or anonymized test data is used [00:22:46].
Data Cleanliness and Contextual Knowledge
In administrative use cases, inputs and outputs are typically very clean and structured (e.g., claims data in EDI format) [00:06:14]. However, in clinical use cases, this is often not the case [00:06:43]. When summarizing conversations between providers and members, human-written summaries often contain subtle contextual knowledge that LLMs cannot access, such as a provider remembering a previous, unrecorded conversation with a patient [00:07:11]. This “horizon of knowledge” needs to be expanded for LLMs to improve performance in open-ended clinical scenarios [00:08:07].
Trust and Prototyping
Hospitals and insurance companies often require extensive security and policy reviews, making rapid prototyping difficult [00:23:57]. Building trust with these organizations is paramount and cannot be outsourced solely to certifications such as HITRUST [00:24:42]. Founders should focus on fostering relationships and collaborative testing rather than solely on product tweaking [00:25:23].
LLM Limitations
Even advanced LLMs like GPT-4 exhibit fundamental limitations:
- Compositional Tasks: GPT-4 struggles with tasks that require multiple steps of processing and aggregation within a single prompt, such as categorizing and then counting different types of customer service calls [00:27:35]. This is believed to be due to how Transformers process information across layers [00:28:21].
- False Positives: LLMs can yield high false positives on seemingly simple binary questions, like whether a member has a “post-traumatic injury,” because the term has a specific, formal definition in a clinical context that differs from its common understanding in the training data [00:31:49].
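The compositional limitation above suggests a workaround: classify each call with its own prompt, then aggregate in ordinary code, so the model never has to count across items inside a single context. The sketch below substitutes a trivial keyword rule for the per-call LLM classification; the categories and rules are invented for illustration.

```python
from collections import Counter

def classify_call(transcript: str) -> str:
    """Stand-in for a single-call LLM classification prompt.
    A keyword rule substitutes for the model call in this sketch."""
    text = transcript.lower()
    if "bill" in text:
        return "billing"
    if "claim" in text:
        return "claims"
    return "other"

def categorize_and_count(transcripts: list[str]) -> Counter:
    """Decomposition: one classification per prompt, counting in code."""
    return Counter(classify_call(t) for t in transcripts)

calls = ["My bill is wrong", "Claim 123 was denied", "Change my address"]
print(categorize_and_count(calls))
```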
Business Model Disincentives
For large health systems, there’s often little incentive to switch to lower-cost care delivery channels (like virtual care) because it can lead to pressure from insurers and government to reduce reimbursement costs [00:59:50]. This “conundrum of healthcare” creates a barrier to broader adoption of cost-saving AI solutions [01:00:24].
Strategies for AI Implementation
Prompting Strategies
To overcome LLM limitations, strategies like “chain of thought” prompting are used, effectively chaining LLMs together to expand their “layer space” and manage complex tasks [00:29:52]. For tasks like medical record extraction, a “self-consistency questionnaire” approach can be employed [00:33:41]. This involves prompting the LLM to generate multiple ways a concept might appear in medical records, then independently evaluating each, and integrating the findings [00:33:52].
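The self-consistency questionnaire can be sketched as three steps. In practice an LLM would both generate the candidate phrasings and judge each one against the record; in this sketch both steps are replaced by a hard-coded list and substring checks, and the phrasings themselves are invented for illustration.

```python
def generate_phrasings(concept: str) -> list[str]:
    """Step 1: ways the concept might surface in a record (assumed list;
    an LLM would generate these)."""
    return {
        "diabetes": ["diabetes mellitus", "type 2 dm", "a1c elevated", "metformin"],
    }.get(concept, [concept])

def evaluate_phrasing(phrasing: str, record: str) -> bool:
    """Step 2: independently check one phrasing (stand-in for one LLM call)."""
    return phrasing in record.lower()

def concept_present(concept: str, record: str, threshold: int = 1) -> bool:
    """Step 3: integrate the independent findings into one answer."""
    hits = sum(evaluate_phrasing(p, record) for p in generate_phrasings(concept))
    return hits >= threshold

record = "Patient on metformin; A1c elevated at last visit."
print(concept_present("diabetes", record))
```

Running each phrasing as its own independent check, then aggregating, mirrors the "generate, evaluate, integrate" structure described above.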
For complex claim explanations, Oscar provides LLMs with hierarchical traces of their internal rule base, allowing the model to “double-click” on specific functions for more detail rather than processing the entire thousand-line logic at once [00:38:18]. This helps manage context window limitations and improves accuracy [00:37:09].
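The hierarchical-trace idea can be sketched as a rule base exposed in two granularities: a one-line summary the model sees first, and an expand step for any function it "double-clicks" on. The rule names and summaries below are invented for illustration, not Oscar's actual rule base.

```python
# Assumed toy rule base: one top-level rule with named sub-steps.
RULE_BASE = {
    "adjudicate_claim": {
        "summary": "Check eligibility, then apply benefits, then price the claim.",
        "children": {
            "check_eligibility": "Member must be active on the date of service.",
            "apply_benefits": "Apply deductible, copay, and coinsurance in order.",
            "price_claim": "Look up the contracted rate for the billed CPT code.",
        },
    }
}

def top_level_trace(rule: str) -> str:
    """What the LLM sees first: a one-line summary instead of all the logic."""
    return RULE_BASE[rule]["summary"]

def expand(rule: str, step: str) -> str:
    """The 'double-click': fetch detail for one step only, keeping the
    context window small rather than inlining the full rule logic."""
    return RULE_BASE[rule]["children"][step]

print(top_level_trace("adjudicate_claim"))
print(expand("adjudicate_claim", "apply_benefits"))
```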
Model Choice
Oscar has consistently found that general-purpose models like GPT-4, despite not being specialized for healthcare, outperform specialized models (like Google’s Med-PaLM) [00:44:00]. Specialized models tend to “lose alignment,” struggling with basic instructions like generating output in JSON format [00:44:11]. Until symbolic processing (planning) can be decoupled from content generation in LLMs, the biggest general-purpose model is preferred for better reasoning [00:45:09].
RAG and Fine-tuning
A recent paper suggests that combining Retrieval Augmented Generation (RAG) and fine-tuning yields independent improvements in performance, indicating that both strategies are beneficial for LLM applications [00:45:15].
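A minimal RAG loop retrieves the most relevant document and prepends it to the prompt. The sketch below uses simple keyword overlap as a stand-in for embedding-based retrieval; the documents and prompt wording are invented for illustration.

```python
def retrieve(query: str, docs: list[str]) -> str:
    """Return the document sharing the most words with the query
    (a crude stand-in for embedding similarity search)."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def augmented_prompt(query: str, docs: list[str]) -> str:
    """Build the prompt: retrieved context first, then the question."""
    context = retrieve(query, docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

docs = [
    "Prior authorization is required for MRI of the knee.",
    "Telehealth visits are covered at the in-network rate.",
]
print(augmented_prompt("Is prior authorization required for a knee MRI?", docs))
```

Fine-tuning would then adjust the model's weights on domain examples; the paper's point is that the two techniques stack rather than substitute for one another.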
Structuring AI Teams
Oscar Health’s approach to structuring AI teams originated from a hackathon [00:46:26]. They established a dedicated “Pod” of seven people (product managers, data scientists, engineers) [00:47:10]. This Pod provides:
- Office Hours: Weekly sessions where anyone in the company can bring their AI prompts and ideas for feedback [00:47:22].
- Dedicated Projects: The Pod has its own set of three priority projects to ensure practical application and completion, avoiding endless research [00:47:34].
- Weekly Hacking Sessions: Open sessions where anyone can share their AI projects, fostering a culture of experimentation and knowledge sharing [00:47:54].
This mixed model, with centralized support and decentralized implementation, encourages experimentation and lowers the bar for participation across the company [00:50:10].
Overhyped and Underhyped Healthcare AI
- Overhyped: Clinical chatbots generally [01:00:49].
- Underhyped: Voice outputs [01:00:56].
Future Opportunities in Healthcare AI
If starting a new healthcare AI company, Mario Schlosser suggests focusing on a very obscure niche where technical solutions can solve specific issues for non-technical users [00:53:22]. An example is the composition of regulatory filings [00:53:37]. LLMs could watch data flow and automatically generate regulatory reports, including clinical reports for organizations like NCQA or state health departments, which are often in natural language format [00:54:11].
Regarding prior authorization companies, Schlosser advises caution: prior authorization is a core competency for insurance companies, and external solutions may face limitations if they are not deeply integrated and interactive [00:55:59].
A more promising area for disruption is fraud, waste, and abuse within the industry, which is currently dominated by older, expensive players [00:56:16].
The “AI Doctor” Future
Mario Schlosser sees no reason why an “AI doctor” future shouldn’t materialize [00:56:59]. The medical profession is highly codified and algorithmic, relying on existing knowledge and inference from concrete data points, which makes it well suited to LLM applications [00:57:12].
However, several practical issues remain:
- Safety: Currently, it is very difficult for LLMs to directly interact with end-users in a clinical context due to safety concerns [00:57:42]. Outputs must be vetted by a human-in-the-loop [00:51:00].
- Physical Interaction: While two-thirds of claims could be handled virtually, many patients (e.g., diabetics) still require in-person care for specific procedures like foot exams [00:58:16]. This need for physical interaction creates “leakage” that prevents clinical chatbots from fully replacing physicians or transforming the system on a larger scale [00:59:36].
- Business Model: Large health systems often lack financial incentives to adopt lower-cost virtual care channels, as it could lead to reduced reimbursement rates [00:59:50]. Insurers are better positioned to deploy automated virtual primary care but struggle with member engagement [01:00:15].
Learn More
To learn more about Mario Schlosser and Oscar Health’s AI work, visit hi.oscar.com for articles, insights, and papers [01:01:50]. Mario also shares his explorations and Oscar’s projects on Twitter (@MarioTS) [01:02:07].