From: redpointai

AI integration in healthcare presents a unique set of challenges and opportunities, primarily due to the industry’s complex regulatory environment and the nature of medical information. Oscar Health, a $3 billion public health insurance company, operates at the forefront of this intersection, leveraging AI for both administrative efficiency and clinical improvements [00:00:17].

AI’s Transformative Potential in Healthcare

Large Language Models (LLMs) are particularly well-suited for healthcare due to their ability to translate between informal and formal language [01:59:37]. Healthcare uniquely possesses both:

  • Highly Formalized Language: Such as ICD-10 codes, CPT codes, and utilization management guidelines [02:04:02].
  • Abundant Human Language: Including conversations between providers and patients, and electronic medical record notes [02:15:20].

This capability marks a significant shift from previous algorithmic applications in healthcare, which often “ended at that surface” where formal and informal languages met [02:46:58]. The aim is to create a healthcare system where costs are transparent, alternatives are clear, and administrative issues like claims denials are minimized [04:04:33].
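
As a concrete illustration of that informal-to-formal translation, the sketch below asks a general-purpose LLM to propose candidate ICD-10 codes for a plain-language note. This is a minimal illustration, not a production coding pipeline: the model name is a placeholder, and real medical coding would require validation against the official code set and human review.

```python
# Sketch: translating informal clinical language into formal codes.
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

note = "Patient reports sharp lower back pain after lifting boxes, no numbness."

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "Suggest candidate ICD-10-CM codes for the clinical note. "
                    "Return a JSON array of objects with 'code' and 'rationale'."},
        {"role": "user", "content": note},
    ],
)
print(response.choices[0].message.content)
```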

Oscar Health’s AI Applications and Strategy

Oscar Health is both an insurer and a care provider, with its own medical group and care teams [00:31:07]. Their AI strategy focuses on three key financial levers [10:24:00]:

  1. Growth and Retention: AI is used for personalized outbound campaigns to remind members of Oscar’s benefits and encourage re-enrollment [13:37:00]. This includes tailoring messages based on member personas, such as using “empathy” for chronically ill members and “convenience” for generally healthy ones [14:48:00]. LLMs can extract persona characteristics from customer service conversations [15:51:00].
  2. Operational Efficiency (Administrative): Oscar has focused heavily on administrative AI use cases in the short term, aiming for rapid deployment and savings [05:20:00].
    • Call Summarization: Increasingly phasing out manual note-taking by care guides in favor of LLM-generated summaries [17:20:00].
    • Lab Test Summarization: Launched within Oscar’s medical group [17:41:43].
    • Medical Record Generation from Secure Messaging: Also launched in the medical group [17:46:00].
    • Claims Explainers: Translating complex internal claims logic into understandable language for care guides, eventually aiming for direct member communication [17:52:00].
  3. Medical Cost Reduction and Outcomes Improvement (Clinical): This area is seen as having the greatest long-term impact on healthcare costs [12:26:00]. A primary use case is enabling doctors and customer service agents to “talk to the medical records” [18:52:00].
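
A minimal sketch of what “talking to the medical records” could look like in practice, assuming a retrieval step over record snippets and a generic chat-completion API; the model name, retrieval helper, and record format are illustrative assumptions, not Oscar’s actual stack:

```python
# Sketch: question answering over a member's medical record.
from openai import OpenAI

client = OpenAI()  # real PHI would require a BAA with the model provider

def retrieve_relevant_chunks(question: str, record_chunks: list[str], k: int = 5) -> list[str]:
    """Hypothetical retrieval step: naive keyword-overlap ranking stands in for
    an embedding index or search service."""
    scored = sorted(
        record_chunks,
        key=lambda chunk: sum(word in chunk.lower() for word in question.lower().split()),
        reverse=True,
    )
    return scored[:k]

def ask_medical_record(question: str, record_chunks: list[str]) -> str:
    context = "\n---\n".join(retrieve_relevant_chunks(question, record_chunks))
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer only from the provided record excerpts. "
                        "If the answer is not in the excerpts, say so."},
            {"role": "user",
             "content": f"Record excerpts:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Constraining answers to retrieved excerpts is one simple way to keep clinician- or agent-facing answers grounded in the record rather than in the model’s general knowledge.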

Challenges in AI Adoption and Deployment in Healthcare

Integrating AI into healthcare is fraught with specific obstacles:

Regulatory and Compliance Hurdles

  • HIPAA and BAAs: Strict patient privacy regulations like HIPAA require Business Associate Agreements (BAAs) with AI vendors [20:56:00]. Oscar was among the first to sign a BAA directly with OpenAI [21:18:00].
  • New Model Adoption Delays: When new AI models are released (e.g., Google’s Gemini Ultra), they are not immediately covered under existing HIPAA agreements. This necessitates using synthetic or anonymized test data for initial evaluations, with a typical delay of 3-4 months before real medical data can be used [22:30:00].
  • Security Reviews and Certifications: Hospitals and insurance companies require extensive security and policy reviews, often involving long checklists or certifications like HITRUST [23:58:00].

LLM Limitations and Data Cleanliness

  • Contextual Knowledge Gaps: LLMs lack the subtle context humans carry (e.g., a doctor remembering a previous conversation that never made it into the formal record), creating an “unfair playing field” when the model’s inputs are less clean and complete than a human’s [07:11:00].
  • Complex Logical Tasks: LLMs can fail at tasks requiring simultaneous processing and counting, such as categorizing and then tallying reasons for customer service calls (a decomposition workaround is sketched after this list) [27:48:00]. This points to a “fundamental limitation of LLMs” related to their internal “layers” or processing capacity [28:20:00].
  • False Positives: LLMs can generate high false positive rates for medically specific questions (e.g., “post-traumatic injury”) because their training data includes broader, layperson understandings of concepts, not just precise medical definitions [31:49:00].
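
One common workaround for the categorize-and-count failure mode above is to decompose the task: have the LLM classify each call on its own and leave the tallying to ordinary code. The sketch below illustrates that pattern; the category list, model name, and prompts are placeholders rather than Oscar’s implementation.

```python
# Sketch: classify each transcript separately, then count in code,
# instead of asking one prompt to categorize and tally simultaneously.
from collections import Counter
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["billing", "claims denial", "find a doctor", "pharmacy", "other"]  # illustrative

def categorize_call(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Classify the call into exactly one of: "
                        + ", ".join(CATEGORIES) + ". Reply with the category name only."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content.strip().lower()

def tally_call_reasons(transcripts: list[str]) -> Counter:
    return Counter(categorize_call(t) for t in transcripts)
```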

Healthcare System Inertia

  • Slow Prototyping: Health systems and insurance companies are “not good at rapidly prototyping things” [25:01:00].
  • Enterprise Sales Focus: In health tech, “the best products in healthcare unfortunately do not win; it really still is the best enterprise sales processes that win” [25:29:00]. This means founders often need to prioritize client trust and engagement over product refinement [25:42:00].

Physical vs. Virtual Care and Business Models

  • “Leakage” from Virtual Care: While about two-thirds of claims could theoretically be handled virtually, physical interactions (e.g., lab tests, foot exams for diabetics) remain necessary [57:52:00]. This need for in-person care can “sidetrack” patients away from digital doctor interactions, limiting the full replacement of physicians by AI [59:29:00].
  • Misaligned Incentives: Large health systems often lack financial incentive to transition to lower-cost care channels, as this could lead to reduced reimbursement from insurers and the government [59:50:00].

Safety and Inference

  • Safety Layer: A significant challenge is ensuring AI outputs are safe and not “insulting” or inaccurate, often requiring a human-in-the-loop for dangerous use cases [50:44:00].
  • Faster Inference Times: Faster inference, along with the ability to run multiple AI outputs in parallel for verification (e.g., generating a thousand code listings and picking the best), is a key need for improved AI applications [51:27:00].
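
The “run many outputs in parallel and pick the best” idea can be sketched as best-of-N sampling with a simple LLM verifier. Everything below, especially the scoring prompt and the 0-10 rating scheme, is an illustrative assumption rather than a description of Oscar’s pipeline.

```python
# Sketch: generate N candidates in parallel, keep the one the verifier rates highest.
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model name

def generate_candidate(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # encourage diverse candidates
    )
    return response.choices[0].message.content

def score_candidate(prompt: str, candidate: str) -> float:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Task:\n{prompt}\n\nProposed answer:\n{candidate}\n\n"
                              "Rate its correctness from 0 to 10. Reply with the number only."}],
    )
    try:
        return float(response.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparseable ratings count as worst

def best_of_n(prompt: str, n: int = 8) -> str:
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(generate_candidate, [prompt] * n))
    return max(candidates, key=lambda c: score_candidate(prompt, c))
```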

Opportunities in AI Model Development and Infrastructure

Despite challenges, AI offers significant opportunities:

Underhyped Areas

  • Voice Outputs: The potential for voice outputs in healthcare AI is seen as underhyped, offering quick progress as long as it’s not used for clinical advice [01:00:52].
  • Regulatory Filings Composition: Automating the generation of regulatory documentation for entities like the NCQA or State Health Departments could significantly reduce overhead [53:31:00].
  • Fraud, Waste, and Abuse Detection: This area is ripe for disruption, as it is currently dominated by “very old school players” making substantial profits [56:16:00].

AI Doctor (Long-term Vision)

The concept of an “AI doctor” is seen as inevitable, given that medical knowledge is highly “regimented,” “algorithmic,” and based on “existing knowledge” and “inference based on concrete data points” [56:59:00].

Development Strategies

  • Chain of Thought: Chaining LLMs to each other, effectively expanding their “layer space,” helps solve complex tasks by breaking them into multiple steps [29:52:00].
  • Self-Consistency Questionnaires: Prompting an LLM to generate its own ways of interpreting data (e.g., 30 different ways a post-traumatic injury might appear in medical records) and then evaluating these methods independently can significantly improve accuracy (see the sketch after this list) [33:38:00].
  • Hierarchical Prompting: For complex systems like claims processing, providing LLMs with traces at a high level of hierarchy and allowing them to “double click” into specific functions for more detail is effective [38:20:00].
  • General Purpose vs. Specialized Models: While specialized healthcare-specific models exist (e.g., Google’s Med-PaLM), general-purpose models like GPT-4 often maintain better “alignment” and follow instructions more reliably, especially for formatting requirements like JSON [43:40:00]. The current best approach is to use the biggest general model for reasoning and potentially combine it with Retrieval Augmented Generation (RAG) and fine-tuning, as both methods offer independent performance improvements [45:09:00].
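
A rough sketch of the self-consistency questionnaire pattern, using the post-traumatic-injury example from earlier: first have the model enumerate distinct ways the condition could be documented, then evaluate each interpretation against the record independently and aggregate the votes. The prompts, naive JSON parsing, and voting threshold are assumptions for illustration, not Oscar’s implementation.

```python
# Sketch: self-consistency questionnaire for detecting a condition in a record.
import json
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder model name

def generate_interpretations(condition: str, n: int = 30) -> list[str]:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"List {n} distinct ways that '{condition}' might be documented "
                              "in a medical record. Return a JSON array of strings only."}],
    )
    # Naive parsing; a real pipeline would validate the model's output.
    return json.loads(response.choices[0].message.content)

def interpretation_matches(record: str, interpretation: str) -> bool:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": f"Medical record:\n{record}\n\nDoes this record show evidence of: "
                              f"{interpretation}? Answer yes or no."}],
    )
    return response.choices[0].message.content.strip().lower().startswith("yes")

def has_condition(record: str, condition: str, min_votes: int = 3) -> bool:
    interpretations = generate_interpretations(condition)
    votes = sum(interpretation_matches(record, i) for i in interpretations)
    return votes >= min_votes  # illustrative aggregation threshold
```

Evaluating each interpretation in a separate call keeps every individual judgment narrow, which is the point of the technique: accuracy comes from many simple checks rather than one broad question.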

Enterprise AI Adoption and Team Structure

Oscar Health employs a hybrid model for AI development, fostering both centralization and decentralization:

  • Centralized “Pod”: A 7-person team (2 product managers, data scientists, engineers) holds weekly office hours to assist other teams with AI queries and maintains three core projects [47:10:00]. They also track all AI projects across the company [50:05:00].
  • Weekly Hacking Sessions: These informal sessions encourage sharing of ideas, successes, and failures, lowering the barrier for experimentation across the company [47:54:00].

This structure facilitates learning and rapid iteration, as much of the effective prompting and systems design knowledge is gained through hands-on experimentation rather than formal literature [40:48:00].

To learn more about Oscar’s AI work, visit hioscar.com or follow Mario Schlosser on Twitter (@mariots) [01:01:49].