Oscar Health, a $3 billion public health insurance company, has been at the forefront of innovating in healthcare technology for the past decade [00:00:18]. As both an insurer and a care provider, Oscar Health is actively experimenting with models like GPT-4 in various healthcare contexts [00:00:31]. They were among the first to gain regulatory approval from OpenAI to work in healthcare [00:00:38].
The Future of AI in Healthcare
Mario Schlosser, CTO and co-founder of Oscar Health, believes that large language models (LLMs) are exceptionally good at translating between informal and formal language [00:01:40]. Healthcare, uniquely, possesses an abundance of both: highly formalized language (e.g., ICD-10 codes, CPT codes, utilization management guidelines) and highly human, informal language (e.g., conversations between providers and patients, electronic medical record notes) [00:02:02]. This capability of LLMs can bridge the gap that traditional algorithmic approaches have historically struggled with in healthcare [00:02:25].
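The formal-to-informal translation idea can be sketched as a prompt builder that pairs formal billing codes with a plain-language instruction. The template wording, function names, and `call_llm` stub below are illustrative assumptions, not Oscar's actual pipeline:

```python
# Sketch: using an LLM to translate healthcare's formal register (ICD-10,
# CPT codes) into plain language a patient would understand.

FORMAL_TO_INFORMAL = """\
You are a healthcare translator. Rewrite the formal billing line below
as one plain-English sentence a patient would understand.

Formal: ICD-10 {icd10}, CPT {cpt}
Plain English:"""

def build_translation_prompt(icd10: str, cpt: str) -> str:
    """Fill the template with a diagnosis (ICD-10) and procedure (CPT) code."""
    return FORMAL_TO_INFORMAL.format(icd10=icd10, cpt=cpt)

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call (e.g., GPT-4 under a BAA)."""
    raise NotImplementedError("wire up your model client here")

# E11.9 is type 2 diabetes without complications; 83036 is an HbA1c test.
print(build_translation_prompt("E11.9", "83036"))
```

The same template, run in the opposite direction, would map a patient's informal description onto candidate formal codes.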
Schlosser anticipates that the next five years of healthcare will see administrative processes become real-time, bidirectional, and more transparent, driven by LLM capabilities [00:03:51]. This would allow patients to know costs upfront, understand alternatives, and avoid issues like denied claims or authorization delays [00:04:06]. Longer term, the goal is to replace caregiver and clinician intelligence with machine intelligence, significantly reducing costs by shifting work now done by specialists to AI [00:04:30].
Oscar Health’s AI Implementation
Oscar Health sequences its AI implementation, prioritizing administrative use cases before clinical ones [00:05:06]. Currently, three of its top four use cases are administrative and one is clinical [00:05:11].
Oscar Health’s financial outcomes are driven by three main levers [00:10:24]:
- Growth and Retention: Improving member acquisition and retention [00:10:36].
- Operational Efficiency: Streamlining administrative processes [00:10:46].
- Clinical Cost Reduction and Outcome Improvement: Directly impacting medical costs and patient health [00:10:52].
Key Use Cases and Applications
- Retention Campaigns: Oscar Health utilizes LLMs to personalize outbound campaigns during the annual six-week re-enrollment period [00:13:08]. This involves reminding members of beneficial services like colorectal cancer screenings, with messaging tailored to specific demographics (e.g., Asian-Americans) [00:14:07]. They also differentiate messaging for chronically ill members (emphasizing convenience) versus generally healthy members (emphasizing empathy) [00:14:51].
- Ethnicity Information Capture: LLMs help fill in missing ethnicity data for members by analyzing names or detected languages in conversations, enhancing personalization [00:16:33].
- Call Summarization: LLMs are increasingly taking over manual note-taking for customer service calls, phasing out human intervention [00:17:20].
- Lab Test Summarization: Implemented in the Oscar Medical Group [00:17:41].
- Secure Messaging Medical Records Generation: Also launched in the Oscar Medical Group [00:17:46].
- Claims Explainers: Providing internal care guides with clearer explanations of complex claim payment logic, translating formal rule traces into informal language [00:17:52]. The goal is to explain why a claim was denied, or even why a specific amount was paid [00:35:40].
- Medical Record Interaction: Enabling doctors, customer service agents, and other internal staff to “talk to” medical records [00:18:52].
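The claims-explainer item above amounts to translating a formal rule trace into informal language. A minimal sketch, assuming a hypothetical trace schema (the field names, rule IDs, and prompt wording are illustrative, not Oscar's actual claims engine):

```python
# Sketch: flatten a structured claims-adjudication trace into an LLM prompt
# asking for a care-guide-friendly explanation of the payment decision.

from dataclasses import dataclass

@dataclass
class RuleStep:
    rule_id: str   # hypothetical rule identifier from the claims engine
    outcome: str   # e.g. "applied", "skipped"
    detail: str    # formal reason recorded by the claims engine

def explain_claim_prompt(claim_id: str, steps: list[RuleStep]) -> str:
    """Render the rule trace as a bulleted list inside an explanation prompt."""
    trace = "\n".join(f"- {s.rule_id} ({s.outcome}): {s.detail}" for s in steps)
    return (
        f"Claim {claim_id} was processed with this rule trace:\n{trace}\n"
        "Explain in plain language, for a care guide, why the claim paid "
        "the amount it did (or why it was denied)."
    )

steps = [
    RuleStep("OON-02", "applied", "provider out of network; member plan is EPO"),
    RuleStep("DED-01", "applied", "remaining deductible of $250 applied"),
]
print(explain_claim_prompt("CLM-1234", steps))
```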
Challenges and Learnings in AI Deployment
Regulatory Hurdles
Oscar Health, as a highly regulated insurance company, was fortunate to have a strong compliance foundation [00:20:32]. A major constraint is HIPAA, which requires business associate agreements (BAAs) with AI providers like OpenAI [00:20:55]. Oscar was reportedly the first organization to sign a BAA directly with OpenAI [00:21:18]. New models from providers (e.g., Google’s Gemini Ultra) are typically not immediately covered by existing HIPAA agreements, necessitating a wait of three to four months before real medical data can be used [00:23:04]. During this period, Oscar uses synthetic or anonymized test data [00:22:50]. For healthcare companies building their own models, security and policy reviews, often involving long checklists from hospitals, are necessary, with certifications like HITRUST easing the process [00:23:57].
LLM Limitations and Solutions
- Counting and Multi-step Reasoning: LLMs struggle with tasks requiring multiple, distinct reasoning steps, such as categorizing and then counting different types of customer service calls based on a self-generated taxonomy [00:27:30]. This is attributed to limitations in their “layers” or computational depth for complex, sequential tasks [00:28:25].
- Solution: Employing “Chain of Thought” prompting, which effectively chains multiple LLMs or prompts together to break down a complex task into smaller, manageable steps, overcoming the “token pollution” issue where too much information in a single pass confuses the model [00:29:52].
- Contextual Knowledge: LLMs lack the subtle contextual knowledge that human providers possess (e.g., remembering a previous conversation not captured in formal records, or understanding local geographical nuances like public transport options) [00:07:17].
- Solution: The focus must be on improving the “horizon of knowledge” or the breadth of input provided to the LLM [00:08:07].
- False Positives in Medical Extraction: A notable challenge was accurately identifying “post-traumatic injury” from medical records for utilization management, with LLMs producing a high rate of false positives [00:31:49]. This is because the LLM’s training data contains diverse, layperson associations with the term, which do not align with its strict medical definition within utilization management guidelines [00:33:01].
- Solution: Implementing a “self-consistency questionnaire” where the LLM is prompted to generate 30 different ways a concept (e.g., post-traumatic injury) might appear in medical records. These generated examples are then used in the prompt to guide the LLM’s evaluation in a multi-step process, improving accuracy [00:33:43].
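The self-consistency questionnaire described above can be sketched as a two-stage pipeline: one call enumerates phrasings of the concept, a second call evaluates the record against them. `call_llm` is a stub, and the prompt wording is an assumption, not Oscar's actual prompts:

```python
# Sketch of a self-consistency questionnaire for medical-record extraction.

def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call to a HIPAA-covered model."""
    raise NotImplementedError("plug in a model client here")

def phrasing_prompt(concept: str, n: int = 30) -> str:
    """Stage 1: ask the model to enumerate how the concept can surface."""
    return (
        f"List {n} distinct ways '{concept}' might be documented in a "
        "medical record, one per line, using clinical terminology."
    )

def evaluation_prompt(concept: str, phrasings: list[str], record: str) -> str:
    """Stage 2: feed the generated phrasings back to constrain evaluation."""
    examples = "\n".join(f"- {p}" for p in phrasings)
    return (
        f"The concept '{concept}' may appear in these forms:\n{examples}\n\n"
        f"Medical record:\n{record}\n\n"
        "Does the record document this concept in the strict sense used by "
        "utilization management guidelines? Answer yes or no, and quote the "
        "matching phrase if yes."
    )

def extract(concept: str, record: str) -> str:
    """Chain the two stages: generate phrasings, then evaluate the record."""
    phrasings = call_llm(phrasing_prompt(concept)).splitlines()
    return call_llm(evaluation_prompt(concept, phrasings, record))
```

Anchoring the evaluation to model-generated clinical phrasings is what pulls the model away from layperson associations with terms like "post-traumatic injury."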
Prompting Strategies
Oscar’s approach to prompting is largely empirical, with 90% of strategies developed through trial and error [00:39:48]. Emphasis is placed on “systems design” – how LLM calls are chained together in a logical sequence – rather than just individual prompts [00:40:30].
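The systems-design point, combined with the classify-then-count failure mode above, can be sketched as a pipeline where each model call does one narrow job and ordinary code does the counting. The `classify` stage stands in for an LLM call and is stubbed with keyword rules so the sketch runs; the categories are hypothetical:

```python
# Sketch: chain narrow stages instead of one monolithic prompt. The model
# labels each transcript; counting happens in deterministic code.

from collections import Counter

def classify(transcript: str) -> str:
    """Stage 1: one call per transcript returning a single category.
    Stubbed with keyword rules for illustration; a real system would
    make an LLM call here."""
    if "bill" in transcript or "claim" in transcript:
        return "billing"
    if "refill" in transcript or "pharmacy" in transcript:
        return "pharmacy"
    return "other"

def count_categories(transcripts: list[str]) -> Counter:
    """Stage 2: aggregation stays outside the model entirely."""
    return Counter(classify(t) for t in transcripts)

calls = [
    "Why was my claim denied?",
    "I need a refill at my pharmacy.",
    "Question about my bill.",
]
print(count_categories(calls))  # Counter({'billing': 2, 'pharmacy': 1})
```

Keeping arithmetic out of the model sidesteps both the counting weakness and the token-pollution problem of stuffing every transcript into one prompt.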
LLM Selection Strategy
Oscar Health prefers general-purpose models like GPT-4 over healthcare-specific models (e.g., Google’s Med-PaLM) [00:43:06]. The reason is that specialized models tend to “lose alignment,” meaning they struggle to follow simple instructions, such as outputting information in JSON format [00:44:03]. This loss of instruction-following ability outweighs the benefits of specialized training data [00:44:20]. Until symbolic processing (planning/reasoning) can be decoupled from content generation in LLMs, bigger general models are preferred, often combined with Retrieval Augmented Generation (RAG) and fine-tuning [00:44:41].
AI Team Structure
Oscar Health has developed a successful model for its AI team structure [00:46:20]:
- Hackathon Origins: The current structure evolved from a highly popular AI hackathon, highlighting the need for sharing and discussion around prompting strategies [00:46:23].
- Centralized “Pod” Team: A seven-person team (two product managers, data scientists, engineers) acts as a central resource [00:47:10].
- Office Hours: They hold weekly office hours for any employee to get feedback on their AI prompts or ideas [00:47:22].
- Dedicated Projects: The Pod also has its own three core projects to complete, ensuring tangible output [00:47:34].
- Weekly Hacking Sessions: Monday night sessions are open to anyone in the company to share AI ideas, successful projects, or even failures, fostering an environment where trying and discussing are encouraged [00:47:54].
- Decentralized AI Projects: Various teams across the company work on their own AI initiatives, and the Pod helps track and share these [00:50:05]. This hybrid approach balances centralized guidance with decentralized experimentation.
Future Opportunities and Over/Underhyped Areas in Healthcare AI
Overhyped
- Clinical Chatbots (generally): Currently, they are overhyped due to safety concerns (hallucinations, biases), the need for physical interaction in many medical scenarios, and adverse business models in the healthcare system [00:49:00], [00:56:56], [01:00:49].
Underhyped
- Voice Outputs: This area has significant potential for rapid advancement, particularly for non-clinical applications [01:00:55].
Commercial Opportunities
- Regulatory Filings Composition: Automating the generation of complex regulatory documentation (e.g., for state regulators, NCQA, or even internal SOX compliance) using LLMs that can watch data flows [00:53:35].
- Fraud, Waste, and Abuse: This industry segment is still dominated by older, expensive players and is ripe for AI disruption to reduce overpayment [00:56:16].
- Prior Authorization: While many companies are entering this space, it’s considered very close to the core competency of insurance companies. External solutions might struggle to achieve significant impact unless they offer highly interactive and platform-like integration, as insurers prefer to manage clinical management themselves [00:55:20].
Virtual Healthcare
Although approximately two-thirds of claims (not two-thirds of people) could theoretically be handled virtually, significant barriers remain:
- Safety: Direct LLM-to-patient interaction is difficult due to the risk of hallucinations and biases [00:57:42].
- Physical Interaction: The inability to perform virtual lab tests or hands-on exams creates “leakage,” where patients need to seek in-person care, disrupting continuous virtual engagement [00:59:36]. This also impacts patient loyalty to specific primary care physicians [00:59:09].
- Business Models: Large health systems lack incentive to transition to lower-cost virtual care channels, as it can lead to reduced reimbursement and capacity [00:59:50]. Insurers are better positioned to deploy virtual primary care but often lack member engagement to do so effectively [01:00:15].
Learning More
To learn more about Oscar Health’s AI work and insights, visit hioscar.com [01:01:50]. Mario Schlosser also posts his explorations and Oscar’s AI initiatives on Twitter @MarioTS [01:02:07]. He is also exploring AI applications in gaming, including generating RPGs from company documents and creating games that dynamically add mechanics using LLMs [01:02:19], [01:03:13].