From: aidotengineer

Superdial is a company that specializes in using voice AI for phone calls, particularly those to insurance companies within the healthcare administration sector [00:02:22]. Their platform aims to automate and streamline these often complex and annoying interactions [00:02:26].

Superdial Platform Capabilities

The Superdial platform offers several key features for businesses:

  • Script Building: Customers can design their conversations and outline the questions needed to gather information over the phone [00:02:39].
  • Call Submission: Calls can be submitted via CSV, API, or through integrations with existing Electronic Health Record (EHR) software systems [00:02:47].
  • Structured Results: Superdial delivers results back to the customer in a structured format within hours or a day [00:02:56].
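
As a rough illustration of the CSV submission path and the structured results described above, the sketch below parses a batch of call requests. The column names (`phone_number`, `script_id`, `questions`), the `|` separator, and both dataclasses are assumptions made for this example and are not Superdial's actual schema.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CallRequest:
    """One call to place: who to dial and which scripted questions to ask."""
    phone_number: str
    script_id: str
    questions: list[str]


@dataclass
class CallResult:
    """Structured answers handed back to the customer after the call."""
    phone_number: str
    answers: dict[str, str] = field(default_factory=dict)


def parse_call_csv(text: str) -> list[CallRequest]:
    """Turn CSV rows into call requests.

    Questions are '|'-separated inside one column (an assumed convention).
    """
    requests = []
    for row in csv.DictReader(io.StringIO(text)):
        questions = [q.strip() for q in row["questions"].split("|") if q.strip()]
        requests.append(
            CallRequest(row["phone_number"].strip(), row["script_id"].strip(), questions)
        )
    return requests


sample = (
    "phone_number,script_id,questions\n"
    "+15550100,prior_auth,Is prior authorization required?|What is the reference number?\n"
)
batch = parse_call_csv(sample)
result = CallResult(batch[0].phone_number, {batch[0].questions[0]: "yes"})
print(batch[0])
print(result)
```

Keeping the request and the result as separate typed records mirrors the split the platform describes: the customer supplies the questions, and the answers come back keyed to them.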

The Agentic Contract

Superdial operates on an “agentic contract” with its customers, where clients pay for results. They specify who to call and what questions to ask, and Superdial provides the answers [00:03:03].

Internal Agentic Loop

Internally, Superdial employs an agentic loop to manage calls:

  1. Timing Calls: The system waits for offices and call centers to open before attempting calls [00:03:17].
  2. Voice Bot Attempts: Calls are first attempted using their voice bot [00:03:26].
  3. Human Fallback: If the voice bot cannot complete the call after a certain number of attempts, it is seamlessly handed over to a human fallback team [00:03:30]. This human intervention is often inevitable for complex healthcare calls, and the transparency of this process is a benefit to customers, ensuring calls are always completed [00:03:40].
  4. Continuous Learning: The system learns from each call, updating office hours for specific phone numbers and improving its phone tree traversal strategies for future interactions [00:04:01].
  5. Auditing: Due to the sensitive nature of healthcare calls, a random selection of calls is audited to ensure the system is functioning correctly [00:04:15].
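
The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: `MAX_BOT_ATTEMPTS`, `AUDIT_RATE`, and every class and function name here are hypothetical (the source says only "a certain number of attempts" and "a random selection"), and the continuous-learning step is left out for brevity.

```python
import random
from dataclasses import dataclass
from datetime import time
from typing import Callable, Optional

MAX_BOT_ATTEMPTS = 3  # assumed value, not stated in the source
AUDIT_RATE = 0.05     # assumed sampling rate for audits


@dataclass
class OfficeHours:
    opens: time
    closes: time

    def is_open(self, now: time) -> bool:
        return self.opens <= now < self.closes


@dataclass
class Outcome:
    answers: dict
    completed_by: str  # "bot" or "human"
    bot_attempts: int
    audit: bool


def run_call(
    now: time,
    hours: OfficeHours,
    bot_call: Callable[[], Optional[dict]],
    human_call: Callable[[], dict],
    rng: random.Random,
) -> Optional[Outcome]:
    """Run one pass of the loop; return None if the office is closed."""
    if not hours.is_open(now):
        return None  # step 1: caller should reschedule for opening time
    attempts = 0
    answers = None
    while attempts < MAX_BOT_ATTEMPTS and answers is None:
        attempts += 1
        answers = bot_call()  # step 2: None means the bot could not finish
    completed_by = "bot"
    if answers is None:
        answers = human_call()  # step 3: human fallback completes the call
        completed_by = "human"
    # step 5: flag a random sample of calls for audit
    return Outcome(answers, completed_by, attempts, rng.random() < AUDIT_RATE)


hours = OfficeHours(time(9, 0), time(17, 0))
outcome = run_call(
    time(10, 30), hours, lambda: None, lambda: {"auth_required": "yes"}, random.Random(0)
)
print(outcome.completed_by, outcome.bot_attempts)
```

Passing the bot and human callers in as functions keeps the control flow testable without placing a real call.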

Example: Prior Authorization Call

An example demonstration showcased a prior authorization call where the bot, after navigating a phone tree, successfully communicated with a human representative to obtain information for a customer [00:04:25]. The speaker noted that a “boring call” is an “excellent call” in this context, as it signifies successful automation of routine, yet critical, tasks [00:06:06].

Impact and Efficiency

The Superdial system has saved over 100,000 hours of human phone time and is projected to save millions more in 2025 [00:06:15]. This was achieved by a lean team of four engineers who built the full-stack web application, EHR integrations, and the voice bot, while simultaneously onboarding new customers and supporting new conversational use cases [00:06:23].

The Role of a Voice AI Engineer in Healthcare

The success of Superdial highlights the unique role of a voice AI engineer. They deal with multimodal data (audio and transcripts), develop real-time applications where latency is critical, and navigate complex asynchronous programming [00:06:55]. The product itself is a voice conversation, which sets high expectations for conversational flow and for fitting into existing business interactions [00:07:18].

Superdial’s approach to these challenges is guided by two sayings: “Say the right thing at the right time” and “Build this plane while we fly it” [00:07:39]. They focus on customizing scripts for each customer while relying on a horizontal voice AI stack for technical challenges [00:07:54].

Addressing Ethical Considerations and Biases

Given the rapid development of generative AI, ethical considerations and biases are crucial, especially in healthcare applications. Voice AI apps could potentially be biased against people with certain accents or dialects, or create “spooky” realistic but flawed interactions [00:08:54]. The speaker emphasizes that in the absence of stringent AI regulation, the onus is on engineers and leaders to prioritize ethical development [00:09:13]. It’s important to choose tools and infrastructure that promote accessibility and collaboration, allowing a diverse range of stakeholders to be involved in the development process from the start [00:09:42].

Last Mile Problems in Voice AI

Scaling a voice AI system involves overcoming several “last mile” challenges:

Orchestration Frameworks

Superdial found their stride using Pipecat, an open-source framework for voice AI orchestration [00:12:39]. Its extensibility and the ability to self-host and scale it were crucial for managing long phone calls (up to 1.5 hours) and features like call transfers [00:12:47].

LLM Integration and Observability

  • OpenAI Endpoint Ownership: Superdial chose to own their OpenAI endpoint, which allows them to route to different models based on latency sensitivity [00:13:27].
  • Structured LLM Responses: All generative responses are routed through TensorZero, an open-source tool that provides structured and typed LLM endpoints for experimentation in production [00:14:44].
  • Logging and Observability: Superdial self-hosts Langfuse for logging and observability. Self-hosting is beneficial for HIPAA compliance, especially with rapid growth in the space [00:14:11]. This allows for anomaly detection, evaluations, and dataset management [00:14:26].
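
To make the idea of routing by latency sensitivity concrete, here is a minimal sketch. The task names, model names, and latency budgets are all hypothetical placeholders; the source does not say which models Superdial routes to or how tasks are classified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRoute:
    model: str           # hypothetical model identifier
    max_latency_ms: int  # assumed latency budget for this tier


# Two assumed tiers: one for live conversation turns, one for offline work.
ROUTES = {
    "realtime": ModelRoute("fast-small-model", 500),
    "background": ModelRoute("large-accurate-model", 10_000),
}

# Assumed set of tasks that run while a person is waiting on the line.
LATENCY_SENSITIVE_TASKS = {"next_utterance", "phone_tree_choice"}


def route_for(task: str) -> ModelRoute:
    """Pick the fast tier for live-call tasks and the slower tier otherwise."""
    tier = "realtime" if task in LATENCY_SENSITIVE_TASKS else "background"
    return ROUTES[tier]


print(route_for("next_utterance").model)
print(route_for("post_call_extraction").model)
```

Owning the endpoint is what makes a table like this possible: the calling code asks for a task, and the routing decision stays in one place.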

Text-to-Speech System Challenges

A significant challenge involves the text-to-speech (TTS) system, particularly for conveying sensitive information like member IDs (e.g., 12-digit strings) [00:14:36].

  • Pronunciation Control: The text the LLM outputs may not match what the TTS engine actually says on the recording [00:15:00]. Tools like Rime allow for precise phonetic spellings to ensure correct pronunciation of names or specific terms [00:15:22].
  • Spelling and Pauses: For long words or sequences, custom functions can be used to control pauses and breaks during spelling [00:15:31].
  • Audio Review: Recordings are frequently reviewed, in addition to transcripts, to ensure the audio output sounds natural and correct [00:15:46].
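
A custom spelling function of the kind mentioned above might look like the sketch below. It assumes the TTS engine accepts SSML `<break>` tags, which varies by vendor; the function name, group size, and pause length are illustrative choices, not Superdial's implementation.

```python
def spell_for_tts(value: str, group_size: int = 3, pause_ms: int = 400) -> str:
    """Spell an identifier character by character, pausing between groups.

    Non-alphanumeric characters (dashes, spaces) are dropped before grouping.
    """
    chars = [c for c in value if c.isalnum()]
    groups = [chars[i:i + group_size] for i in range(0, len(chars), group_size)]
    pause = f' <break time="{pause_ms}ms"/> '  # SSML pause between groups
    return pause.join(" ".join(group) for group in groups)


print(spell_for_tts("123456789012"))
# 1 2 3 <break time="400ms"/> 4 5 6 <break time="400ms"/> 7 8 9 <break time="400ms"/> 0 1 2
```

Grouping the characters gives the listener time to write each chunk down, which matters for a 12-digit member ID read to a human representative.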

Mini Last Mile Problems

  • Persona Naming: An example given was Superdial’s previous bot name, “Billy,” which caused confusion on calls due to pronunciation [00:16:09]. Dialing in the bot’s persona early is crucial [00:16:24].
  • Avoid Building from Scratch: For new projects, leveraging existing tools like Pipecat can provide a quick start, allowing focus on unique conversational aspects [00:16:30].
  • Latency Tracking: Time to first byte for each processor is a critical metric to monitor [00:16:40].
  • Upgrade Paths: Ensuring high transcription accuracy is vital. Partnerships (e.g., with Deepgram for speech-to-text) enable fine-tuning models for continuous improvement [00:16:51].
  • Redundant Fallbacks: Having fallbacks ready for each part of the stack is essential to prevent system outages (e.g., if OpenAI goes down) [00:17:08].
  • End-to-End Testing: End-to-end testing for voice AI calls for its own techniques:
    • Fake Phone Numbers: Testing with a fake phone number that plays an MP3 file can reveal immediate problems [00:17:42].
    • Simulated Phone Trees: Creating a simulated voice tree allows the bot to pseudo-navigate it [00:17:53].
    • Bot-to-Bot Communication: Using generative services like Coval and Vapi allows bots to interact with each other [00:18:01].
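
A simulated phone tree for testing could be as simple as the sketch below: the menu is a small tree keyed by keypad digits, and the bot's decision logic is passed in as a function. Everything here (`MenuNode`, `navigate`, the prompts, and the keyword-matching stand-in for the bot) is hypothetical and works on text only; a real test would run audio through the full speech pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MenuNode:
    prompt: str
    options: dict[str, "MenuNode"] = field(default_factory=dict)  # keypad digit -> next menu

    @property
    def is_leaf(self) -> bool:
        return not self.options


def navigate(root: MenuNode, choose: Callable[[str], str], max_steps: int = 10) -> list[str]:
    """Walk the tree with the bot's `choose` function and return the prompts heard."""
    node, heard, steps = root, [root.prompt], 0
    while not node.is_leaf:
        if steps >= max_steps:
            raise RuntimeError("bot did not reach the end of the tree within max_steps")
        key = choose(node.prompt)
        if key not in node.options:
            raise ValueError(f"bot pressed {key!r}, which this menu does not offer: {node.prompt}")
        node = node.options[key]
        heard.append(node.prompt)
        steps += 1
    return heard


tree = MenuNode(
    "Press 1 for providers, 2 for members.",
    {
        "1": MenuNode(
            "Press 3 for prior authorization.",
            {"3": MenuNode("Connecting you to a representative.")},
        ),
        "2": MenuNode("Member services is closed."),
    },
)


def keyword_bot(prompt: str) -> str:
    """Stand-in for the real bot: pick a digit by simple keyword matching."""
    return "1" if "providers" in prompt else "3"


path = navigate(tree, keyword_bot)
print(path[-1])
```

Raising on an invalid key press or on too many steps turns a wandering bot into a failing test, so regressions in phone tree traversal surface before a real call is placed.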

Key Takeaways for Voice AI Engineers

  • Strategic Stack Choice: Wisely choosing your technology stack enables focus on unique conversational experiences [00:18:12].
  • Last Mile Focus: Concentrating on the “last mile” challenges provides significant value and ensures agents are effective [00:18:23].
  • Adaptability and Safety: Staying updated with new models and integrating them quickly and safely is crucial for success in the rapidly evolving voice AI space [00:18:30].