From: redpointai

Connor Wick, CEO of Speak.com, an English language learning platform, discusses the profound shift AI is bringing to education and how it compares to traditional learning approaches. Speak.com, backed by OpenAI, launched in South Korea in 2019 and has grown to over 10 million users across more than 40 countries, reaching a $500 million valuation [00:00:11].

From Flashcards to Omniscient Tutors

Connor Wick’s entrepreneurial journey began in high school with a flashcard app that replicated physical index cards for studying on the iPhone [01:00:57]. This app was popular, with millions of users creating hundreds of millions of decks and billions of cards [01:30:42]. His early vision was to aggregate this knowledge into a graph to generate and teach anything, creating an “omniscient tutor” [02:06:08]. This concept, initially conceived around 2013-2014, is now possible with current AI technology [02:16:08]. While flashcard data is structured and good for learning, large language models (LLMs) have achieved greater breadth by crawling the entire internet [03:02:16].

Wick’s formal exposure to AI began around 2015, when he “crashed” a Berkeley course and became convinced of the potential for underlying models to improve [04:00:57]. His early focus areas included recurrent neural networks (RNNs) and convolutional neural networks (CNNs) [04:32:00].

Speak: An AI-Powered Language Fluency Solution

Speak aims to be a full fluency solution for language learning, focusing on speaking and real conversations rather than on the grammar drills, memorization, and vocabulary flashcards of traditional methods [06:30:00]. The methodology teaches high-frequency word chunks and encourages repetitive practice in simulated conversations until recall becomes automatic [07:06:00]. The experience is highly individualized, adapting to the user’s motivation, interests, and proficiency level [07:45:00].
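One way to picture "high-frequency word chunks" is as the most common word n-grams in a body of conversational sentences. The sketch below is only an illustration of that idea: the source does not describe how Speak actually selects chunks, and the function name `top_chunks`, its parameters, and the sample sentences are all hypothetical.

```python
from collections import Counter


def top_chunks(sentences, n=2, k=3):
    """Return the k most frequent n-word chunks across the sentences.

    Hypothetical helper for illustration only; not Speak's method.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Slide a window of n words across the sentence.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


# Made-up sample sentences, assumed for the example.
sample = [
    "i would like a coffee",
    "i would like some tea",
    "would you like a menu",
]

for chunk, count in top_chunks(sample, n=2, k=3):
    print(chunk, count)
```

A learner who drills the chunks that surface at the top of such a list practices phrases that recur across many conversations, which is the intuition behind prioritizing chunks over isolated vocabulary.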

The company took a long-term view of the technology: it believed that with more data and compute, models would surpass humans at various tasks and eventually allow AI to fully replace the human in the learning process [08:21:00]. This vision meant making product decisions aligned with future capabilities while evolving the product “one step at a time” [09:00:00]. Early unlocks included speech recognition accurate enough to support a good learning experience [09:28:00].

Technological Moats and Challenges

Speak’s technological defensibility comes from several areas:

  • Specialized Models: Developing in-house models for niche tasks, such as speech recognition optimized for accents and detecting specific pronunciation mistakes, which are faster and more reliable than generalized models [12:05:00].
  • ML Scaffolding: A significant investment is made in the “AI firmware” or “ML scaffolding” – the complex technology for orchestrating models, ensuring they work well with the backend and product, continuous data collection, fine-tuning, and robust evaluation [15:20:00]. This is considered a larger and more significant long-term technological moat than core modeling [15:53:00].
  • End-to-End Experience: Solving the problem for customers through a seamless, integrated experience [10:55:00].

Challenges include:

  • Prompt Optimization: The current need for prompt optimization, such as telling the AI to “pretend you’re very friendly,” is seen as a “silly” and temporary aspect of AI development [17:42:00].
  • User Education for New Interface Paradigms: Designing intuitive audio-first experiences is challenging because talking to technology is fundamentally unfamiliar for many users [19:02:02]. However, the widespread use of apps like ChatGPT is rapidly increasing user familiarity with these paradigms [00:20:25].
  • Evaluation: Distilling perfect evaluations for open-ended AI tasks, especially in speech, goes beyond simple metrics like word error rate to include nuances like individual mistake detection and understanding “unintelligible” speech [30:57:00].
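To make the evaluation point concrete, here is a minimal sketch of word error rate (WER), computed as the word-level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The function name and the example sentences are assumptions chosen for illustration; they do not come from the interview.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one differs by a grammar-like error,
# the other by an unrelated word, yet both score the same.
reference = "i went to the store"
print(word_error_rate(reference, "i goed to the store"))
print(word_error_rate(reference, "i went to the floor"))
```

Both hypotheses differ from the reference by a single substituted word, so the metric scores them identically even though one looks like a learner's grammar mistake and the other like a recognition slip. That blindness to the kind of error is one reason a single aggregate number may fall short for the mistake-detection use case described above.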

Future of AI in Education

The future UI of learning apps will likely be fluid and “hybrid,” allowing users to talk, type, or tap at any point [20:58:00]. Speech-to-speech models will improve, offering more natural and lower-latency interactions [21:27:00].

A “profound shift” will involve AI “thinking about you in the background” – observing user data and proactively preparing personalized insights or lessons (e.g., running overnight computations to distill daily lessons) [22:56:00].

Curriculum design in AI-powered learning will balance structured paths (e.g., learning high-frequency words first) with deep personalization [25:22:00]. Humans will remain “in the loop” for high-level strategy and methodology, but machine learning teams will increasingly contribute to curriculum creation [26:01:00]. The science fiction novel “The Diamond Age,” which features an AI-powered “all-encompassing primer” that teaches anything, serves as an inspiration for the highly individualized learning experience [27:06:00].

AI’s Impact on Learning Products and Industries

While AI can be a “sustaining technology” for incumbents by improving existing solutions, it becomes “highly disruptive” when it fundamentally changes how a problem is solved, for example through full automation [35:02:00].

In language learning, Speak’s focus on conversational fluency for users who previously lacked access to human speakers differs from Duolingo’s approach as a casual “brain training app” for new learners [36:06:00]. AI “clearly helps” Speak’s use case, enabling a solution for those seeking conversational fluency [37:42:00]. While real-time translation might obviate some basic needs for tourists, it does not address the fundamental desire for “human connection” and fluency that motivates Speak’s users [38:08:00].

The rise of general AI tools like ChatGPT is seen as a positive development for specialized AI language learning products. As more people experiment with ChatGPT for language learning, they may realize the potential of AI in this domain and then seek out more specialized, effective solutions for serious language acquisition [40:35:00]. This “rising tide” of AI familiarity is expected to increase the number of people using AI to learn languages [41:35:00].

Other areas of expansion for AI in education include:

  • Professional Skills: Building products for businesses to certify and assess employees’ professional skills, such as giving presentations in English [48:20:00].
  • Schools: Integrating AI into general school learning environments [50:18:00].
  • Personal Learning: This is identified as a massive, often “invisible” sector that will undergo significant transformation. Activities like reading books, listening to podcasts, and watching YouTube videos are all forms of personal learning driven by the desire to “become a better version of yourself” [50:35:00]. This future learning will be highly individuated, with AI possessing long-term memory and understanding user interests and personality to provide relevant information [52:05:00].

Timeline and Challenges

The timeline for this transformation is perceived as gradual in the short term but profound over a decade [55:22:00]. While AI is “going to change everything,” the educational sector has seen little fundamental change in efficacy despite the adoption of technology like Chromebooks [53:35:00]. The shift from traditional learning methods (e.g., paper quizzes, one-to-many teaching) to an AI-driven, highly personalized approach is expected to be one of the “biggest and most exciting areas of change and disruption” [53:57:00].

The impact of AI on teachers and students will be significant, moving beyond simple homework help. While AI can improve math instruction, the “bar” for adoption is higher than for language learning because existing solutions for subjects like math are already effective [56:49:00]. Language learning’s traditional one-to-one teacher model, which is highly effective but hard to scale, creates a greater “delta of efficacy” for AI-powered solutions [56:54:00].

The biggest surprise in building AI features is that a new technology is “never as good as you think it will be,” and that “building something that will actually change user behavior is really, really hard” [01:00:39]. Progress comes through a continuous loop of learning and adaptation: when human-level transcription was combined with GPT-4 for open-ended lessons, for instance, the result was “good but wasn’t a game changer” [01:01:08]. The ultimate challenge often lies not in the technology but in product and market fit [01:00:46].