From: redpointai
DeepL’s CEO and founder, Jarek Kutylowski, considers synchronous speech translation (real-time voice translation) the “next frontier” in translation [00:39:11]. While AI-based text translation has significantly changed the language services industry and the way content is consumed, real-time conversation has not yet reached the same level of capability [00:39:31].
Current State and Development
Today, conversations such as podcasts still require participants to share a common language, such as English [00:39:40]. Major players have released demos of this technology, and DeepL is conducting its own research to turn it into a usable product that can be integrated everywhere [00:39:56].
Perfecting synchronous speech translation will take a few years, as spoken language is significantly more complex than text [00:45:26]. Unlike text, which is chunked into sentences, spoken language is stream-based and often unstructured or “careless,” posing a greater challenge for AI models to interpret and translate accurately [00:45:42]. Key challenges include latency and the inherent ambiguity of speech [00:46:01].
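The tension between latency and ambiguity described above can be made concrete with a small sketch. It is only an illustration, not DeepL's method: the word timings, the function names, and the two release policies (wait for sentence-final punctuation versus release every fixed number of words) are all assumptions chosen to show why a stream cannot simply be treated like sentence-chunked text.

```python
from typing import List, Tuple

# Hypothetical input: (word, time in seconds at which the word finished being spoken).
Word = Tuple[str, float]


def sentence_policy_delays(words: List[Word]) -> List[float]:
    """Per-word delay if translation waits for sentence-final punctuation."""
    delays: List[float] = []
    pending: List[float] = []
    for text, spoken_at in words:
        pending.append(spoken_at)
        if text.endswith((".", "?", "!")):
            # The whole sentence is released at the time its last word ends.
            delays.extend(spoken_at - t for t in pending)
            pending = []
    # Words after the last punctuation mark are never released by this policy.
    return delays


def chunk_policy_delays(words: List[Word], chunk_size: int) -> List[float]:
    """Per-word delay if translation is released every `chunk_size` words."""
    delays: List[float] = []
    pending: List[float] = []
    for _, spoken_at in words:
        pending.append(spoken_at)
        if len(pending) == chunk_size:
            delays.extend(spoken_at - t for t in pending)
            pending = []
    return delays


# Made-up utterance: one word every half second, one sentence boundary at the end.
sample: List[Word] = [
    ("so", 0.5), ("I", 1.0), ("think", 1.5),
    ("we", 2.0), ("should", 2.5), ("ship.", 3.0),
]

print("sentence policy, worst delay:", max(sentence_policy_delays(sample)))
print("2-word chunks, worst delay:", max(chunk_policy_delays(sample, 2)))
```

In this toy setup, smaller chunks reduce the worst-case delay, but each chunk gives a translation model less context to work with, which is one way the ambiguity of speech and the latency requirement pull against each other.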
Future Implications
The widespread availability of real-time, ubiquitous speech translation models is expected to fundamentally change how the world operates, particularly in the business sector [00:40:28].
Impact on Business and Global Operations
Synchronous speech translation could eliminate geographical limitations in business operations, allowing co-founders or team members to be located in different countries and still communicate seamlessly in their native languages [00:40:48]. For multinational companies, it would significantly improve internal communication and efficiency by allowing employees to access education, learning resources, and knowledge across different languages with much greater ease [00:41:06]. AI is anticipated to bridge these communication gaps entirely [00:41:25].
Impact on Language Learning and Culture
While AI will bridge many communication gaps, the personal connection and cultural aspects associated with learning languages will continue to hold significant value [00:41:30]. Fluent conversation practice with AI models could democratize access to language learning by making it less expensive than lessons with traditional in-person teachers [00:47:45]. However, learners may still prefer the enjoyment of real-life social interactions, like dinner with a native speaker, over speaking to a phone [00:48:19].
The average person is predicted to learn fewer languages in the future because of advanced translation capabilities [00:43:34]. However, those who do learn languages will do so out of personal interest, akin to playing chess, as languages are core to human beings and offer an intellectual challenge [00:43:40]. The importance of local languages may even increase as widely spoken ones become easily accessible through translation [00:42:11].