Specialized vs General AI models in translation

From: redpointai

DeepL, an AI translation company valued at $2 billion that serves over 100,000 businesses worldwide, has a distinctive perspective on the role of specialized versus general models in AI translation [00:00:00]. CEO and founder Jarek Kutylowski emphasizes that while general models capture much of the attention, the real value is created by specialized models [00:48:52].

DeepL’s Approach: Specialization and Vertical Integration

DeepL was engaged in cutting-edge AI research long before it became mainstream [00:41:43]. Their approach to building AI models follows a “build it yourself” philosophy [00:59:56], which dates from a time when the necessary tooling, models, and data centers were not readily available [01:06:00].

This strategy involves:

Specialization on Use Cases: DeepL specializes in high-value translation use cases for businesses [00:08:34]. They believe that for “super big use cases” like translation, specialized models make significant sense [00:37:11].
Ownership of the Vertical Stack: Having ownership of the entire vertical stack—from go-to-market strategy and product to engineering and research—allows DeepL to effectively identify and solve problems that might be missed with mere prompt engineering [01:05:06]. This control over model parameters, training, and architecture enables them to integrate features like custom terminology embedding, which other translation providers have struggled to implement effectively [01:41:38].
Tight Feedback Loops: The combination of in-house research, model building, and application deployment creates a tight feedback loop, leading to better and more tailored products [01:15:07].
Language-Specific Models: DeepL runs different sets of models depending on languages and language pairs, adjusting for available data sizes (e.g., more data for German-English than Polish-English) [01:45:48]. Sometimes, models are grouped by language similarity or bundled for operational efficiency on their infrastructure [01:53:07]. For inference compute, it often makes more sense to have slightly smaller models handling individual language pairs [02:06:00].
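
The language-pair routing described above can be illustrated with a minimal sketch. It is not DeepL's implementation: the registry contents, model names, language grouping, and the select_model function are all hypothetical, chosen only to show how a dedicated model for a data-rich pair and a shared model for a group of similar languages might coexist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a deployed translation model."""
    name: str
    size: str


# Hypothetical registry: a data-rich pair gets its own dedicated model.
PAIR_MODELS = {
    ("de", "en"): ModelSpec("de-en-dedicated", "large"),
}

# Hypothetical grouping of similar languages that share one model.
GROUP_OF = {"pl": "slavic", "cs": "slavic", "sk": "slavic"}

GROUP_MODELS = {
    ("slavic", "en"): ModelSpec("slavic-en-shared", "medium"),
}


def select_model(source: str, target: str) -> ModelSpec:
    """Prefer a dedicated pair model, then fall back to a group model."""
    pair = (source, target)
    if pair in PAIR_MODELS:
        return PAIR_MODELS[pair]
    # Map the source language to its group (or itself if ungrouped).
    group_key = (GROUP_OF.get(source, source), target)
    if group_key in GROUP_MODELS:
        return GROUP_MODELS[group_key]
    raise LookupError(f"no model registered for {source}->{target}")
```

Under these assumptions, select_model("de", "en") resolves to the dedicated model, while select_model("pl", "en") falls back to the shared group model. A pair with no entry in either table raises LookupError.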

Advantages Over General Models

DeepL attributes its success against giants like Google Translate to a strong focus on academic-level research combined with specialization [00:25:00].

Key benefits of specialized models include:

Accuracy and Quality: For high-value use cases in businesses, accuracy and quality are paramount [00:06:03]. Specialized models are better able to maintain a high and steady quality, especially when augmented by human translators for quality assurance and training data annotation [01:31:00].
Tailored Performance: DeepL’s models can infer whether a text is technical (optimizing for accuracy) or marketing (optimizing for native-like fluency) to provide the most appropriate translation [00:40:00].
Efficiency: There is a desire to achieve “more with less compute” [02:47:41], and specialized models can potentially achieve high quality without the brute-force compute of very large general models [02:50:01].
Customer-Centricity: Businesses prefer ready-made, vertically integrated solutions that “just plug it in and it works” rather than needing to perform internal prompt engineering [00:37:21].

Limitations of General Models

While acknowledging the “magic” and increased public awareness brought by large general models like ChatGPT [01:27:00], DeepL views them as “overhyped” compared to specialized models [00:48:46]. They believe that the developers of general models, particularly the “big gen players,” may not be incentivized to invest in smaller, more efficient models because of their current “monopoly on huge compute” [02:50:00].

In the context of translation, the reasoning capabilities highlighted in new models like OpenAI’s o1, while fundamental to understanding the world, are not necessarily at the core of what makes a translation excellent [02:36:00].

The Future of AI Translation

DeepL believes that the current state of AI translation is very good for well-resourced languages and common use cases like translating newspaper articles or emails [03:00:00]. However, for high-stakes content like marketing websites for billion-dollar companies or operating manuals for nuclear power plants, human oversight remains crucial for quality and accountability [03:17:00]. This highlights the ongoing role and future of human translators in the AI era.

The next frontier for DeepL is synchronous speech translation and voice-based solutions [03:12:00]. This will eventually enable seamless cross-language communication in business and daily life, fostering more global collaboration [03:40:00]. Technical challenges include reducing latency and handling the inherent ambiguity and unstructured nature of spoken language [04:55:00].

As AI capabilities improve, the average person may learn fewer languages in the future, but language learning is expected to persist as a personal interest and cultural pursuit [04:34:00]. DeepL also notes the exciting potential of AI in language learning to democratize access to language proficiency, making fluent practice conversations with models possible and affordable [04:45:00].

In conclusion, DeepL’s experience suggests that while general-purpose large language models are powerful, specialized models, particularly in domains like translation, are critical for delivering the high quality, reliability, and tailored solutions that businesses demand [00:38:00]. The ongoing competition between specialized and general AI models will be fascinating to observe in the coming years [00:27:00].
