From: redpointai
DeepL, an AI translation company with a $2 billion valuation, supports over 100,000 businesses worldwide [00:00:00]. CEO and founder Jarek Kutylowski discusses the challenges and opportunities in AI model development and infrastructure, drawing from DeepL’s extensive experience in cutting-edge AI research prior to the widespread adoption of large language models (LLMs) [00:00:41].
Evolution of AI Models and Infrastructure
The release of models like ChatGPT significantly increased public awareness of AI’s capabilities [00:01:34]. Because DeepL had already been building language models into its translation products for years, the underlying research came as little academic surprise; what stood out was the “magic” of the experience and the jump in public understanding [00:01:15]. LLM technology now lets DeepL bring translation to the next level, enabling richer interactivity between humans and AI [00:03:27].
Specialized vs. General Models
A key perspective from DeepL is the emphasis on specialized models over generalized ones [00:00:12]. For high-value use cases, particularly for businesses, specialized models prove to be more effective [00:06:03]. While general models are useful, there are many areas where building specialized solutions is financially impractical, especially for long-tail use cases where companies cannot afford the significant compute resources for training [00:36:34]. However, for critical applications like translation, specialized models are seen as making a lot of sense [00:37:14].
DeepL aims for models that are smarter and require less compute, focusing on architectural improvements similar to the impact of the Transformer architecture [00:25:10]. While large general AI players might not be incentivized to invest in smaller, more efficient models due to their “monopoly on huge compute,” it falls to specialized players like DeepL to innovate in this area [00:25:32].
Vertical Integration and Full Stack Ownership
DeepL has adopted a “build it yourself” philosophy, owning the entire vertical stack from product and go-to-market to engineering and research [00:09:59]. This comprehensive ownership allows for better identification and resolution of problems that might be missed with prompt engineering alone. Having control over model parameters, training, and architecture enables more effective problem-solving for customers [00:10:25]. An example of this benefit is DeepL’s ability to allow customers to embed terminology into models, a crucial feature for businesses that other translation providers have struggled to integrate effectively [00:11:38].
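As a rough illustration of why terminology support matters, the sketch below checks whether a customer glossary is respected in a translation. This is only a surface-level post-hoc check under assumed inputs; DeepL’s actual feature embeds terminology into the model itself, which is why full-stack control matters.

```python
# Hypothetical sketch: verifying that a customer's terminology glossary is
# respected in a translation output. This is NOT DeepL's implementation --
# their terminology support is integrated into the model; this only
# illustrates the constraint being enforced.
from typing import Dict, List


def glossary_violations(source: str, translation: str,
                        glossary: Dict[str, str]) -> List[str]:
    """Return source terms whose mandated target term is missing.

    glossary maps a source-language term to the required target term.
    """
    violations = []
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and \
           tgt_term.lower() not in translation.lower():
            violations.append(src_term)
    return violations


# Made-up example entry: a customer mandates "Rechnung" -> "invoice".
glossary = {"Rechnung": "invoice"}
src = "Die Rechnung wurde gestern verschickt."
```

A model-integrated approach goes further than such a check: it steers generation so the required term appears fluently, rather than flagging or patching the output afterwards.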
Challenges in AI Model Training and Scalability
Data Scarcity and Language Pairs
A significant challenge in AI model training and scalability arises from the varying sizes of available datasets for different language pairs [00:12:53]. For example, there’s much more translated material for German-English than Polish-English [00:13:01]. This disparity means that model sizes and architectures might differ, and it can be more efficient from an inference compute perspective to use smaller models optimized for individual language pairs [00:14:04].
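The idea of sizing models to the available parallel data can be sketched as a simple routing rule. All names, corpus sizes, and thresholds below are illustrative assumptions, not DeepL’s actual configuration.

```python
# Hypothetical sketch: routing a translation request to a model whose size
# matches the amount of parallel training data for that language pair.
# Corpus sizes and thresholds are made up for illustration.
from typing import Dict, Tuple

PAIR_DATA_SIZE: Dict[Tuple[str, str], int] = {
    ("de", "en"): 500_000_000,  # German-English: abundant parallel text
    ("pl", "en"): 40_000_000,   # Polish-English: far less available data
}


def pick_model(src: str, tgt: str) -> str:
    """Choose a model tier for a language pair based on data availability."""
    size = PAIR_DATA_SIZE.get((src, tgt), 0)
    if size >= 100_000_000:
        return "large"   # enough data to justify a big architecture
    if size >= 10_000_000:
        return "medium"  # smaller model: cheaper inference, less overfitting
    return "small"
```

The practical upside noted in the discussion is on the inference side: a smaller model tuned to one pair can serve requests more cheaply than one large model covering every pair.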
Data Labeling and Human Input
The influence of human data has consistently risen in AI development and is expected to become even more crucial [00:14:48]. DeepL has run large-scale data annotation projects internally for years, utilizing human translators to train models and ensure quality assurance [00:15:04]. This in-house approach ensures top-notch quality, which is vital for specialized models where customer expectations for consistent quality are high [00:15:31]. While DeepL is considering outsourcing parts of this process, the decision hinges on the level of control and quality required for specific tasks [00:17:10].
Innovation and Iteration
Being an innovative, cutting-edge company means embracing the process of “throwing away a lot of results” [00:20:32]. This can happen due to competition or internal research revealing better approaches. Even failed attempts in creating new model architectures provide valuable understanding of the problem, contributing to overall progress [00:21:06]. For instance, DeepL experimented with building custom models for individual customers, a historical practice in the translation industry, but found that the overall quality of out-of-the-box translation models has surpassed any gains from such bespoke solutions [00:22:11]. This indicates that constant evaluation and adaptation are necessary in a rapidly evolving field.
Infrastructure and GPU Compute
Data Centers vs. Hyperscalers
DeepL has operated its own data centers since its inception, largely because no other viable options were available at the time [00:27:07]. While hyperscalers are an excellent way to get started, companies at DeepL’s scale gain significant cost advantages and better hardware availability by running their own data centers [00:29:15]. That earlier access to the newest GPU technology is crucial for staying competitive [00:29:20].
However, operating in-house infrastructure is more complex and can slow down development [00:29:49]. DeepL is transitioning large parts of its stack towards hybrid cloud solutions, only keeping critical operations (for efficiency or security reasons) on-premise in their own data centers [00:29:54].
Scarcity of GPU Compute
A major challenge is the scarcity of GPUs and GPU-like solutions [00:26:23]. The tooling and platforms for GPU compute are still in their early stages. Unlike general-purpose CPU computing, where abstraction layers don’t incur significant costs, GPU compute is scarce, powerful, and expensive. This necessitates optimizing operations for sustainability, both environmentally and commercially [00:28:16].
The Future of AI Models
Synchronous Speech Translation
DeepL is excited about the next frontier in translation: spoken language and voice [00:39:14]. While AI-based text translation has significantly changed how content is consumed, real-time conversational translation is not yet seamless [00:39:40]. Achieving synchronous speech translation would transform business operations, allowing globally distributed teams to communicate in their native languages in real-time, bridging language gaps for education, learning resources, and knowledge sharing [00:40:50].
Key technical challenges to overcome for synchronous speech translation include:
- Latency: Ensuring real-time processing [00:46:03].
- Ambiguity and Unstructured Language: Spoken language is often casual, unstructured, and filled with ambiguities, unlike written text. Models need to be taught to translate different types of spoken language [00:46:10].
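The latency challenge can be illustrated with a minimal streaming buffer that trades responsiveness against context: flush too eagerly and the model loses context for disambiguation; wait too long and the conversation stalls. Everything below is an assumed toy design, with `translate` standing in for any text-translation call.

```python
# Hypothetical sketch: a streaming buffer for speech translation that emits
# a translation either at a sentence boundary or once enough words have
# accumulated. Real systems are far more sophisticated (re-translation,
# partial hypotheses, prosody); this only shows the latency/context trade-off.
from typing import Callable, List, Optional


class StreamingTranslator:
    def __init__(self, translate: Callable[[str], str],
                 max_buffer_words: int = 8):
        self.translate = translate
        self.max_buffer_words = max_buffer_words
        self.buffer: List[str] = []

    def feed(self, word: str) -> Optional[str]:
        """Accept one recognized word; return a translated chunk when a
        sentence ends or the buffer limit is reached, else None."""
        self.buffer.append(word)
        if word.endswith((".", "?", "!")) or \
                len(self.buffer) >= self.max_buffer_words:
            chunk = " ".join(self.buffer)
            self.buffer.clear()
            return self.translate(chunk)
        return None
```

Raising `max_buffer_words` gives the model more context for ambiguous, unstructured speech; lowering it reduces perceived latency. Real-time systems have to tune this trade-off continuously.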
Impact on Language Learning
The availability of advanced AI translation tools raises questions about the future of human language learning [00:41:47]. While AI might reduce the business requirement to learn multiple languages, potentially leading to fewer people learning them for practical purposes, Kutylowski believes personal interest and cultural connection will still drive language acquisition [00:43:40]. Learning languages remains valuable for brain development and provides an enjoyable intellectual challenge, similar to playing chess despite AI’s superiority in the game [00:43:56].
Lessons from DeepL’s Journey
Beating Tech Giants
DeepL attributes its success against giants like Google Translate to intense focus on the market, continuous innovation, and building strong academic-level research within the company while specializing in high-value business translation [00:08:13]. Being established in Europe, a hub of diverse languages, fostered a deep understanding and motivation within the team [00:08:46].
Beyond Technology Alone
Early on, Kutylowski believed that strong technology alone would suffice [00:49:43]. However, he learned that to deploy technology effectively, especially in AI, a broader approach is necessary, encompassing product development, commercialization, and understanding the “big picture” [00:49:56].
Evaluation of Models
Evaluating AI translation models involves both synthetic metrics (like BLEU score) and human evaluation [00:34:05]. While synthetic metrics are useful for rough orientation during training, they quickly become insufficient as quality improves [00:34:25]. The “real test” is human evaluation, where thousands of translators judge translations for accuracy, nuance, and native feel, often in a comparative way against other models [00:34:55]. This is particularly relevant for business translations, which prioritize accuracy and fluency over subjective literary beauty [00:33:19].
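To make the synthetic-metric side concrete, here is a simplified, single-reference BLEU computation (modified n-gram precision with a brevity penalty). Production evaluation typically uses tooling such as sacreBLEU with multiple references and corpus-level statistics; this sketch only shows the core idea.

```python
# Simplified sentence-level BLEU: geometric mean of modified n-gram
# precisions (n = 1..4) times a brevity penalty. Single reference only;
# not a replacement for standard tooling like sacreBLEU.
import math
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each n-gram's count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        prec = overlap / total
        if prec == 0:
            return 0.0  # no smoothing in this simplified version
        log_prec_sum += math.log(prec) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else \
        math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec_sum)
```

The limitation the discussion points to is visible even here: BLEU rewards n-gram overlap with one reference, so a fluent, accurate translation phrased differently can score poorly, which is why human comparative judgment becomes the real test as quality improves.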
The challenges of building AI infrastructure companies are evident in DeepL’s journey, highlighting the need for specialized solutions, robust data pipelines, and a flexible infrastructure strategy to manage evolving AI capabilities and market demands.