From: redpointai

Mistral AI, co-founded by Arthur Mensch, has quickly become a central player in the AI landscape, particularly noted for building leading open-source Large Language Models (LLMs) [00:00:08]. The company aims to define the future of AI policy and product development [00:00:20].

Company Identity and Approach

Officially, the name “Mistral” is linked to the French pronunciation of “AI” (intelligence artificielle), whose “I” and “A” vowels it contains [00:01:08]. Unofficially, the founders simply struggled to find a name and settled on the Mistral, a strong wind that blows through southern France [00:01:22]. This “Wind of Change” idea resonates with the company’s ethos [00:01:41].

Mistral gained significant attention for its unconventional distribution methods, notably releasing its models via torrent, a decision influenced by the earlier open-source release of LLaMA [00:01:52], [00:02:00]. The company’s distinctive “Word Art” logo was also born out of a rapid decision when their initial Twitter account was stolen [00:02:26], [00:02:46]. These spontaneous choices have contributed to Mistral’s brand identity and developer appeal [00:03:03].

Market Landscape and Competitive Strategy

The current AI market is seen as solidifying into two main camps: closed-source offerings from companies like OpenAI, Anthropic, and Google, and open-source initiatives from Meta, xAI (with Grok), and Mistral [00:03:11]. Mistral firmly believes that open source will prevail, because AI is fundamentally an infrastructure technology that customers should be able to modify and own [00:03:31].

Mistral’s strategy involves:

  • Bridging the Gap: Actively working to close performance and usability gaps with closed-source offerings, which historically had better software and APIs [00:04:04], [00:04:10].
  • Hybrid Offering: Maintaining both open-source and commercial products to sustain open-source development [00:03:45]. The very best models are typically commercial, while strong, slightly less advanced models are open-source, though this is a flexible, tactical approach due to market pressures [00:05:05].
  • Developer Focus: The mission is to be the most relevant platform for developers [00:05:29].

Product Offerings and Enterprise Adoption

Mistral differentiates its offerings by providing access to model weights for its commercial solutions, such as Mistral Large [00:04:51]. This approach allows enterprises to:

  • Deploy Models On-Premise: Customers can deploy models where their data resides, addressing data governance concerns (see the sketch after this list) [00:06:00].
  • Enable Specialization: Enterprises can fine-tune and specialize models for their specific needs, connecting them to internal systems and building more complex applications than simple API usage [00:06:11], [00:06:21].
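
As a concrete illustration of the on-premise point, here is a minimal sketch of loading one of Mistral’s published open-weight checkpoints locally with the Hugging Face transformers library; the toolchain and generation settings are assumptions for illustration, not Mistral’s prescribed stack.

```python
# Minimal sketch: running an open-weight Mistral checkpoint on-premise.
# Assumes the `transformers` and `accelerate` packages and a local GPU;
# this is one common toolchain, not Mistral's prescribed stack.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # published open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize our data-governance policy in one sentence:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the weights live on the customer’s own hardware, prompts and fine-tuning data never have to leave their infrastructure.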

Mistral’s core strength lies in training and specializing models [00:06:59]. While they are building their own inference pipeline, they also leverage partnerships [00:07:27].

Partnership Strategy

Mistral has formed significant partnerships with major players like Microsoft, Snowflake, Databricks, and Nvidia [00:08:00]. This strategy focuses on:

  • Distribution Optimization: Partnering with hyperscalers (e.g., Azure) and data cloud providers (e.g., Snowflake, Databricks) facilitates adoption by meeting enterprises where their data and developers already operate [00:08:08], [00:08:45], [00:08:52].
  • Procurement Streamlining: For larger enterprises, particularly in Europe, accessing Mistral’s technology through existing cloud credits simplifies the procurement process [00:09:51].
  • Multi-platform Presence: Replicating its offerings across different environments so Mistral remains a genuinely multi-platform solution [00:08:38].

This approach contrasts with companies like OpenAI, which also offers direct sales alongside platform access via Azure [00:09:18]. Smaller, digital-native companies often engage directly with Mistral, receiving direct support, while larger enterprises prefer indirect channels through established partners [00:09:36].

Future of LLMs and Technical Focus

Arthur Mensch outlines several future development areas for LLMs:

  • Efficiency Frontier: Continued push for greater efficiency in models, building on successes like Mistral 7B [00:10:25].
  • Model Controllability: Significant research is still needed to make models more controllable and follow instructions precisely [00:10:44].
  • Architectural Improvements: While Transformers are dominant, more efficient architectures are possible; they are hard to develop because training algorithms, debugging practices, and hardware have all co-adapted to Transformers over seven years [00:11:09], [00:14:42]. Mistral has focused on improvements like sparse attention for memory efficiency, illustrated after this list [00:15:37].
  • Deployment and Latency: Deploying models on smaller devices and improving latency will unlock many new applications that treat LLMs as a basic building block for planning and exploration tasks [00:11:27], [00:11:34].
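
On the sparse-attention point, the sketch below shows the idea behind sliding-window attention, the memory-saving variant Mistral 7B shipped: each token attends only to a fixed window of recent tokens. This PyTorch version is purely illustrative and uses a dense mask for clarity; a production kernel materializes only the band, so memory grows with the window rather than the sequence length.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Causal attention where each token sees only the previous `window`
    tokens. Dense mask shown for clarity; a real kernel keeps only the
    band, so attention memory scales with the window, not the sequence."""
    seq = q.size(-2)
    pos = torch.arange(seq)
    # Key position j is visible from query position i iff i - window < j <= i.
    allowed = (pos[None, :] <= pos[:, None]) & (pos[None, :] > pos[:, None] - window)
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Toy shapes: (batch=1, heads=2, seq=8, head_dim=4), window of 3 tokens.
q = k = v = torch.randn(1, 2, 8, 4)
print(sliding_window_attention(q, k, v, window=3).shape)  # (1, 2, 8, 4)
```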

Regarding compute and GPUs, Mistral runs a lean operation, achieving strong results with about 1.5K H100 GPUs, and plans to increase that capacity [00:12:01], [00:12:22]. It prioritizes efficient use of training compute to keep the business model viable, in contrast with larger players like Meta that command hundreds of thousands of GPUs [00:13:02]. Nvidia’s continuous improvement in FLOPS per dollar is seen as beneficial for training larger models [00:16:11].
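
To put that fleet in perspective, a hedged back-of-envelope (spec-sheet peak throughput plus an assumed utilization and token budget; none of these figures come from the episode):

```python
# Back-of-envelope training budget for ~1.5K H100s.
# Peak throughput is the public H100 SXM spec; utilization and the
# token budget are assumptions, not Mistral figures.
gpus = 1500
peak_flops = 1e15        # ~1 PFLOP/s dense BF16 per H100 (spec sheet)
utilization = 0.4        # assumed sustained training efficiency

cluster = gpus * peak_flops * utilization          # sustained FLOP/s

# Standard approximation: training cost ~ 6 * parameters * tokens.
params, tokens = 7e9, 2e12                         # 7B model, assumed tokens
train_flops = 6 * params * tokens

print(f"{train_flops / cluster / 86400:.1f} days")  # -> 1.6 days
```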

Mistral aims to be the “best model provider” while potentially using less compute than competitors, which requires understanding exactly how much compute staying relevant takes [00:13:30]. The Chinchilla work and Mistral 7B have shown significant gains in model efficiency, and further progress is expected [00:13:56]. When asked about Grok’s large model, Mensch suggested it could be smaller and more efficient, emphasizing the importance of the Pareto front between model size and performance [00:31:36].
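
For reference, the Chinchilla result can be stated compactly (notation from Hoffmann et al., 2022, not from the episode): a model with N parameters trained on D tokens has loss

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6\,N\,D .
```

Minimizing L at a fixed compute budget C gives N_opt ∝ C^a and D_opt ∝ C^b with a ≈ b ≈ 0.5, so parameters and training tokens should grow together (roughly 20 tokens per parameter at the fitted constants). Training a small model well past this optimum trades extra training compute for cheaper inference, which is one way the size-performance Pareto front keeps moving.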

AI Policy and Regulation

Mistral’s stance on AI policy, particularly regarding the EU AI Act, is that AI safety should be approached from a product safety perspective, similar to how software safety is addressed [00:17:11]. They argue that:

  • Focus on Applications: Regulation should focus on the product and its expected behavior, rather than the underlying technology or arbitrary “flop thresholds” [00:17:25], [00:17:40].
  • Ineffective Regulation: Direct technology regulation of LLMs (which Mensch likens to regulating a coding language) is an “ill-directed burden” [00:18:04], [00:18:27]. It is manageable (Mistral performs evaluations anyway), but it does not solve the core product safety problem [00:18:00].
  • Rethinking Evaluation: The stochastic nature of AI models requires rethinking evaluation, continuous integration, and verification processes (a sketch of such a check follows this list) [00:19:31]. This is primarily a “technological problem” and a “product problem” for companies to solve with tooling, rather than a regulatory one [00:19:47].
  • Transparency vs. Trade Secrets: While open to transparency of training datasets, they advocate for caveats to protect competitive trade secrets [00:18:45].
  • Healthy Pressure: Regulating application makers would create “healthy competitive pressure” on model makers to build more controllable models [00:22:21], [00:22:31]. Conversely, direct technology regulation favors larger players who can better navigate regulatory complexities [00:22:43].
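
One hedged reading of what rethinking continuous integration could mean in practice: because outputs are stochastic, a CI gate samples the model repeatedly and asserts a pass rate rather than an exact match (sample_fn, call_model, and the threshold below are hypothetical):

```python
# Hypothetical CI gate for a stochastic model: sample n completions and
# require a pass *rate*, rather than asserting one deterministic output.
def passes_stochastic_check(sample_fn, prompt, expected, n=20, threshold=0.9):
    hits = sum(expected in sample_fn(prompt) for _ in range(n))
    return hits / n >= threshold

# Usage in a test suite (call_model stands in for a real model client):
# assert passes_stochastic_check(call_model, "What is 2 + 2?", "4")
```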

Geographical and Language Specialization

Mistral advocates for portability as an approach to sovereignty [00:23:25]. Instead of every country needing its own LLM company, the focus should be on enabling developers to deploy technology where they choose [00:23:17].

  • Multilingualism: Acknowledging that current models are much better in English, Mistral is committed to making models excellent in every language, starting with French [00:23:43]. This focus on language capability is seen as integral to the pre-training process and thus belongs to foundational model companies [00:24:41].
  • Global Company: Mistral’s approach is to be a global company that is portable and multilingual, ensuring its technology is ubiquitous [00:24:08].
  • Sovereignty Solution: Providing access to modifiable technology (e.g., through weight distribution) addresses national sovereignty concerns better than a pure Software-as-a-Service (SaaS) model [00:25:13].

Mistral’s Genesis and Growth

Despite initial skepticism about entering a market with established players like OpenAI and Anthropic, the co-founders were confident for several reasons: their prior experience, their ability to attract talent from the strong pool in Paris, increased VC awareness following ChatGPT, and confidence that they could ship good models quickly [00:26:08]. The rapid attention Mistral has received since has been a positive challenge [00:31:18].

Application Layer and Data Strategy

Mistral also engages with the application layer to help enterprises get started with generative AI [00:27:12]. Products like Le Chat (a French name the hosts joked was hard to pronounce) provide contextualized assistants [00:27:30], [00:27:34]. This work also helps solidify the APIs and provides feedback for the developer platform [00:27:45].

Quick Takes

  • Overhyped in AI: Synthetic data, because the term remains ill-defined [00:30:27].
  • Underhyped in AI: Optimization techniques [00:30:37].
  • Biggest surprise in building Mistral: Gaining attention more quickly than anticipated [00:31:16].
  • What he thought would work but didn’t: Building out the US science team; hiring proved harder than expected [00:31:01].
  • Thoughts on Grok model: “A little too big”; can be smaller and more efficient while maintaining performance [00:31:33].
  • Excited about other AI startups: Dust, a Paris-based startup focusing on Knowledge Management with a sleek UI [00:32:01].
  • Alternative AI application to build: If not building LLMs, accelerating materials science, for example optimizing ammonia synthesis, which currently lacks a foundational model [00:32:36]. This points to a broader trend of foundational models for specialized fields like biology, materials science, and robotics [00:38:40].

To learn more about Mistral AI, their documentation, guides, and APIs are good starting points [00:33:31].