From: allin

The release of DeepSeek’s R1 language model has highlighted the accelerating pace of AI advancements in China, creating significant discussion regarding global competition and innovation [00:15:40].

The DeepSeek R1 Model

DeepSeek, a Chinese AI startup, released its R1 language model, which is considered on par with some of the best models produced in the West, such as OpenAI’s o1 model [00:15:47]. This release was surprising to many, as it suggested that China was closer to the AI frontier than previously thought, potentially shifting the perceived gap from six to twelve months down to three to six months [00:21:36].

Cost Claims and Compute Resources

DeepSeek claimed to have trained R1 for just $6 million, a fraction of the roughly $800 million expenditure cited for GPT-4. This $6 million claim is largely debunked by experts, who argue that the figure likely refers only to the final training run [00:22:04]. The fully loaded cost, including R&D and the substantial compute cluster, is estimated to be over a billion dollars [00:25:25]. DeepSeek is believed to possess a cluster of about 50,000 Hopper-generation GPUs, including 10,000 H100s, 10,000 H800s, and 30,000 H20s, which were likely acquired before export controls were fully implemented [00:24:31].

Distillation and Open Source

There is strong evidence, including self-identification by DeepSeek’s V3 model as ChatGPT, suggesting that DeepSeek’s models have undergone “distillation,” meaning they were trained using output from larger models like OpenAI’s GPT-4 [00:35:08]. This process involves a smaller model learning from the responses of a larger, more complex model [00:31:22]. While the exact method of acquisition (e.g., web crawling public output or API access) is debated, OpenAI itself has indicated that its outputs were improperly used for distillation [00:36:08].
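The distillation process described above can be sketched in a few lines: the student model is trained to match the teacher's full output distribution (softened by a temperature) rather than just its top answer. This is an illustrative toy implementation, not DeepSeek's or OpenAI's actual training code; the function names and example logits are made up.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits into a probability distribution at a given temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Minimizing this trains the student to imitate the teacher's
    preferences over all tokens, which is the core of distillation.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

# Toy logits over a 3-token vocabulary (hypothetical values).
teacher = [4.0, 1.0, 0.5]
student = [1.0, 2.0, 0.5]
loss = distillation_loss(teacher, student)  # positive: student disagrees
```

A perfectly distilled student would drive this loss to zero; in practice the student is smaller, so it only approximates the teacher.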

A key aspect of DeepSeek’s R1 release is its open-source nature, offering API access at a significantly lower cost compared to Western counterparts [00:21:21]. This move challenges the closed-source approach taken by many leading AI companies in the West, and some view it as a realization of the original open-source mission that OpenAI was initially founded upon [00:40:04].

China’s Innovation Under Constraint

China’s approach to AI development demonstrates “necessity as the mother of invention” [00:26:51]. Despite export restrictions on advanced GPUs, Chinese firms are finding innovative ways to develop competitive models. This includes:

  • Novel Algorithms: DeepSeek developed a new reinforcement learning algorithm (GRPO) that uses less memory, in part by dispensing with the separate critic (value) model required by PPO, while remaining highly performant. It differs from the previously dominant PPO algorithm used in the West [00:27:17].
  • Circumventing Proprietary Languages: Chinese developers have worked around Nvidia’s high-level CUDA language by dropping down to PTX, Nvidia’s lower-level intermediate instruction set, to program the chips closer to the bare metal, demonstrating a high level of technical ingenuity [00:28:01].
  • Rapid Copying and Innovation: Based on observations from Uber’s operations in China, Chinese companies have an “epic” ability to rapidly copy and then innovate upon existing technologies, leading to new solutions not seen elsewhere [00:51:25]. An example is the widespread use of smart lockers in Chinese office buildings for food and package delivery, which optimizes courier efficiency [00:53:56].
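The GRPO idea in the first bullet can be sketched simply: instead of a learned critic network estimating a baseline (as in PPO), each sampled response's reward is normalized against the mean and standard deviation of its own group of samples. This is a minimal illustration of the group-relative advantage computation only, with made-up reward values, not DeepSeek's actual implementation.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each response's reward against
    its own group's statistics, removing the need for a critic model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
    return [(r - mean) / std for r in rewards]

# One prompt, a group of 4 sampled completions scored by a reward model
# (reward values are hypothetical, for illustration only).
rewards = [0.9, 0.1, 0.4, 0.6]
advantages = group_relative_advantages(rewards)
```

Responses scoring above the group mean get positive advantages and are reinforced; those below get negative advantages. The memory saving comes from not training or storing a second value network.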

Geopolitical and Economic Implications

The competition in AI is a significant aspect of the broader US-China rivalry [00:18:09].

Export Controls and Self-Reliance

The US has implemented export restrictions on Nvidia H100 GPUs to China [00:22:21], aiming to curb China’s AI progress. However, there are concerns that these controls may be futile, with chips potentially being rerouted through places like Singapore, where a significant portion of Nvidia’s revenue is directed [00:57:05]. Furthermore, cutting off access could force China to develop its own chip manufacturing capabilities and design chips that circumvent the most complex Western technologies, leveraging AI to design simpler, yet effective, chips [00:59:21].

Commoditization and Value Shift in AI

The rapid progress and open-source nature of models like R1 suggest that AI models themselves could become commoditized much faster than anticipated [00:29:21]. This shifts the point of value creation in the AI value chain away from base models to the application layer, similar to how YouTube was built on storage or Uber on GPS [00:43:42]. The argument is that the fastest depreciating asset in the world is a large language model [00:44:04].

The Role of Government and Capitalism

In China, a central authority might bear the capital expenditure for developing advanced AI models, which can then be freely used by Chinese companies, with the state effectively holding a “golden share” and a seat on their boards [01:01:13]. This contrasts with the Western model, where private companies seek venture capital, potentially leading to overcapitalization and bureaucracy, which could hinder agile innovation compared to constraint-driven development [01:04:48].

Future of AI and Economic Impact

The decreasing cost of AI is expected to significantly increase its usage across applications and industries, an instance of Jevons’ Paradox, where greater efficiency in using a resource leads to increased overall consumption of it [00:45:23]. This will lead to more specialized AI models for specific tasks (e.g., investor AI, autonomous car AI) [00:46:06]. The true competitive advantage in AI may not lie in owning the largest data center networks, but in proprietary data and content, and the ability to leverage this data to build continuously improving products [01:09:07].
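The Jevons' Paradox point can be illustrated with toy arithmetic: if unit cost falls but demand grows even faster, total spending on AI still rises. The specific numbers below are hypothetical and chosen only to make the mechanism concrete.

```python
# Hypothetical figures, for illustration only (not from the source):
# price per million tokens drops 20x, while cheaper AI spurs 50x usage.
old_price, new_price = 10.0, 0.5        # $ per million tokens
old_usage, new_usage = 1_000, 50_000    # millions of tokens consumed

old_spend = old_price * old_usage       # total spend before the price drop
new_spend = new_price * new_usage       # total spend after the price drop

# Despite a 20x cheaper unit price, aggregate spending increases,
# because demand growth (50x) outpaces the efficiency gain.
```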