From: redpointai
The journey towards advanced AI capabilities, including Artificial General Intelligence (AGI), is profoundly influenced by the economics and resource costs associated with scaling AI models [00:01:18]. Historically, significant advancements have been directly tied to an increase in computational resources and data [00:02:02].
Evolution of Training Costs
The cost of training frontier AI models has dramatically increased over time:
- GPT-2: Approximately $50,000 [00:01:52].
- GPT-4: Costs escalated from “thousands to tens of thousands of dollars to hundreds of thousands to millions to tens of millions” [00:02:09]. Some labs may even be spending “hundreds of millions of dollars today” on these models [00:02:12].
This indicates that simply throwing more money, compute, and data into pre-training continues to yield better models [00:02:22].
The “Soft Wall” of Cost
While increasing resources improves models, there is an economic limit to this approach [00:02:47]. Scaling models by 10x in capabilities could translate to costs of billions or even tens of billions of dollars [00:02:41]. At some point, it becomes “no longer economically worth it to push that further” [00:02:49]. This represents a “soft wall” rather than a hard technical limitation [00:02:59].
The Promise of Test-Time Compute
A significant shift in focus has been towards test-time compute (or inference compute) as a more cost-effective path to enhance model capabilities [00:03:08].
- Cost Efficiency: While pre-training scaling becomes increasingly difficult, test-time compute is still in its early stages, offering “a lot of room” for algorithmic improvements and scaling [00:03:37] (a simple illustration follows below).
- Analogy to GPT-2 Era: The current state of test-time compute is compared to the early days of GPT-2, where it was “pretty obvious” that scaling pre-training by 1,000x would lead to a better model [00:03:20]. A similar opportunity exists for test-time compute today [00:03:37].
“I thought it would take at least a decade [to scale inference compute generally]; it took like 2 or 3 years.” [00:08:09]
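The episode does not detail how O1 actually spends its inference compute, so the following is an illustration only: one generic way to trade test-time compute for accuracy is best-of-N sampling with majority voting (self-consistency). The `noisy_model` stub below stands in for any real model call; this is a hedged sketch, not OpenAI’s method.

```python
# Illustrative sketch only: trading test-time compute for accuracy via
# self-consistency (sample N answers, return the majority vote).
# This is a generic technique, not a description of how O1 works internally.
import random
from collections import Counter
from typing import Callable

def self_consistency(sample_fn: Callable[[str], str], prompt: str, n: int = 32) -> str:
    """Draw n independent samples and return the most frequent answer."""
    votes = Counter(sample_fn(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

# Stand-in for a real model call: answers correctly 60% of the time.
def noisy_model(prompt: str) -> str:
    return "42" if random.random() < 0.6 else random.choice(["41", "43", "44"])

# More samples per query (more inference compute) -> more reliable answers.
for n in (1, 8, 64):
    trials = 200
    correct = sum(self_consistency(noisy_model, "What is 6 * 7?", n=n) == "42" for _ in range(trials))
    print(f"n={n:3d} samples/query -> {correct / trials:.0%} correct")
```

The toy numbers are beside the point; the trend is what matters: each additional sample costs more inference compute but makes the final answer more reliable, which is the economic trade-off the conversation highlights.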
Potential for Extreme Scaling
Considering the dollar value, a typical ChatGPT query costs about a penny [00:04:37]. However, for critical problems, people might be willing to pay significantly more, potentially “a million dollars for some of the most important problems that Society cares about” [00:04:55]. This implies “eight orders of magnitude” of room to push test-time compute further, not just by spending more, but through algorithmic improvements [00:05:01].
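As a quick check on that figure, going from a one-cent query to a million-dollar query spans

$$\log_{10}\!\left(\frac{\$1{,}000{,}000}{\$0.01}\right) = \log_{10}\!\left(10^{8}\right) = 8 \ \text{orders of magnitude.}$$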
Organizational and Development Challenges
OpenAI, despite pioneering large-scale pre-training, embraced test-time compute research [00:09:39]. Initially, their motivation for this direction was different—focused on “overcoming the data wall” rather than explicitly scaling test-time compute [00:10:07]. However, the techniques and agendas proved compatible [00:10:22].
The company’s willingness to invest heavily in a “risky direction” like O1, which was “disruptive to the paradigm that OpenAI pioneered,” demonstrated organizational excellence and adaptability, avoiding the “innovator’s dilemma” [00:11:44].
Impact on Hardware
The shift towards emphasizing inference compute will likely drive significant changes in hardware development [00:35:20]. Traditionally, hardware was designed around massive pre-training runs on the assumption that inference would be cheap; this new paradigm creates “an opportunity for a lot of creativity on the hardware side to adapt” [00:35:29].
The “Bitter Lesson” and its Economic Implications
The “Bitter Lesson” from Richard Sutton’s essay argues that methods that scale well with more compute and data ultimately outperform approaches that encode human knowledge or rely on complex scaffolding [00:26:02].
- Scaffolding vs. Scaling: Adding “scaffolding and prompting tricks” to models to push their capabilities slightly further is tempting but may not scale well with more data and compute [00:26:48].
- Long-Term Impact: Techniques like O1 that inherently scale well with data and compute are expected to become dominant in the long run, making many current scaffolding techniques obsolete [00:27:15].
- Dilemma for Developers: Builders, especially startups, face a choice: solve immediate problems with scaffolding, or invest in solutions that align with future scaling trends. Investing heavily in scaffolding for capabilities that will soon be available “out of the box” in more capable models can be wasted effort [00:27:37].
Cost Efficiency and Accessibility for Research
The increasing dependence on data and compute resources poses a significant challenge for academic AI research [00:29:07]. Academia often incentivizes short-term gains like minor performance improvements on evaluations through clever prompting, which may not translate to impactful long-term research [00:29:50].
Instead, academics are encouraged to:
- Investigate “novel architectures” or approaches that demonstrate promising scaling trends with more data and compute, even if they don’t immediately achieve state-of-the-art performance [00:30:21].
- Utilize AI models to run scalable and cheaper experiments in fields like social sciences [00:36:24]. For example, AI models can simulate human behavior in economic game-theory experiments, providing insights at a fraction of the cost and without the ethical concerns of human-subject studies [00:37:07], as sketched below.
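The episode doesn’t walk through an implementation, but a minimal sketch of the idea might look like the following: have a language model role-play the responder in an ultimatum game and tally its accept/reject decisions across offer sizes. The model name, prompt wording, and use of the `openai` Python client are assumptions for illustration, not details from the source.

```python
# Illustrative sketch only: simulating an ultimatum-game responder with an LLM.
# Assumes the `openai` Python client and an API key in OPENAI_API_KEY;
# the model name and prompt are placeholders, not details from the source.
from collections import Counter
from openai import OpenAI

client = OpenAI()

def responder_decision(offer: int, total: int = 100, model: str = "gpt-4o-mini") -> str:
    """Ask the model, role-playing a participant, to accept or reject an offer."""
    prompt = (
        f"You are a participant in an ultimatum game. Another player splits ${total} "
        f"and offers you ${offer}, keeping ${total - offer}. If you accept, both of you "
        f"get the proposed amounts; if you reject, both get nothing. "
        f"Reply with exactly one word: ACCEPT or REJECT."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,  # sample varied "participants"
    )
    return resp.choices[0].message.content.strip().upper()

# Run many simulated participants per offer level and tally the decisions.
for offer in (10, 30, 50):
    tally = Counter(responder_decision(offer) for _ in range(20))
    print(f"Offer ${offer}: {dict(tally)}")
```

Each simulated “participant” here costs a fraction of a cent per decision, compared with recruiting, paying, and obtaining ethics approval for human subjects.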