From: redpointai

The realm of AI is rapidly evolving, bringing with it significant challenges and opportunities related to hardware and compute scalability. Experts are continually assessing what surprises them most, what’s overhyped or underhyped, and the biggest unanswered questions in the field [01:08:10].

Shifting Scaling Laws and Compute Demands

Pre-training was initially the key focus, but its scaling limits became apparent. The release of advanced models, landing right after a talk arguing that “scaling is dead,” highlighted how swift the transition was [01:27:07]. The shift suggests that inference-time (test-time) compute is now the new scaling law [02:10:04].

A central challenge in scaling AI models and test-time compute is what is known at OpenAI as the “rule of nines”: each additional nine of reliability, whether from 90% to 99% or from 99% to 99.9%, requires an order-of-magnitude increase in compute, a jump that has historically occurred every 2 to 3 years [01:51:36]. This exponential demand raises significant questions about future hardware and compute availability [01:51:54].
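To make the “rule of nines” concrete, here is a toy sketch (not from the episode) that assumes the rule literally means 10x compute per additional nine of reliability:

```python
import math

def nines(reliability: float) -> float:
    """Number of 'nines' of reliability: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    return -math.log10(1.0 - reliability)

def compute_multiplier(start: float, target: float) -> float:
    """Compute multiplier implied by the 'rule of nines':
    each additional nine of reliability costs roughly 10x compute.
    """
    return 10.0 ** (nines(target) - nines(start))

# Going from 90% to 99% reliability: one extra nine, ~10x the compute.
print(compute_multiplier(0.90, 0.99))   # ~10
# Going from 90% to 99.9%: two extra nines, ~100x the compute.
print(compute_multiplier(0.90, 0.999))  # ~100
```

Under the stated cadence of an order-of-magnitude jump every 2 to 3 years, two extra nines of reliability would take roughly 4 to 6 years of hardware and efficiency gains to absorb.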

GPU Availability and Vendor Dominance

A major concern within AI infrastructure is the availability of GPUs [01:42:55]. Enterprises, accustomed to running everything privately in VPCs, now struggle to secure enough GPUs for their needs, pushing them towards multi-tenant architectures [01:43:41].

Nvidia’s dominance of the GPU market is a significant factor [01:52:00]. Although Nvidia’s stock dropped 15% on the market’s reaction to DeepSeek’s open-source release [01:59:01], its strong ecosystem around CUDA has allowed it to hold its leading position [01:53:15]. Despite competition from other chip developers, including AWS (Trainium and Inferentia) [01:52:03], AMD [01:53:13], Microsoft [01:53:20], and Facebook [01:53:21], no one has yet made a substantial dent in Nvidia’s market share [01:53:32].

Dedicated Silicon and Workload Stability

The general-purpose nature of GPUs, which supports gaming, crypto, and AI alike, underpins their broad utility [01:52:56]. The stability of the transformer architecture now makes a strong case for ASICs (Application-Specific Integrated Circuits) tailored to transformers, which could offer far greater efficiency [01:54:08]. Credible new entrants in this space typically needed to start after 2019 or 2020; ventures founded earlier tended to design overly general-purpose chips, before the transformer became dominant [01:54:25].

The Role of Private Cloud Compute (PCC)

Apple’s Private Cloud Compute (PCC) is an underhyped area that could be very significant [01:44:03]. It aims to bring on-device security to the cloud through architecturally interesting methods [01:45:51]. While many AI workloads will remain on-device, larger LLMs will still require cloud interaction [01:46:09]. This approach addresses the need for “single-tenant guarantees in multi-tenant environments” [01:47:01].

The Infrastructure Space: Beyond Bare Metal

The AI infrastructure development space is broad, with focus extending beyond bare metal to the “LLM OS” – the infrastructure around models [01:42:55]. Key areas of interest include:

  • Code execution [01:43:12]
  • Memory (stateful AI) [01:43:16]
  • Search [01:43:21]
  • Security (email, identity, binary inspection) [01:43:37]: As AI is used for offense, AI must be applied for defense [01:43:44]. The ability of models to infer semantics from code, beyond just syntax, is promising [01:44:09].

However, some areas of AI infrastructure companies are viewed with less bullishness due to their capital-intensive nature or market challenges:

  • Serving models (GPU clouds) [01:45:51]: Strong teams can make money here, but it’s a very capital-intensive business [01:45:56].
  • Fine-tuning companies [01:47:56]: It’s hard to see fine-tuning as a standalone “big thing”; these companies typically need to be part of a broader enterprise AI company or service offering [01:48:00].
  • AI DevOps/AIOps [01:48:20]: While there’s potential, particularly for anomaly detection and improving Mean Time to Resolution (MTTR), the technology isn’t fully mature yet for autonomous operations [01:48:54].
  • Voice real-time infra [01:48:37]: Though hot and interesting, its market size remains a question [01:48:40].

Ultimately, the application layer is seen as significantly more interesting than infrastructure due to its ability to charge for utility rather than simply reducing to a “cost-plus” model [01:47:17].