From: redpointai

Jonathan Frankle, Chief AI Scientist at Databricks, discusses the landscape of AI model architectures, their evolution, and the ongoing debate about future directions. His insights are shaped by extensive experience, including his tenure at MosaicML before its acquisition by Databricks and observations from working with 12,000 Databricks customers [00:00:00].

The Dominance of Transformers

Frankle notes his “famous bet” on Transformers remaining the dominant architecture in AI for a specified period [02:43:00]. He maintains confidence in this bet, emphasizing a long-term view rather than day-to-day fluctuations [02:59:00].

The models in widespread use today are still fundamentally the Vaswani et al. Transformer, with minor variations such as different positional encodings and a decoder-only design [03:17:00]. Frankle suggests that a “sweet spot” in the hyperparameter space was discovered, and there hasn’t been a compelling reason to significantly alter the foundational architecture [03:25:00].
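To make “minor variations” concrete, here is a minimal sketch of a single decoder-only Transformer block in the spirit of Vaswani et al. It assumes PyTorch and illustrative layer sizes, uses the now-common pre-norm arrangement, and omits positional encoding entirely; it is an illustration of the architecture being discussed, not code from the interview or from Databricks.

```python
# Minimal decoder-only Transformer block (illustrative sketch, assuming PyTorch).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        # Pre-norm layout, a common minor deviation from the original post-norm paper.
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Causal mask: position i may only attend to positions <= i (decoder-only).
        T = x.size(1)
        mask = torch.ones(T, T, device=x.device).triu(diagonal=1).bool()
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                   # residual around self-attention
        x = x + self.mlp(self.ln2(x))      # residual around the feed-forward sublayer
        return x

# Usage: a batch of 2 sequences, 16 tokens each, already embedded to d_model.
block = DecoderBlock()
out = block(torch.randn(2, 16, 512))
print(out.shape)  # torch.Size([2, 16, 512])
```

Swapping in a different positional encoding or normalization scheme changes details inside this block, but not its overall shape, which is the point Frankle is making.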

Historical Context: LSTMs

Before the Transformer, the state of the art in natural language processing (NLP) was recurrent neural networks, specifically LSTMs (Long Short-Term Memory networks) [03:37:00]. LSTMs predate Frankle himself by a year [03:50:00]. He poses the question of whether Transformers are inherently superior to LSTMs or whether their current dominance simply reflects where collective research and development effort has been concentrated [04:11:00].

Architectural Simplicity Over Time

Frankle observes that successful architectures tend to become simpler, not more complicated, over time [04:21:00]. He highlights that Transformers are considerably simpler than LSTMs [04:24:00]. Science, he notes, progresses in significant leaps followed by periods of consolidation rather than through continuous, linear scaling [04:42:00]. Given this historical pattern, the notion of something suddenly eclipsing the Transformer seems “ahistorical” [04:52:00].
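For a rough sense of the simplicity comparison, contrast the standard per-timestep LSTM update with the Transformer’s core attention operation (textbook forms, not equations from the interview):

```latex
% LSTM cell: three gates plus a candidate state, applied sequentially per token.
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), &
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), &
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, &
h_t &= o_t \odot \tanh(c_t).
\end{aligned}

% Scaled dot-product attention: one expression, applied to all tokens in parallel.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
```

The LSTM threads state through time one step at a time, while attention is a single matrix expression over the whole sequence, which is part of why the Transformer is both conceptually simpler and easier to parallelize on modern hardware.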

Future Outlook on Architectures

Despite new developments like OpenAI’s o1 model and Anthropic’s work on computer use, Frankle remains cautiously optimistic about the enduring relevance of current architectural paradigms, particularly Transformers, for the foreseeable future [04:58:00], [42:13:00]. He emphasizes that even if technology were to “freeze” at the current state of GPT-4, the industry would still see immense creativity and innovation simply from learning how to better use and apply these tools [25:21:00].

Frankle acknowledges that achieving “perfection,” or “five nines” of reliability, with the current generation of technology will be challenging, given the inherent fuzziness of these systems [24:15:00]. However, he remains bullish on compound AI systems and agents that creatively chain models together to improve quality, viewing this as pushing further along the cost-quality curve [23:31:00].
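As an illustration of what chaining models for quality can look like, here is a minimal compound-system sketch: a drafting call followed by a critique-and-revise loop. The `call_model` function is a hypothetical stand-in for whatever LLM client is in use; this is one common pattern, not Databricks’ or Frankle’s implementation.

```python
# Minimal compound AI system sketch: draft -> critique -> revise.
def call_model(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real model client."""
    raise NotImplementedError

def answer_with_review(question: str, max_rounds: int = 2) -> str:
    # First pass: produce a draft answer.
    draft = call_model(f"Answer the question:\n{question}")
    for _ in range(max_rounds):
        # Second model call checks the draft.
        critique = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            "List any factual or logical problems, or reply 'OK' if there are none."
        )
        if critique.strip().upper() == "OK":
            break  # good enough: stop spending extra compute
        # Third call revises the draft using the critique.
        draft = call_model(
            f"Question: {question}\nDraft answer: {draft}\n"
            f"Critique: {critique}\nRewrite the answer, fixing these problems."
        )
    return draft
```

Each additional critique round spends more compute in exchange for (hopefully) higher answer quality, which is the cost-quality trade-off Frankle describes.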

While he views new research like o1 as “exciting,” he cautions that it is too early to definitively label it a “breakthrough,” since historical hindsight often redefines such terms [42:13:00]. The ability of companies like OpenAI to scale ideas is noted as a significant engineering achievement [43:15:00]. Similarly, Anthropic’s exploration of computer use and new product forms is praised for its creativity and willingness to take risks, even though not every attempt is an immediate hit [46:01:00]. Frankle values experimentation and the “garden of ideas” currently blooming in the product space [46:18:00].

Frankle acknowledges that massive investments in compute (e.g., $50B+) for new models are experiments whose return on investment won’t be clear until the results are seen [01:00:00].