From: redpointai
Aidan, a co-author of the Transformer paper, expressed his hope that transformers are not the final architecture in AI [00:00:10]. He has anticipated a shift away from transformers for some time [00:35:44].
Current State of Architectures
Despite expectations for new architectures, the longevity of transformers has been surprising [00:37:17]. Aidan stated that if asked in 2018 (a year after the Transformer paper was published) about the likelihood of still using transformers seven years later, he would have put it “pretty close to zero” [00:36:57].
Emerging Alternatives and Hybrid Approaches
While new architectures are emerging, they do not always fully replace existing ones:
- SSMs (State Space Models): Aidan initially believed SSMs would replace transformers, even naming a meeting room after them [00:35:58]. However, it turned out that the beneficial components of an SSM could be integrated into a transformer, negating the immediate need for a full swap [00:36:06] (see the hybrid-block sketch after this list).
- Discrete Diffusion Models: These models offer a “super cool UX” in which the response emerges from a “wall of noisy tokens and text” [00:36:13] (a toy illustration follows below). However, it is not yet clear whether they are inherently better language models than transformers [00:36:31].
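The interview does not spell out what “integrating” SSM components into a transformer looks like. A common pattern in hybrid models, however, is to swap the attention token-mixer for a linear-time, SSM-style operator in some layers while keeping the rest of the block unchanged. The PyTorch sketch below is a minimal, hypothetical illustration of that idea; SimpleSSMMixer, HybridBlock, and all hyperparameters are invented for illustration and do not describe Aidan's or Cohere's actual design.

```python
import torch
import torch.nn as nn

class SimpleSSMMixer(nn.Module):
    """Toy stand-in for an SSM layer: a gated, causal depthwise convolution
    that mixes tokens along the sequence in linear time."""
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        # groups=d_model makes the convolution depthwise (one filter per channel).
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                                    # x: (batch, seq, d_model)
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)]   # trim right pad: causal
        return h.transpose(1, 2) * torch.sigmoid(self.gate(x))

class HybridBlock(nn.Module):
    """Transformer block whose token-mixing step is either self-attention or the
    SSM-style mixer, chosen per layer: the 'keep the useful parts' idea."""
    def __init__(self, d_model: int, n_heads: int, use_ssm: bool):
        super().__init__()
        self.use_ssm = use_ssm
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.mixer = (SimpleSSMMixer(d_model) if use_ssm
                      else nn.MultiheadAttention(d_model, n_heads, batch_first=True))
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        mixed = self.mixer(h) if self.use_ssm else self.mixer(h, h, h, need_weights=False)[0]
        x = x + mixed                        # residual around the token mixer
        return x + self.mlp(self.norm2(x))   # residual around the MLP

# Interleave SSM-style and attention layers in a single stack.
stack = nn.Sequential(*[HybridBlock(256, 4, use_ssm=(i % 2 == 0)) for i in range(4)])
out = stack(torch.randn(2, 16, 256))          # (batch=2, seq=16, d_model=256)
```

The point of the sketch is only that the surrounding block structure (norms, residuals, MLP) stays the same, so an SSM-style mixer can be dropped into some layers without rebuilding the whole model.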
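As for the “wall of noisy tokens” UX: the rough mechanic of a discrete diffusion language model is that the entire response starts out as mask/noise tokens and is refined in parallel over several denoising steps, rather than being produced strictly left to right. The toy below only mimics that surface behaviour; the vocabulary, the left-to-right commit schedule, and the random “predictions” are placeholders for what a real model would output.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]
MASK = "▒"  # visual stand-in for a noise/mask token

def toy_denoise(length: int, steps: int = 4, seed: int = 0) -> None:
    """Print the 'wall of noise gradually becoming text' effect."""
    rng = random.Random(seed)
    tokens = [MASK] * length                     # step 0: pure noise
    for step in range(1, steps + 1):
        n_commit = round(length * step / steps)  # commit more positions each step
        for i in range(n_commit):                # (a real model commits by confidence)
            if tokens[i] == MASK:
                tokens[i] = rng.choice(VOCAB)    # placeholder for a model prediction
        print(f"step {step}: {' '.join(tokens)}")

toy_denoise(length=8)
```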
The Need for New Architectures
Aidan expressed a strong desire for new architectures to emerge within the next 5-10 years [00:36:46]. He noted that the “scale is all you need” hypothesis is “breaking,” with diminishing returns on capital and compute [00:09:21]. In his view, this means new approaches will be necessary to reach the next step in capability [00:09:35].