From: redpointai
Aidan, a co-author of the Transformer paper, expressed his hope that transformers are not the final architecture in AI [00:00:10]. He has anticipated a shift away from transformers for some time [00:35:44].
Current State of Architectures
Despite expectations for new architectures, the longevity of transformers has been surprising [00:37:17]. Aidan stated that if asked in 2018 (a year after the Transformer paper was published) about the likelihood of still using transformers seven years later, he would have put it “pretty close to zero” [00:36:57].
Emerging Alternatives and Hybrid Approaches
While new architectures are emerging, they do not always fully replace existing ones:
- SSMs (State Space Models): Aidan initially believed SSMs would replace transformers, even naming a meeting room after them [00:35:58]. However, it turned out that the beneficial components of an SSM could be integrated into a transformer, negating the immediate need for a full swap [00:36:06] (see the hybrid-block sketch after this list).
- Discrete Diffusion Models: These models offer a “super cool UX” in which the response emerges from a “wall of noisy tokens and text” [00:36:13] (a toy illustration follows below). However, it is not yet clear whether they are inherently better language models than transformers [00:36:31].
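The interview does not spell out what “integrating” SSM components into a transformer looks like. A common pattern in hybrid models, however, is to swap the attention token-mixer for a linear-time, SSM-style operator in some layers while keeping the rest of the block unchanged. The PyTorch sketch below is a minimal, hypothetical illustration of that idea; SimpleSSMMixer, HybridBlock, and all hyperparameters are invented for illustration and do not describe Aidan's or Cohere's actual design.

```python
import torch
import torch.nn as nn

class SimpleSSMMixer(nn.Module):
    """Toy stand-in for an SSM layer: a gated, causal depthwise convolution
    that mixes tokens along the sequence in linear time."""
    def __init__(self, d_model: int, kernel_size: int = 4):
        super().__init__()
        # groups=d_model makes the convolution depthwise (one filter per channel).
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size - 1, groups=d_model)
        self.gate = nn.Linear(d_model, d_model)

    def forward(self, x):                                    # x: (batch, seq, d_model)
        h = self.conv(x.transpose(1, 2))[..., : x.size(1)]   # trim right pad: causal
        return h.transpose(1, 2) * torch.sigmoid(self.gate(x))

class HybridBlock(nn.Module):
    """Transformer block whose token-mixing step is either self-attention or the
    SSM-style mixer, chosen per layer: the 'keep the useful parts' idea."""
    def __init__(self, d_model: int, n_heads: int, use_ssm: bool):
        super().__init__()
        self.use_ssm = use_ssm
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.mixer = (SimpleSSMMixer(d_model) if use_ssm
                      else nn.MultiheadAttention(d_model, n_heads, batch_first=True))
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.norm1(x)
        mixed = self.mixer(h) if self.use_ssm else self.mixer(h, h, h, need_weights=False)[0]
        x = x + mixed                        # residual around the token mixer
        return x + self.mlp(self.norm2(x))   # residual around the MLP

# Interleave SSM-style and attention layers in a single stack.
stack = nn.Sequential(*[HybridBlock(256, 4, use_ssm=(i % 2 == 0)) for i in range(4)])
out = stack(torch.randn(2, 16, 256))          # (batch=2, seq=16, d_model=256)
```

The point of the sketch is only that the surrounding block structure (norms, residuals, MLP) stays the same, so an SSM-style mixer can be dropped into some layers without rebuilding the whole model.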
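As for the “wall of noisy tokens” UX: the rough mechanic of a discrete diffusion language model is that the entire response starts out as mask/noise tokens and is refined in parallel over several denoising steps, rather than being produced strictly left to right. The toy below only mimics that surface behaviour; the vocabulary, the left-to-right commit schedule, and the random “predictions” are placeholders for what a real model would output.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "."]
MASK = "▒"  # visual stand-in for a noise/mask token

def toy_denoise(length: int, steps: int = 4, seed: int = 0) -> None:
    """Print the 'wall of noise gradually becoming text' effect."""
    rng = random.Random(seed)
    tokens = [MASK] * length                     # step 0: pure noise
    for step in range(1, steps + 1):
        n_commit = round(length * step / steps)  # commit more positions each step
        for i in range(n_commit):                # (a real model commits by confidence)
            if tokens[i] == MASK:
                tokens[i] = rng.choice(VOCAB)    # placeholder for a model prediction
        print(f"step {step}: {' '.join(tokens)}")

toy_denoise(length=8)
```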
The Need for New Architectures
Aidan expressed a strong desire for new architectures to emerge within the next 5-10 years [00:36:46]. He noted that the “scale is all you need” hypothesis is “breaking,” with diminishing returns on capital and compute [00:09:21]. In his view, this means new approaches will be necessary to reach the next step in capability [00:09:35].