From: redpointai
Aidan Gomez discusses the current limitations of AI models and the anticipated architectural and developmental shifts in the field, emphasizing the need for smarter, more adaptable AI systems [00:09:12].
Current Limitations of AI Models
Current AI models, despite their advancements, still face significant challenges:
- Frozen Nature: Models are typically trained once, yielding a final, frozen set of weights that does not learn from new interactions [00:08:33]. As a result, they neither remember past conversations nor adapt to user feedback [00:08:43] (see the sketch after this list).
- Lack of Learning from Experience: Unlike humans, who improve with practice, current models cannot learn from their experience in the world or from direct user feedback [00:08:48]. If a model makes a mistake or needs guidance, that feedback is typically forgotten by the next interaction [00:45:08].
- Diminishing Returns on Scale: The hypothesis that “scale is all you need” is breaking down, with current methods showing “very heavy diminishing returns of capital and compute” [00:09:21], [00:09:26]. Future advances will require smarter, more creative approaches rather than simply more computational resources [00:09:35].
- Privacy and Integration Challenges for Agents: To be effective, AI agents need broad access to sensitive enterprise data (email, chat, CRM, ERP, HR software), which raises privacy concerns unique to AI compared to other enterprise software [00:02:18], [00:02:24]. Moreover, each company runs its own “tapestry” of software, so agents require custom setup and integration [00:02:43].
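The frozen-weights limitation is easy to see at the inference layer. Below is a minimal sketch, assuming a hypothetical `generate` call (not any vendor’s real API): nothing persists between calls unless the application replays it in the prompt.

```python
# Minimal sketch of the "frozen weights" limitation: nothing a user says is
# ever written back into the model, so memory must live in the application.
# `generate` is a hypothetical stand-in for an inference call, not a real API.

def generate(messages: list[dict]) -> str:
    """Stand-in for a call to a chat model with frozen weights."""
    return "(model reply)"

# Turn 1: the user gives a correction.
history = [{"role": "user", "content": "Call me Sam, not Samuel."}]
reply = generate(history)

# A fresh call with no history: the correction is gone, because the weights
# never changed.
reply = generate([{"role": "user", "content": "Draft an email for me."}])

# Today, "memory" means the application replays prior turns in the prompt.
history += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "Draft an email for me."},
]
reply = generate(history)  # the correction is visible again, via context only
```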
Future Architecture and Model Improvements
Future AI development aims to address these limitations through several key areas:
- Learning from Experience: The most significant desired capability is for models to learn from user interactions and feedback, becoming “experts” over time in the way humans do [00:08:58], [00:44:46]. One approach is to let a model consult a database of previous interactions and user preferences when composing future responses [00:45:47]; a toy version is sketched after this list.
- Reasoning Capabilities: Reasoning models have been a “complete step change in terms of improvement,” enabling AI to solve previously “impossible” problems by reflecting on failures and finding alternative paths [00:16:17], [00:16:47]. They also let models spend different amounts of energy on problems of varying complexity [00:08:00].
- Beyond Transformers: Despite the Transformer’s current dominance, there is strong hope and expectation that new architectures will emerge in the next 5-10 years [00:36:50]. State space models (SSMs) and discrete diffusion models have shown promise, but their superiority over Transformers for general language modeling is not yet clear [00:35:58].
- Smarter and More Creative Approaches: The field must move beyond simply scaling up compute and capital, fostering greater innovation and creativity to achieve the next leap in AI technology [00:09:35].
- Shift to Test-Time Compute: Compute remains essential, but the focus is shifting to test-time compute, which is crucial for letting models reason through business problems by interacting with various tools and data sources [00:16:00], [00:39:01].
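One way to approximate “learning from experience” without updating weights is to keep feedback in an external store and inject it into future prompts. The sketch below is illustrative only: `MemoryStore`, `generate`, and the keyword matching are assumptions (a production system would more likely use embedding-based retrieval).

```python
# Toy version of experience memory: corrections and preferences are stored
# per user outside the model, then retrieved to steer future responses.

from collections import defaultdict

class MemoryStore:
    """Per-user store of corrections and preferences (illustrative)."""
    def __init__(self):
        self.notes = defaultdict(list)

    def add(self, user_id: str, note: str) -> None:
        self.notes[user_id].append(note)

    def relevant(self, user_id: str, query: str) -> list[str]:
        # Keyword overlap keeps the sketch self-contained; real systems
        # would use embedding search.
        words = query.lower().split()
        return [n for n in self.notes[user_id]
                if any(w in n.lower() for w in words)]

def generate(prompt: str) -> str:
    return "(model reply)"  # stand-in for a frozen-model inference call

store = MemoryStore()
store.add("sam", "Prefers short emails, under 100 words, signed 'Sam'.")

def answer(user_id: str, query: str) -> str:
    memories = store.relevant(user_id, query)
    prompt = "\n".join(["Known user preferences:"] + memories + ["", query])
    return generate(prompt)

print(answer("sam", "Write an email to my landlord."))
```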
Specialized vs. General Models
The debate between a single “one model to rule them all” and a world of specialized models is evolving:
- Self-Developing Experts: Models are increasingly able to self-develop “experts,” or sub-networks, within themselves, which alleviates some of the pressure for entirely specialized models [00:10:46].
- Importance of Custom Models: Custom models remain vital for incorporating fundamental context from specific businesses or domains that is not available on the public web, such as manufacturing data, customer transactions, or detailed personal health records [00:10:57]. Companies like Cohere partner with organizations holding this private data to build specialized models [00:11:45].
- Role of Synthetic Data: Synthetic data is significantly closing the gap between general and specialized models [00:11:59]. Human data is still necessary, especially for evaluation, but synthetic data makes up the overwhelming majority of what Cohere generates for new models [00:15:15]. This allows small pools of human experts to seed much larger synthetic datasets [00:14:26]; a toy pipeline follows this list.
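As a rough illustration of the small-human-pool approach: sample a few trusted, human-written examples, prompt a model for stylistic lookalikes, and reserve the humans for evaluation. Everything here (the `generate` stub, the seed examples, the prompt wording) is a hypothetical sketch, not Cohere’s actual pipeline.

```python
# Stretching a small human seed set into a larger synthetic one by few-shot
# prompting a model, then sampling the output for human review.

import random

def generate(prompt: str) -> str:
    return "(synthetic Q/A pair)"  # stand-in for a model call

human_examples = [
    "Q: What does this error log indicate? A: ...",
    "Q: Summarize this contract clause. A: ...",
    # ... a small, trusted, expert-written seed set
]

def make_synthetic(seed: list[str], n: int, k: int = 2) -> list[str]:
    out = []
    for _ in range(n):
        shots = random.sample(seed, k)  # vary the few-shot context
        prompt = ("Write one new Q/A pair in the same style and domain "
                  "as these examples:\n\n" + "\n\n".join(shots))
        out.append(generate(prompt))
    return out

synthetic = make_synthetic(human_examples, n=1000)
# Humans then evaluate a sample of `synthetic` rather than authoring all of it.
```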
Role of Data Labeling and Eval
Human input remains critical, particularly in evaluation:
- Eval as a Human-Dependent Process: When building models for people, humans are the best evaluators of usefulness [00:13:09]; evaluation (eval) is where humans currently cannot be taken out of the loop [00:13:15].
- Synthetic Data Generation for Training: Direct human data generation (e.g., hiring 100,000 doctors to teach a model medicine) is too expensive to be viable; instead, models’ ability to “chitchat” lets small, trustworthy human datasets seed vast synthetic lookalike data [00:13:55]. This is easiest in verifiable domains like code and math, where results can be checked mechanically and filtered [00:14:49]; see the sketch after this list.
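The “check and filter” step in verifiable domains can be shown with a toy arithmetic task: generate candidate answers, verify them mechanically, and keep only those that pass. The `propose_solution` stub and its error rate are invented for illustration.

```python
# Generate-verify-filter loop for a verifiable domain (here, toy arithmetic):
# only mechanically verified examples make it into the training set.

import random

def propose_solution(a: int, b: int) -> int:
    """Stand-in for a model answering 'what is a + b?'; sometimes wrong."""
    return a + b + random.choice([0, 0, 0, 1])  # occasional off-by-one error

def verified_pairs(n: int) -> list[tuple[str, int]]:
    kept = []
    for _ in range(n):
        a, b = random.randint(0, 99), random.randint(0, 99)
        answer = propose_solution(a, b)
        if answer == a + b:  # mechanical check, no human in the loop
            kept.append((f"What is {a} + {b}?", answer))
    return kept

data = verified_pairs(1000)
print(f"kept {len(data)} verified examples")
```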
Future Trends in AI and Multimodal Models
The future holds promise for new generations of AI models beyond current large language models:
- Domain-Specific Foundation Models: New foundation models are expected to emerge for specific scientific and industrial domains, such as biology, chemistry, and materials science [00:32:15].
- Addressing Data Silos: In fields like cancer research, vast amounts of data exist but sit siloed and inaccessible [00:34:43]. A global effort, akin to the GPT moment, could unlock incredible advancements if capital were applied to this data [00:32:56]; the obstacle is primarily a human problem of sharing, not a data generation problem [00:34:56].
- Hardware Evolution: Test-time compute still requires significant resources, but the overall trend points to cheaper and more abundant compute [00:39:06]. The ability to combine different types of chips into effective supercomputers is a positive development [00:39:46].