From: aidotengineer
Karina, an AI researcher at OpenAI (previously at Anthropic), highlights significant scaling paradigms in AI research over the past two to four years and their impact on product development, sharing lessons learned from products like Claude and ChatGPT [00:00:17]. She also discusses the future of AI agents evolving from collaborators to co-innovators [00:01:00].
Two Core Scaling Paradigms
1. Next Token Prediction (Pre-training)
This paradigm, prominent from 2020 to 2021, views the model as a “world-building machine” [00:01:48]. The model learns to understand the world by predicting the next word or token [00:01:51]. To predict what comes next, the model must understand how the world operates [00:02:27]. Tokens can be strings, words, or pixels [00:02:14].
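The next-token-prediction objective described above can be sketched as a cross-entropy loss over the vocabulary at each position. This is a minimal illustrative sketch, not any specific model's implementation; the shapes and toy data are assumptions.

```python
import numpy as np

def next_token_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Average cross-entropy of the true next token.

    logits:  (seq_len, vocab_size) unnormalized scores per position.
    targets: (seq_len,) integer ids of the true next tokens.
    """
    # Numerically stable softmax over the vocabulary dimension.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    # Negative log-likelihood of each true next token, averaged.
    return float(-np.log(probs[np.arange(len(targets)), targets]).mean())

# Toy example: a "vocabulary" of 5 tokens and a 3-token sequence.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
targets = np.array([1, 4, 2])
loss = next_token_loss(logits, targets)
```

Minimizing this loss over massive corpora is what forces the model to absorb the facts, physics, and narrative structure discussed below.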
Pre-training involves massive multitask learning, where some tasks are easier to learn, such as translation or factual knowledge like “the capital of France is Paris” [00:02:39]. However, other tasks are significantly harder and require substantial compute scaling during the training stage [00:03:11]. These include:
- Physics and problem-solving [00:03:27]
- Logical expressions and spatial reasoning [00:03:32]
- Complex math, which often necessitates Chain of Thought to aid in computation [00:03:43]
- Creative writing, particularly world building, storytelling, and maintaining plot coherence, where small mistakes can quickly degrade the narrative [00:04:08]. Measuring “good” creative writing is also a significant open-ended research problem [00:04:46].
2. Scaling Reinforcement Learning on Chain of Thought
Introduced by OpenAI with the o1 model last year, this paradigm focuses on highly complex reasoning [00:06:51]. It involves scaling both test-time compute and reinforcement learning during training [00:07:15]. The model learns to think during training and improves through feedback, driven by strong reward signals in reinforcement learning [00:07:27].
To tackle increasingly difficult tasks, such as solving medical problems, models need to spend significant time reasoning through the problem [00:07:53]. This requires creating more complex environments and utilizing tools to think through and verify outputs during the Chain of Thought process [00:08:11]. Challenges remain in measuring the faithfulness of the Chain of Thought and enabling models to backtrack from wrong directions [00:08:35].
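One way to picture the “strong signals” mentioned above is verifier-based reward: sample candidate reasoning traces, check each answer with a programmatic verifier, and reward only the verified ones. This is a hypothetical toy sketch; `verifier`, `sample_trace`, and the hard-coded answer 42 are all placeholders, not anything from the talk.

```python
import random

def verifier(answer: int) -> bool:
    """Toy verifier: pretend the task's true answer is 42 (placeholder)."""
    return answer == 42

def sample_trace(rng: random.Random):
    """Stand-in for model sampling: returns (chain_of_thought, answer)."""
    answer = rng.choice([41, 42, 43])
    return (f"step 1 ... step 2 ... therefore {answer}", answer)

def collect_rewards(num_samples: int = 8, seed: int = 0):
    """Sample traces and assign reward 1.0 to verified answers, else 0.0."""
    rng = random.Random(seed)
    rewards = []
    for _ in range(num_samples):
        trace, answer = sample_trace(rng)
        rewards.append((trace, 1.0 if verifier(answer) else 0.0))
    return rewards

results = collect_rewards()
```

In a real system the sampler is the model itself and the verifier might execute code or check a proof; the point is that the binary reward gives the reinforcement learning loop an unambiguous training signal.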
Product Development and Design Challenges
These scaling paradigms have unlocked new avenues for product research [00:11:05]. High-reasoning models enable a rapid evaluation cycle for product development by:
- Distilling knowledge back to smaller, faster-iterating models [00:11:33].
- Synthetically generating new data for post-training and reinforcement learning environments [00:11:43].
This allows for the creation of entirely new classes of tasks, such as simulating different users for multiplayer collaboration [00:12:00]. Building scalable AI systems also means moving towards more complex reinforcement learning environments where models can use tools like search, browsing, or collaborative canvases to improve their collaborative abilities [00:12:39].
Models are becoming extremely good at in-context learning, capable of learning new tools from a few-shot examples, which greatly accelerates the development cycle [00:13:00].
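Few-shot in-context learning of a new tool, as described above, amounts to putting worked examples directly in the prompt. The sketch below assembles such a prompt; the `weather_lookup` tool and its call syntax are hypothetical, purely for illustration.

```python
def build_few_shot_prompt(tool_name: str, examples, query: str) -> str:
    """Assemble a prompt that teaches a model a new tool from examples.

    examples: list of (user_input, tool_call) pairs shown to the model.
    """
    lines = [f"You can call the tool `{tool_name}`. Examples:"]
    for user_input, tool_call in examples:
        lines.append(f"User: {user_input}")
        lines.append(f"Tool call: {tool_call}")
    # End with the new query so the model completes the tool call.
    lines.append(f"User: {query}")
    lines.append("Tool call:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "weather_lookup",  # hypothetical tool name
    [("What's it like in Paris?", 'weather_lookup(city="Paris")'),
     ("Is it raining in Oslo?", 'weather_lookup(city="Oslo")')],
    "How hot is Tokyo today?",
)
```

No weights change here: the “learning” lives entirely in the prompt, which is why new tools can be wired up in minutes rather than via retraining.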
Design challenges for AI agents include:
- Bringing unfamiliar capabilities into familiar form factors, such as making 100K-token context windows useful through file uploads [00:13:39].
- Enabling modular compositions in product features to scale with future model capabilities, as seen with ChatGPT Tasks, which extend beyond reminders to continuous story generation or personalized daily searches [00:15:19].
- Bridging real-time interaction with asynchronous task completion, where models might research or write code for extended periods [00:15:42]. Building trust in these asynchronous tasks can be achieved by giving humans new collaborative affordances to verify and edit model outputs, and provide real-time feedback for self-improvement [00:16:00].
Case Studies and Future Vision
GitHub Copilot
This product demonstrated the power of pre-training in understanding code and predicting the next token [00:05:43]. It was made more useful through post-training using Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) [00:06:06].
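The RLHF step mentioned above typically starts by training a reward model on human preference pairs with a Bradley-Terry-style loss. This is a generic sketch of that loss under the usual formulation, not Copilot's actual training code; the scores are stand-ins for a reward model's scalar outputs.

```python
import numpy as np

def preference_loss(chosen_scores, rejected_scores) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Each pair holds the reward-model score of the human-preferred
    completion and of the rejected one; the loss pushes the chosen
    score above the rejected score.
    """
    diff = np.asarray(chosen_scores) - np.asarray(rejected_scores)
    # log1p(exp(-x)) is a numerically stable form of -log(sigmoid(x)).
    return float(np.mean(np.log1p(np.exp(-diff))))

# Two preference pairs where the chosen completion already scores higher.
loss = preference_loss([2.0, 1.5], [0.5, 1.0])
```

Once trained, the reward model's scores become the optimization target for the policy, replacing (RLHF) or supplementing (RLAIF) direct human labels.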
Claude (Anthropic)
Claude on Slack was an early attempt at a virtual teammate, leveraging Slack’s tools and multiplayer collaboration features [00:16:21]. It could summarize channels across an organization, a concept that inspired future projects like ChatGPT Tasks [00:16:53].
Canvas (OpenAI)
Karina’s first project at OpenAI, Canvas, focused on human-AI collaboration and creative capabilities [00:17:14]. It offers a flexible interface for:
- Co-creation and fine-grain editing [00:17:40].
- Model search to generate reports and verify outputs [00:17:47].
- Multi-agent and multiplayer collaboration, allowing critics or editors to join [00:17:57].
- Becoming a pair programmer and data scientist, capable of analyzing CSV documents in real-time [00:19:34].
- Co-creating new knowledge and research artifacts, enabling human-AI teams to verify research directions or reproduce open-source repositories [00:20:09].
Future of AI Agents: From Collaborators to Co-Innovators
OpenAI considers the current year the “year of agents,” characterized by highly complex reasoning models using real-world tools like browsing, search, and computer use over long contexts [00:10:01].
The next stage envisions agents as “co-innovators” [00:10:27]. This combines current reasoning and tool-use capabilities with creativity, enabled through human-AI collaboration [00:10:42]. The goal is to create new affordances for humans and AI to co-create the future [00:10:52].
Future predictions include:
- Invisible software creation for all: Enabling non-coders to create and deploy their own tools directly from mobile devices [00:21:32]. This could facilitate starting businesses from scratch [00:19:18].
- Changed internet access: Users will click less on links and access the internet through model lenses, providing cleaner, more personalized, and multimodal outputs [00:21:51]. For instance, learning about the solar system could generate an interactive 3D visualization instead of text [00:22:13].
- Blank canvas interface: The AI interface will dynamically adapt to the user’s intent. If a user wants to write code, the canvas transforms into an IDE; if they want to write a novel, it provides tools for brainstorming, editing, and visualizing plot structures [00:22:42].
- Co-direction and new knowledge creation: Co-innovation will occur through creative co-direction with models, leveraging multi-agent systems with strong reasoning to create new novels, films, games, and, fundamentally, new science and knowledge [00:23:31].