From: redpointai
Percy Liang, a leading AI researcher and co-founder of Together AI, has explored the transformative potential of generative agents and simulation in AI [00:00:03, 00:00:12]. His work extends beyond traditional language models to creating complex virtual worlds and understanding the broader ecosystem of AI [00:01:39, 00:15:56].
Generative Agents: A “Sims-like” Virtual World
Liang spearheaded the creation of a virtual world, likened to The Sims, where AI agents interact with each other [00:00:14, 00:22:19]. This environment allows researchers to study complex social dynamics [00:00:17].
Each agent in this simulation is powered by a language model, equipped with a set of prompts and grounded in a virtual environment where they can move and communicate [00:23:25, 00:23:35]. The project was driven by a spirit of pure exploration and experimentation, observing what would emerge [00:23:41, 00:23:46].
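To make the setup concrete, here is a minimal sketch of how such an agent might be structured. The `llm()` function is a placeholder for whatever model backs the agents, and the `Agent` class and its fields are invented here for illustration; the published system is considerably richer (memory retrieval, reflection, planning), while this shows only the core loop of prompting, grounding, and acting.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Placeholder for a language-model call; swap in a real client here."""
    return "(model output for: " + prompt[:40] + "...)"

@dataclass
class Agent:
    name: str
    persona: str                                      # standing prompt describing the agent
    memory: list[str] = field(default_factory=list)   # observations, newest last

    def observe(self, event: str) -> None:
        """Ground the agent in the virtual world: record something it saw or heard."""
        self.memory.append(event)

    def act(self, situation: str) -> str:
        """Ask the model what the agent does next, conditioned on persona and memory."""
        prompt = (
            f"You are {self.name}. {self.persona}\n"
            "Recent memories:\n" + "\n".join(self.memory[-10:]) + "\n"
            f"Situation: {situation}\nWhat do you do or say next?"
        )
        action = llm(prompt)
        self.memory.append(f"I did: {action}")   # actions feed back into memory
        return action
```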
Emergent Behaviors
Notably, many phenomena seen in human social dynamics arose naturally in this virtual world, such as information diffusion [00:23:49, 00:23:57]. In one cited example, an agent announced a run for mayor and attempted to convince other agents to support it, an emergent behavior rather than a scripted one [00:24:00, 00:24:11].
Beyond Believability: Towards Valid Simulations
While initial generative agents focused on creating “believable” simulations, the next crucial step is to achieve “valid” simulations that accurately reflect reality [00:24:21, 00:24:30].
The ability to create valid simulations would unlock numerous new possibilities, including:
- Digital Twin of Society: Establishing a “digital twin” of society to run experiments, such as testing the impact of a mask policy or a new law [00:24:50, 00:25:06].
- Social Science Studies: Conducting social science studies more efficiently and affordably, without the constraints of recruiting human participants (typically convenience samples such as college students) [00:25:31, 00:25:45].
- Controlled Experiments: Running experiments in which the same agent is exposed to both a treatment and a control condition by resetting its memory between runs, a clean within-subject control that is impossible with human subjects; see the sketch after this list [00:25:55, 00:26:10].
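Reusing the `Agent` sketch from above, the memory-reset idea might look like the following; `run_condition` and the example scenario are invented here purely for exposition.

```python
import copy

def run_condition(agent: Agent, condition_events: list[str], probe: str) -> str:
    """Expose an identical copy of the agent to one condition, then probe it."""
    subject = copy.deepcopy(agent)   # the 'memory reset': same agent, fresh branch
    for event in condition_events:
        subject.observe(event)
    return subject.act(probe)

# The same underlying agent serves as its own control:
citizen = Agent("Pat", "A cautious small-business owner.")
treated = run_condition(citizen, ["The town enacted a mask policy."],
                        "Do you keep your shop open this week?")
control = run_condition(citizen, [], "Do you keep your shop open this week?")
```

Because both runs branch from the same memory state, any difference in behavior is attributable to the intervention alone, which is the clean control Liang describes.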
Liang acknowledges that these simulations are still at a preliminary stage, but future advances in models could yield simulation tools trusted enough to inform significant decisions [00:25:08, 00:25:20].
Types of AI Agents
Liang distinguishes between two types of AI agents:
- Task-Performing Agents: Agents capable of carrying out difficult tasks, like OpenAI’s o1 model [00:26:55, 00:27:04].
- Simulation Agents: Those focused on mimicking human behavior or individuals, rather than performing specific tasks [00:27:06, 00:27:14].
The latter, simulation-oriented agents, are less studied but hold significant untapped potential for various applications [00:27:22, 00:27:30].
Distinguishing Modern AI Simulation
Traditional simulations, such as physical or weather models, are governed by fixed equations or simplified, stylized models [00:27:54, 00:28:12]. However, the advent of advanced language models allows for simulating systems with much greater detail and complexity than previously possible [00:28:18, 00:28:28].
Future Applications and Considerations
The potential applications of advanced simulations include:
- Major Life Decisions: Running simulations before making significant life decisions, such as potential investments [00:28:34, 00:28:42].
- Personal Preparation: Using language models to simulate conversations (e.g., a podcast interview or a date) for practice and preparation; see the sketch after this list [00:28:58, 00:29:16].
- Organizational Design: Simulating different organizational structures within a company to predict outcomes [00:26:43, 00:26:47].
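Continuing with the same illustrative `Agent` sketch, a rehearsal of this kind can be as simple as alternating turns between two simulated speakers; `rehearse` and the personas below are hypothetical.

```python
def rehearse(a: Agent, b: Agent, opening: str, turns: int = 4) -> list[str]:
    """Alternate turns between two simulated speakers and return the transcript."""
    transcript = [f"{a.name}: {opening}"]
    speaker, utterance = b, opening
    for _ in range(turns):
        speaker.observe(f"The other speaker said: {utterance}")
        utterance = speaker.act("Reply in character.")
        transcript.append(f"{speaker.name}: {utterance}")
        speaker = a if speaker is b else b   # hand the floor to the other speaker
    return transcript

host = Agent("Host", "A podcast host who asks probing follow-up questions.")
guest = Agent("Me", "A researcher preparing to discuss simulation agents.")
for line in rehearse(host, guest, "What makes a simulation valid?"):
    print(line)
```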
It is crucial to remain cautious, however, as current simulations are still far from faithfully reflecting reality [00:29:28, 00:29:37].
Relation to Evaluation and Benchmarking
The development of generative agents and their complex behaviors calls for an evolution in evaluation and benchmarking methods [00:30:00]. Traditional train/test splits are undermined when the contents of a model’s training data are unknown: if the test set may already have appeared in training, the split no longer measures generalization [00:30:15, 00:30:28].
Because language models can follow a wide variety of instructions, traditional single-task benchmarks are no longer sufficient [00:31:44, 00:31:53]. Liang points to AutoBencher, which uses language models to automatically generate test inputs, producing more sensible evaluations [00:32:13, 00:32:46].
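The actual system is more involved than this; the loose sketch below (reusing the placeholder `llm()` from earlier) captures only the basic pattern of one model inventing test inputs and grading another model’s answers, with all prompts and function names invented here.

```python
from typing import Callable

def auto_benchmark(topic: str, candidate: Callable[[str], str],
                   n_questions: int = 5) -> float:
    """Loose sketch: an evaluator model writes questions, answers them itself
    as a reference, then grades a candidate model against that reference."""
    correct = 0
    for i in range(n_questions):
        question = llm(f"Write one specific, verifiable question about {topic} "
                       f"(question {i + 1} of {n_questions}).")
        reference = llm(f"Answer correctly and concisely: {question}")
        answer = candidate(question)
        verdict = llm(f"Question: {question}\nReference: {reference}\n"
                      f"Candidate: {answer}\nReply CORRECT or INCORRECT.")
        correct += verdict.strip().upper().startswith("CORRECT")
    return correct / n_questions
```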
Furthermore, there is a need for more structured evaluation, such as rubrics that anchor judgments, rather than superficial impressions of output quality [00:33:33, 00:34:04]. Academic institutions like Stanford are uniquely positioned to develop objective benchmarks that serve the industry and specific verticals [00:34:23, 00:35:35].
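In the same illustrative style, a rubric-anchored judge replaces a holistic “is this good?” prompt with explicit criteria; the rubric items here are examples, not ones from the conversation.

```python
RUBRIC = """Score the response on each criterion from 1 (poor) to 5 (excellent):
1. Factual accuracy
2. Completeness relative to the question
3. Clarity of explanation
Give one line per criterion: 'criterion: score - justification'."""

def rubric_judge(question: str, response: str) -> str:
    """Anchor the judge to explicit criteria instead of an overall impression."""
    return llm(f"{RUBRIC}\n\nQuestion: {question}\nResponse: {response}")
```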
Underexplored Application Areas
While many AI applications are driven by commercial needs (e.g., RAG solutions, summarization), Liang highlights areas around these models that remain underexplored:
- Fundamental Science and Discovery: Using models for scientific discovery [00:59:21].
- Researcher Productivity: Improving the productivity of researchers [00:59:26].
These areas, though less commercially immediate, are vital for feeding into and improving the entire AI ecosystem [00:59:34, 00:59:41].
Agents: Overhyped and Underhyped
When asked which aspects of AI are overhyped and underhyped, Liang humorously answered “agents and agents” [00:57:07, 00:57:09], suggesting they have gone through a full hype cycle [00:57:11]. He expressed optimism that AI agents could contribute novel insights to machine learning research within a few years, much as coding tools have steadily evolved [00:58:33, 00:58:56]. This does not mean they are close to AGI, only that they can make meaningful contributions in specific domains [00:58:05, 00:58:39].