From: redpointai

Percy Liang, a leading AI researcher and co-founder of Together AI, has explored the transformative potential of generative agents and simulation in AI [00:00:03, 00:00:12]. His work extends beyond traditional language models to creating complex virtual worlds and understanding the broader ecosystem of AI [00:01:39, 00:15:56].

Generative Agents: A “Sims-like” Virtual World

Liang spearheaded the creation of a virtual world, likened to The Sims, where AI agents interact with each other [00:00:14, 00:22:19]. This environment allows researchers to study complex social dynamics [00:00:17].

Each agent in this simulation is powered by a language model, equipped with a set of prompts and grounded in a virtual environment where it can move and communicate [00:23:25, 00:23:35]. The project was driven by a spirit of pure exploration and experimentation, observing what would emerge [00:23:41, 00:23:46].
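
To make the setup concrete, here is a minimal sketch of one such agent, assuming a hypothetical `llm(prompt)` completion function; the real generative-agents architecture (memory stream, retrieval, reflection) is considerably richer than this.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    # Placeholder: swap in a call to any chat/completion model here.
    return "…model completion goes here…"

@dataclass
class Agent:
    name: str
    persona: str                          # fixed traits, e.g. occupation
    memories: list[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # Grounding: the agent logs what it perceives in the world.
        self.memories.append(event)

    def act(self, situation: str) -> str:
        # The prompt combines persona, recent memories, and the current
        # situation; the model's completion becomes the next action.
        prompt = (
            f"You are {self.name}. {self.persona}\n"
            f"Recent memories: {'; '.join(self.memories[-5:])}\n"
            f"Situation: {situation}\n"
            "What do you do or say next?"
        )
        action = llm(prompt)
        self.memories.append(f"I did: {action}")
        return action
```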

Emergent Behaviors

Notably, many phenomena seen in human social dynamics naturally arose in this virtual world, such as information diffusion [00:23:49, 00:23:57]. An example cited was an agent announcing a mayoral run and attempting to convince others, showcasing emergent behavior [00:24:00, 00:24:11].

Beyond Believability: Towards Valid Simulations

While initial generative agents focused on creating “believable” simulations, the next crucial step is to achieve “valid” simulations that accurately reflect reality [00:24:21, 00:24:30].

The ability to create valid simulations would unlock numerous new possibilities, including:

  • Digital Twin of Society: Establishing a “digital twin” of society to run experiments, such as testing the impact of a mask policy or a new law [00:24:50, 00:25:06].
  • Social Science Studies: Conducting social science studies more efficiently and affordably, without the limitations of recruiting human participants (e.g., college kids) [00:25:31, 00:25:45].
  • Controlled Experiments: Performing experiments where the same agent can be subjected to both a treatment and a control scenario by resetting its memory, offering a clean control that is impossible with humans (see the sketch after this list) [00:25:55, 00:26:10].
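
A hedged sketch of that last idea, reusing the illustrative `Agent` class from the earlier snippet (not Liang’s actual code): snapshot the agent’s memory, apply the treatment, then restore the snapshot and run the control on the very same agent.

```python
import copy

def run_condition(agent, intervention: str, probe: str) -> str:
    # Works with any object exposing .memories, .observe(), and .act(),
    # such as the Agent sketched earlier.
    snapshot = copy.deepcopy(agent.memories)   # freeze pre-experiment state
    agent.observe(intervention)                # expose agent to the condition
    outcome = agent.act(probe)                 # measure the response
    agent.memories = snapshot                  # reset: the experience is erased
    return outcome

# Treatment and control on the same agent, e.g.:
# treated = run_condition(alice, "A mask mandate was announced today.",
#                         "Will you wear a mask to the market?")
# control = run_condition(alice, "Nothing notable happened today.",
#                         "Will you wear a mask to the market?")
```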

Liang acknowledges that these simulations are still at a preliminary stage, but future advances in models could yield simulation tools trusted enough to inform significant decisions [00:25:08, 00:25:20].

Types of AI Agents

Liang distinguishes between two types of AI agents:

  1. Task-Performing Agents: Those capable of performing difficult tasks, like OpenAI’s o1 model [00:26:55, 00:27:04].
  2. Simulation Agents: Those focused on mimicking human behavior or individuals, rather than performing specific tasks [00:27:06, 00:27:14].

The latter, simulation-oriented agents, are less studied but hold significant untapped potential for various applications [00:27:22, 00:27:30].

Distinguishing Modern AI Simulation

Traditional simulations, such as physical or weather models, are governed by fixed equations or simplified, stylized models [00:27:54, 00:28:12]. However, the advent of advanced language models allows for simulating systems with much greater detail and complexity than previously possible [00:28:18, 00:28:28].
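
The contrast can be stated in a few lines of illustrative code: a classical simulation advances its state with a fixed equation, while an LM-based simulation advances it by querying a model. Both functions below are stylized assumptions, not anything from the episode.

```python
def classical_step(susceptible: float, infected: float, beta: float = 0.3) -> float:
    # Stylized epidemic update: new infections are a fixed, closed-form
    # function of the current aggregate state.
    return infected + beta * susceptible * infected

def lm_step(agent, situation: str) -> str:
    # LM-based update: ask a rich, individual agent what it does next;
    # behavior is not constrained to a single closed-form rule.
    return agent.act(situation)
```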

Future Applications and Considerations

The potential applications of advanced simulations include:

  • Major Life Decisions: Running simulations before making significant life decisions, such as potential investments [00:28:34, 00:28:42].
  • Personal Preparation: Using language models to simulate conversations (e.g., a podcast interview or a date) for practice and preparation, as sketched after this list [00:28:58, 00:29:16].
  • Organizational Design: Simulating different organizational structures within a company to predict outcomes [00:26:43, 00:26:47].
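
As a sketch of the preparation use case, one model can role-play the other side of the conversation while the human practices their replies. The prompt and loop below are illustrative assumptions; `model` is any completion callable like the `llm` stub from the first snippet.

```python
def practice_interview(model, topic: str, turns: int = 3) -> None:
    # The model plays a podcast host; the human answers each question.
    history: list[tuple[str, str]] = []
    for _ in range(turns):
        prompt = (
            f"You are a podcast host interviewing a guest about {topic}.\n"
            f"Conversation so far: {history}\n"
            "Ask your next question."
        )
        question = model(prompt)
        print("Host:", question)
        answer = input("You: ")        # the human rehearses a reply
        history.append((question, answer))
```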

It is crucial to be cautious, as current simulations are still far from perfectly reflecting reality [00:29:28, 00:29:37].

Relation to Evaluation and Benchmarking

The development of generative agents and their complex behaviors necessitates an evolution in evaluation and benchmarking methods [00:30:00]. Traditional train/test splits are undermined because the contents of models’ training data are unknown, making it hard to rule out that test examples were already seen during training [00:30:15, 00:30:28].

The ability of language models to follow diverse instructions means that traditional, single-task benchmarks are insufficient [00:31:44, 00:31:53]. Liang notes the development of “AutoBencher,” which uses language models to automatically generate evaluation inputs, producing more sensible evaluations [00:32:13, 00:32:46].
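
A loose sketch of the idea (the prompts and two-step structure are assumptions for illustration, not the actual AutoBencher method): one model proposes test inputs, and the model under evaluation is then run on them.

```python
def generate_benchmark(generator, domain: str, n: int = 10) -> list[str]:
    # `generator` is any completion callable; it proposes the test inputs.
    prompt = (
        f"Write {n} hard, unambiguous test questions about {domain}, "
        "one per line, each with a verifiable answer."
    )
    return [q.strip() for q in generator(prompt).splitlines() if q.strip()]

def collect_answers(target, questions: list[str]) -> list[tuple[str, str]]:
    # Run the model under evaluation; grading happens in a later step
    # (reference answers, rubrics, or a judge model).
    return [(q, target(q)) for q in questions]
```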

Furthermore, there is a need for more structured evaluation, such as using rubrics to anchor judgments, rather than relying on superficial assessments of output quality [00:33:33, 00:34:04]. Academic institutions, like Stanford, are uniquely positioned to develop objective benchmarks that serve the industry and specific verticals [00:34:23, 00:35:35].
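
A minimal sketch of rubric-anchored judging, under the assumption that the judge is itself a language model: rather than asking “is this good?”, the prompt pins the judgment to explicit criteria and a scale. The rubric and parsing below are illustrative.

```python
RUBRIC = """Score the answer on each criterion from 1 (poor) to 5 (excellent):
1. Factual accuracy
2. Completeness relative to the question
3. Clarity of explanation
Reply with one integer per criterion, comma-separated."""

def rubric_judge(judge, question: str, answer: str) -> list[int]:
    # `judge` is any completion callable used as the grader.
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}\nScores:"
    raw = judge(prompt)
    # Parse e.g. "4, 5, 3" into [4, 5, 3]; a real harness would validate
    # the format and retry on malformed output.
    return [int(s) for s in raw.replace(",", " ").split()]
```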

Underexplored Application Areas

While many AI applications are driven by commercial needs (e.g., RAG solutions, summarization), Liang highlights underexplored areas related to these models:

  • Fundamental Science and Discovery: Using models for scientific discovery [00:59:21].
  • Researcher Productivity: Improving the productivity of researchers [00:59:26].

These areas, though less commercially immediate, are vital for feeding into and improving the entire AI ecosystem [00:59:34, 00:59:41].

Agents: Overhyped and Underhyped

When asked about overhyped and underhyped aspects of AI, Liang humorously answered “agents and agents” [00:57:07, 00:57:09], suggesting they have undergone a full hype cycle [00:57:11]. He expressed optimism that AI agents could contribute novel insights to machine learning research within years, similar to how coding tools have evolved [00:58:33, 00:58:56]. This does not mean they are close to AGI, but rather that they could make meaningful contributions in specific domains [00:58:05, 00:58:39].