From: redpointai
The development of autonomous vehicles (AVs) combines significant advances with persistent challenges for AI. A future in which human-driven cars are seen as “crazy,” given accident rates and the complexity of driving, is plausible [00:00:05], [01:04:04], but achieving widespread, safe, and reliable self-driving capability still requires addressing several critical issues.
Role of AI Models and Limitations
Waymo, a leader in self-driving technology, was developing its systems long before the recent explosion of large language models (LLMs) and diffusion models [01:30:00]. These new models did not require Waymo to discard existing technology; instead, they enhance it by acting as a “better teacher” and supplying more information to existing models [02:55:00].
Key contributions of LLMs and vision-language models (VLMs) to autonomous driving include:
- World Knowledge: Providing semantic understanding of the environment, such as recognizing different types of emergency vehicles or understanding accident scenes, which might not be frequently encountered in collected driving data [03:22:00]. This leverages external data sources like the web to enhance the driver’s capabilities [04:33:00].
- Reasoning Capabilities: Large pre-trained models, with extensive visual and text data, enhance the system’s ability to reason [04:47:00].
However, these AI models are not a complete solution. There are aspects of self-driving that require capabilities beyond the AI model itself:
- Safety and Regulatory Compliance: Strict contracts on safety and regulatory constraints must be expressed explicitly, not implicitly, outside the AI model [05:37:00]. An external layer verifies that the AI’s proposed driving plan meets all safety and compliance requirements, ensuring the car behaves reasonably at all times [05:55:00]. This “checking layer” provides essential guardrails [06:35:00].
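The checking layer described above can be illustrated with a minimal sketch. It assumes a simplified plan format and two made-up rules; the names `PlanStep`, `RULES`, `check_plan`, and `select_plan`, as well as the thresholds, are hypothetical and are not taken from Waymo's system. The point is only that the constraints live in explicit, inspectable code outside the learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class PlanStep:
    """One time step of a proposed driving plan (hypothetical schema)."""
    speed_mps: float        # planned speed in metres per second
    gap_to_lead_m: float    # predicted distance to the nearest object ahead
    speed_limit_mps: float  # posted limit at this point of the route


# A rule takes a plan step and returns True when the step is acceptable.
Rule = Callable[[PlanStep], bool]

# Explicit constraints kept outside the learned model.
# Thresholds are illustrative assumptions, not real regulatory values.
RULES: List[Tuple[str, Rule]] = [
    ("within_speed_limit", lambda s: s.speed_mps <= s.speed_limit_mps),
    ("two_second_gap", lambda s: s.gap_to_lead_m >= 2.0 * s.speed_mps),
]


def check_plan(plan: List[PlanStep]) -> List[Tuple[int, str]]:
    """Return (step index, rule name) for every violation; empty means pass."""
    return [
        (i, name)
        for i, step in enumerate(plan)
        for name, rule in RULES
        if not rule(step)
    ]


def select_plan(proposed: List[PlanStep], fallback: List[PlanStep]) -> List[PlanStep]:
    """Accept the model's plan only if it passes every explicit rule."""
    return proposed if not check_plan(proposed) else fallback
```

In this sketch a rejected plan is replaced by a pre-verified fallback. A real system could instead ask the model for another proposal; which is preferable is a design choice the source does not specify.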
Current State and Remaining Challenges
The current state of autonomous vehicles, particularly for companies like Waymo, shows “not many big blockers” in terms of fundamental capabilities [10:31:00]. Challenges like driving in fog or on highways have been largely addressed [11:03:00]. However, some areas, like driving in snow, have yet to receive full attention [10:37:00].
The predominant challenges in AI adoption and deployment now revolve around scaling [11:11:00]. Specifically, the task is to manage the “long tail of problems” that arises when self-driving cars operate for millions of miles [11:30:00]. Events a human driver might encounter once in a lifetime become weekly or monthly occurrences for an autonomous fleet, making “exceptional and weird” situations common [11:50:00]. Solving this long tail is a primary focus, with AI and large model capabilities expected to accelerate solutions [12:11:00].
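A rough back-of-the-envelope calculation shows why fleet scale turns rare events into routine ones. The sketch below assumes events occur independently at a constant rate per mile (a Poisson model). The function names and any mileage figures plugged into them are illustrative assumptions, not numbers from the source.

```python
import math


def expected_events(miles_driven: float, miles_per_event: float) -> float:
    """Expected number of occurrences over the given mileage."""
    return miles_driven / miles_per_event


def prob_at_least_one(miles_driven: float, miles_per_event: float) -> float:
    """Chance of seeing the event at least once, under a Poisson assumption."""
    return 1.0 - math.exp(-expected_events(miles_driven, miles_per_event))
```

For example, if an event happens once per 500,000 miles (an assumed stand-in for "once in a driving lifetime") and a fleet covers 1,000,000 miles per week (also assumed), `expected_events(1_000_000, 500_000)` gives 2.0 occurrences per week. A single driver covering 10,000 miles in a year would have less than a 2% chance of seeing that event.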
To address these rare “long-tail” scenarios, Waymo utilizes:
- Simulation: Extensive use of simulation to synthesize scenarios that are known to be possible but are rarely observed in real-world data [12:37:00].
- Scenario Modification: Modifying real-world scenarios where “nothing really bad happened” to make them worse, such as introducing drunk or actively adversarial drivers, to make the system more robust and reactive [13:18:00].
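The scenario-modification idea can be sketched as a transformation over a logged track. The example below assumes a toy scenario format with a single nearby vehicle; `AgentState`, `Scenario`, `add_weave`, and `severity_sweep` are hypothetical names, and the sinusoidal swerve is only one simple way to imitate an impaired driver, not the method described in the source.

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class AgentState:
    t: float  # seconds since scenario start
    x: float  # longitudinal position in metres
    y: float  # lateral offset in metres (0 = centre of the agent's lane)


@dataclass(frozen=True)
class Scenario:
    name: str
    agent_track: List[AgentState]  # logged track of one nearby vehicle


def add_weave(scenario: Scenario, amplitude_m: float = 0.8,
              period_s: float = 4.0) -> Scenario:
    """Return a copy in which the agent swerves sinusoidally across its lane."""
    track = [
        replace(s, y=s.y + amplitude_m * math.sin(2 * math.pi * s.t / period_s))
        for s in scenario.agent_track
    ]
    return replace(scenario, name=scenario.name + "+weave", agent_track=track)


def severity_sweep(scenario: Scenario, amplitudes_m: List[float]) -> List[Scenario]:
    """Generate progressively harder variants of one benign logged scenario."""
    return [add_weave(scenario, amplitude_m=a) for a in amplitudes_m]
```

Because the dataclasses are frozen and `replace` returns copies, the original log is left untouched, so the same benign recording can seed many perturbed variants.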
Needed Technical Advances
One significant technical advance that could “completely yet again change the landscape” for autonomous driving is the development of reliable, physically realistic world models [14:05:00]. Such models would make it possible to simulate the real world with physical realism and accurate rendering, akin to a “digital twin” of the world for AVs [15:08:00].
While early “Proto-World models” like Sora or Veo can predict video sequences that seem physically reasonable [14:45:00], the challenge lies in making them controllable and physically realistic while remaining rich and plausible [15:08:00]. Current models struggle with long-tail problems, as they are not yet good at generalizing to rare events [15:56:00].
A major bottleneck in world-model building, especially for functional applications like autonomous driving, is the deep question of causality [17:30:00]. Models can learn correlations and generate plausible videos, but controllability requires the model to understand causality: how a change in an input leads to a specific change in the output [18:00:00]. Injecting causality into AI models has historically been a struggle in machine learning [18:16:00]. It is uncertain whether this will require new architectures or simply proper data engineering and inductive biases [46:46:00].
Sensor Suite Debate
Waymo uses a rich sensor suite of cameras, LiDAR, and radar, which are “remarkably complementary” due to their orthogonal strengths and weaknesses, allowing for cross-verification [22:30:00]. This approach contrasts with that of other companies, which started with simpler, cheaper L2 (driving assistance) systems and are attempting to climb to L4 (full autonomy) [23:17:00].
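Cross-verification between sensors can be illustrated with a deliberately simple voting scheme. The sketch assumes each sensor reports object positions as 2D points in a shared vehicle frame. The function name `confirmed_detections`, the 1-metre tolerance, and the two-sensor quorum are all assumptions for illustration and do not describe Waymo's actual fusion stack.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in metres, shared vehicle frame


def confirmed_detections(by_sensor: Dict[str, List[Point]],
                         tol_m: float = 1.0,
                         min_sensors: int = 2) -> List[Point]:
    """Keep detections that at least `min_sensors` sensors agree on."""
    confirmed: List[Point] = []
    for sensor, points in by_sensor.items():
        for p in points:
            # Count this sensor plus every other sensor with a nearby detection.
            supporters = {sensor}
            for other, other_points in by_sensor.items():
                if other != sensor and any(
                    math.dist(p, q) <= tol_m for q in other_points
                ):
                    supporters.add(other)
            # Skip points already represented by a confirmed detection.
            duplicate = any(math.dist(p, c) <= tol_m for c in confirmed)
            if len(supporters) >= min_sensors and not duplicate:
                confirmed.append(p)
    return confirmed
```

A detection seen by only one sensor is dropped here, which is the conservative choice for suppressing false positives. A production system would more likely flag such detections for further scrutiny than discard them, since a single sensor may be the only one able to see an object in some conditions.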
Waymo’s strategy was to “possibly over-sensorize” initially in order to solve the hard problem first, and then use what it learned to inform decisions on cost reduction and simplification [24:20:00]. The argument for using only cameras (like humans with eyes) is countered by the belief that the bar for L4 driving is “above human level” [26:32:00]. Waymo’s safety reports indicate that its vehicles are already safer than the average human driver [26:46:00]. Whether a complex sensor suite is necessary for superhuman performance remains a key question for the coming years [27:31:00].
Milestones and Future Trajectory
The autonomous vehicle industry has a history of over-optimism, as evidenced by the 1995 transcontinental autonomous ride that achieved 99% autonomy, leading many to believe commercial deployment was imminent [28:32:00]. It took 30 years to reach the point of commercial deployment [29:17:00].
Today, Waymo has validated its technology in cities like Phoenix and San Francisco, and user feedback indicates that riders love the product [29:32:00]. The main remaining barrier is scaling [30:09:00]. The next major milestones will therefore focus on expansion into new geographies, such as Waymo’s initial data collection and left-side driving experiment in Tokyo [30:16:00]. This involves not just technological adaptation but also managing the logistics of setting up operations and building trust with local communities and regulators [21:36:00].