From: jimruttshow8596
This article explores the landscape of autonomous driving technology, focusing on the approaches, challenges, and future visions presented by George Hotz, founder of comma.ai [00:03:48].
George Hotz: Background and Motivation
George Hotz, a precocious individual selected for the Johns Hopkins Center for Talented Youth, gained early recognition in hacker circles [00:37:38]. At age 17, he was the first to break the carrier lock on the iPhone [01:24:25]. Later, he was recruited into Google’s Project Zero, an elite team of white-hat hackers tasked with finding “zero-day” exploits in widely distributed technologies [02:20:53]. A zero-day exploit is a previously unknown vulnerability in software [02:53:30]. Hotz’s work at Project Zero led him to think about automating vulnerability discovery, a recurring theme in his life [03:05:00]. This desire for automation eventually motivated him to found comma.ai [03:16:00].
Hotz initially considered a contract to build software for Tesla to replace the Mobileye chip (Mobileye was later acquired by Intel) [04:41:00]. When that contract didn’t materialize, he decided to build an autopilot clone himself, intending to sell it to car companies [05:12:00]. Building the clone took a couple of months, but selling it to car companies proved impossible [05:22:00]. Mobileye chips run proprietary perception algorithms to detect lane lines and cars, enabling Advanced Driver-Assistance Systems (ADAS) features [05:32:00].
Core Philosophy: Camera-Only Approach
A key philosophical fork in autonomous driving technology is between those who believe it can be achieved with only cameras and those who advocate for additional sensing systems like LiDAR [06:05:00]. Hotz firmly believes in the camera-only approach, drawing an analogy to human driving:
“There’s one system that can drive cars and it’s human beings… A human has two cameras” [06:20:00].
This aligns with Elon Musk’s contrarian viewpoint against LiDAR [06:45:00]. Hotz asserts this is now the “correct” viewpoint, though Waymo still utilizes LiDAR [06:52:00].
Levels of Self-Driving Automation
The six levels of self-driving automation (Level 0 through Level 5) often describe liability rather than capability [07:11:00].
- Level 2: The human driver is still fully liable for the car’s decisions [07:25:00] and must supervise the car at all times [07:30:00].
- Level 3: The human is liable in certain scenarios [07:39:00].
- Level 4: The human is not liable in specific, defined regions, such as cities [07:43:00].
- Level 5: The human is never liable, implying full automation where one could sleep in the back seat [07:47:47].
Hotz views predictions of full automation (Level 5) being “two years away” as hubris, referencing Google’s early prototype without a steering wheel [08:05:00].
Human Driving: A Tough Benchmark
Humans are surprisingly good drivers [08:51:00]. In most civilized countries, there’s about one fatality per 100 million miles driven [08:56:00]. This performance sets a very high bar for autonomous systems [09:32:00]. Hotz emphasizes that humans are “absurdly good drivers,” noting that most people drive thousands of times without a crash [09:51:00].
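To make that rate concrete, here is a back-of-envelope calculation. The fatality rate is from the text; the annual mileage figure is an assumed round number, not from the source:

```python
# Back-of-envelope arithmetic for the "one fatality per 100 million miles"
# benchmark. MILES_PER_YEAR is an assumed round number, not from the source.
FATALITY_RATE = 1 / 100_000_000   # fatalities per mile driven
MILES_PER_YEAR = 10_000           # assumed average annual mileage per driver

driver_years_per_fatality = 1 / (FATALITY_RATE * MILES_PER_YEAR)
print(driver_years_per_fatality)  # 10000.0 -- one fatality per ~10,000 driver-years
```

Under that assumption, a typical driver would need roughly ten thousand years behind the wheel before a fatal crash becomes statistically expected, which is the bar an autonomous system must clear.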
Addressing Challenges of AI Modeling and Simulation
The Behavioral Cloning Problem
Initial attempts to build self-driving software involved using a camera to predict steering wheel angle via supervised learning (f(x)=y, where x is the image and y is the steering angle) [10:49:00]. This approach failed on the road despite low training and test set loss [11:02:00]. The reason is the “behavioral cloning” problem: the model never acts in the world during training [11:23:00]. Data is collected from human driving (the “human policy”), but the machine’s slight errors (an “epsilon” of error per step) accumulate over time, because the action at time T affects the input data at time T+1, violating the Independent and Identically Distributed (IID) assumption [11:41:00].
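A toy simulation makes the failure mode visible. This is an illustration of the compounding-error argument, not comma.ai’s actual model: each steering prediction carries a tiny error, but because the action at time T shifts the input at time T+1, the errors integrate into the car’s position instead of averaging out.

```python
import random

# Toy illustration of the behavioral-cloning failure: per-step prediction
# error is tiny (what a held-out test set measures), but on-policy the
# errors accumulate into the car's lateral position.
random.seed(42)
EPSILON = 0.01   # small systematic bias per prediction (illustrative units)
STEPS = 1000     # on the order of a minute of driving at control rate

offset = 0.0     # lateral offset from lane center
for _ in range(STEPS):
    step_error = EPSILON + random.gauss(0, 0.005)  # per-step error is tiny...
    offset += step_error                           # ...but it integrates

print(f"per-step error ~ {EPSILON}, accumulated offset ~ {offset:.2f}")
```

The per-step loss would look excellent offline, yet the rollout drifts far out of the lane, exactly because the IID assumption does not hold once the model is driving.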
Solution: Simulation and Corrective Pressure
To address this, comma.ai initially added a small amount of corrective pressure by detecting lane lines and adjusting based on the car’s position [15:15:00]. However, the reliance on lane lines was problematic due to their ambiguous definition and lack of a physics-based standard [15:37:00]. They eventually removed the dependency on lane lines [16:04:00].
The ultimate solution for behavioral cloning is training in simulation [17:39:00]. In a simulator, the model trains with data driven by its own policy, allowing it to learn how to correct itself and converge [17:51:00]. They use a “hugging test” in an Unreal Engine simulator to measure how quickly the car returns to the center of a lane after being initialized off-center [18:20:00].
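The idea behind such a test can be sketched with a trivial stand-in for the simulator. The linear lateral dynamics and the proportional policy below are assumptions for illustration, not comma.ai’s code:

```python
# A minimal sketch of the off-center recovery test: initialize the car
# off-center and count how many control steps the policy needs to return
# within a tolerance of the lane center. Dynamics are simplified: the
# action directly shifts lateral position.
def hugging_test(policy, initial_offset: float, tol: float = 0.05,
                 max_steps: int = 500) -> int:
    """Steps until |offset| < tol under `policy`, or max_steps if never."""
    offset = initial_offset
    for step in range(max_steps):
        if abs(offset) < tol:
            return step
        offset += policy(offset)
    return max_steps

# A policy trained with corrective pressure steers back toward center.
corrective = lambda off: -0.2 * off   # simple proportional correction
steps_back = hugging_test(corrective, initial_offset=1.0)
print(steps_back)
```

A policy that merely clones human steering on well-centered data would score poorly here, since it never learned what to do from an off-center state.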
Data Collection and Unique Simulation Approach
comma.ai boasts the second-largest driving dataset globally after Tesla, with 10,000 weekly active users uploading data, resulting in tens of millions of diverse miles [19:12:00]. Unlike Waymo’s hand-coded game engine simulator, comma.ai’s “small offset simulator” is reprojective [19:45:00]. It takes human video and applies small geometric perturbations to simulate driving in slightly different positions, addressing the challenge of modeling other cars’ behavior [19:50:50].
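The reprojective idea can be sketched in miniature. Real reprojection is a 3D perspective warp using estimated depth and camera pose; the toy below (pure Python, a 2D “image” as nested lists) only translates pixels laterally, but it shows the principle: warp a frame recorded from the human’s actual position so it appears taken from a slightly different one, yielding training states the human never visited.

```python
# Toy sketch of reprojection: shift every row of a tiny "image" sideways,
# as if the camera sat slightly left or right of where the human drove.
def reproject_lateral(frame, shift_px: int, fill=0):
    """Shift every row of `frame` by `shift_px` pixels (positive = right)."""
    width = len(frame[0])
    out = []
    for row in frame:
        if shift_px >= 0:
            out.append([fill] * shift_px + row[:width - shift_px])
        else:
            out.append(row[-shift_px:] + [fill] * (-shift_px))
    return out

frame = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(reproject_lateral(frame, 1))   # [[0, 1, 2, 3], [0, 5, 6, 7]]
```

Because the perturbed frames come from real human video, the simulator inherits realistic behavior of other road users for free, which a hand-coded game engine must model explicitly.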
comma.ai’s Product and Capabilities
comma.ai offers an open-source self-driving system called openpilot [03:57:00]. The comma 3X device, costing $1,250 [23:06:00], supports 275 car models [21:03:00]. Installation is simple: unplug the camera cable behind the rearview mirror, insert a Y-splitter, and plug in the comma device; the process takes about 15 minutes [21:31:00]. The device is not “hacking” the car; it intercepts and improves messages from the car’s existing camera system [21:55:00]. It selectively blocks or passes messages, ensuring emergency braking remains active by default [22:17:00].
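The interception idea can be sketched as a simple pass/substitute filter. The message IDs and semantics below are invented for illustration and are not comma.ai’s actual message set:

```python
# Hypothetical sketch of message interception: sit between the stock camera
# and the car, pass safety-critical messages through untouched, and replace
# only the steering command. IDs below are made up for illustration.
STEER_CMD = 0x2E4   # hypothetical ID: camera's steering command
AEB_CMD = 0x343     # hypothetical ID: automatic emergency braking

def forward(msg_id: int, payload: bytes, our_steer: bytes) -> bytes:
    if msg_id == AEB_CMD:
        return payload     # never block emergency braking
    if msg_id == STEER_CMD:
        return our_steer   # substitute the device's smoother steering command
    return payload         # everything else passes through unchanged

print(forward(STEER_CMD, b"\x00\x00", b"\x01\x10"))  # device's command wins
```

A default-pass design like this is what keeps the car’s stock safety features (such as emergency braking) fully intact while only the steering behavior is improved.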
The system significantly enhances existing driver assistance [23:55:00]. Instead of just keeping a car within lane lines (which can be jerky), comma.ai applies smooth torque to keep the car centered, even on unmarked roads, emulating human driving [22:42:00]. On interstate highways, it can drive for an hour or more without human intervention [24:50:00]. An experimental mode can handle city driving, stopping at stop signs and lights, performing 90-degree turns and highway interchanges [25:05:00].
Comparison Between comma.ai and Competitors: Tesla and Waymo
Waymo
Waymo’s approach relies on super high-resolution mapping and operates in defined, carefully mapped regions, characteristic of a Level 4 system [26:52:00]. Hotz describes Waymo’s vehicles as “trackless monorails” due to their reliance on centralized infrastructure and remote operation [28:46:00]. Their systems require multiple operators for each car and stop if the cell phone network goes down, highlighting their fragility and centralization [29:36:00]. Waymo has “hilariously negative unit economics” with their $500,000 robo-taxis, making them vulnerable to cheaper, decentralized solutions [32:18:00]. As of December 2023, Waymo had only logged 7 million miles in no-driver mode [30:30:00].
Tesla
Tesla also uses a camera-only approach and processes most data locally on the vehicle, without remote assistance [39:08:00]. However, Tesla trains its models in data centers, as comma.ai does [39:14:16]. Tesla applies roughly 100x more compute to the real-time driving problem than comma.ai, using dedicated silicon and consuming more power [35:16:00]. Tesla also has a larger dataset, with an estimated 3.3 billion miles logged for Autopilot, compared to comma.ai’s over 100 million [40:35:00].
From a functional standpoint, Tesla has more high-end capabilities and can execute complex maneuvers like navigating intersections and making turns at lights [37:56:00]. However, Tesla’s system can make “sketchy mistakes,” such as phantom braking or misinterpreting lanes, leading to jarring experiences for the user [36:22:00]. Hotz attributes this to Tesla’s more rigid, “modernist” approach, relying on object localization and Model Predictive Control (MPC) planners, which can snap between local minima, causing abrupt actions [38:19:00].
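The “snapping between local minima” point can be illustrated with a toy planner. The two-well cost function below is invented for illustration, not Tesla’s actual planner: when the cost is multimodal, a tiny change in the scene can flip which minimum wins, so the planned target jumps discontinuously.

```python
# Toy illustration of snapping: a planner that picks the global minimum of
# a two-well cost. A tiny change in scene_bias flips which well is cheaper,
# so the chosen lateral target jumps by ~2 m instead of changing smoothly.
def plan(scene_bias: float) -> float:
    """Pick the lateral target (in meters) minimizing a two-well cost."""
    candidates = [x / 100 for x in range(-200, 201)]          # -2.0 .. 2.0 m
    cost = lambda x: min((x + 1) ** 2, (x - 1) ** 2) + scene_bias * x
    return min(candidates, key=cost)

print(plan(-0.01), plan(0.01))   # targets near +1.0 vs -1.0: a ~2 m jump
```

A policy that outputs actions directly, trained end to end, tends to degrade gradually instead, which matches the “shaky and unsure” failure mode described below.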
In contrast, comma.ai’s system is focused on “making driving chill” [37:09:00]. Its torque limits are lower, resulting in smoother, safer driving [36:43:00]. When overwhelmed, comma.ai’s system becomes “shaky and unsure,” similar to human failure modes, rather than jerking the wheel or slamming the brakes [38:15:00].
Legal and Regulatory Aspects of Autonomous Vehicles
comma.ai operates as a Level 2 system [41:14:00]. In the U.S. automotive industry, manufacturers self-certify compliance with standards [42:41:00]. comma.ai self-certifies compliance with ISO 26262 and other standards, particularly regarding torque limits and braking force [42:53:00].
For a Level 2 system, the human remains in control of the vehicle at all times [43:16:00]. comma.ai guarantees that the car will never become uncontrollable; users can always use the brakes or overpower the steering [43:21:00]. If a crash occurs, the liability lies with the human driver [43:37:00].
comma.ai also implements a driver monitoring system, which uses a camera to ensure the driver’s eyes are on the road at all times [44:03:00]. This system is designed to be non-intrusive and provide helpful alerts, preventing alert fatigue [44:40:00]. Telemetry data is opt-out, but users are encouraged to upload it as a “common good” to improve the system [45:37:00].
Hotz emphasizes that comma.ai does not want to be an insurance company and has no interest in taking on liability beyond Level 2 [48:40:00]. While their software could potentially enable a 10x reduction in accidents, the decision to provide a Level 5 liability layer is left to others who might build on their open-source software [49:01:00].
Future Directions and Challenges for AI and AGI
Hotz views self-driving cars as a stepping stone to general-purpose robotics [40:08:00]. While self-driving cars are a significant narrow AI problem, they are easier than general robotics because:
- It’s simpler to gather large datasets of good driving from the car’s perspective [46:32:00].
- Driving is a low-dimensional problem (steering and acceleration), unlike a multi-dimensional robot arm [46:56:00].
The long-term dream for comma.ai is to sell a “comma body”—a $25,000 robot companion capable of cooking and cleaning [47:34:00].
Hotz also leads tinygrad, a machine learning framework that competes with TensorFlow and PyTorch [55:55:00]. tinygrad distinguishes itself through extreme simplicity: a codebase of only 5,200 lines that can nonetheless run complex models like Stable Diffusion and LLaMA [56:11:00]. This simplicity is designed to make translation to custom hardware (ASICs) easier [57:08:00]. tinygrad is already used in openpilot to run models on the device [57:29:00].