George Hotz, a notable figure in the tech world known for his early hacking exploits and his work with Google's Project Zero, is currently the president of Comma.ai, a company focused on open-source self-driving car systems [00:32:00]. The field has drawn significant interest, and previous episodes of the Jim Rutt Show have featured guests such as Shaheen Farry and Jim Hackett [00:04:08].
Motivation and Initial Hurdles
Hotz’s journey into self-driving technology began with a potential contract to build software for Tesla to replace the Mobileye chip [00:04:41]. Although the contract didn’t materialize, it inspired him to build an “autopilot clone” to sell to car companies [00:05:12]. Mobileye chips run proprietary perception algorithms that support ADAS (Advanced Driver-Assistance Systems) features such as detecting lane lines and other cars [00:05:33].
An initial attempt to use a camera to directly predict steering wheel angles via supervised learning (image as input, steering angle as output) failed [00:10:49]. Despite achieving low loss on both the training and test sets, the model couldn’t drive straight on the highway [00:11:11]. The reason is that the model wasn’t acting in the world during training; the data reflected the human’s policy, not the machine’s [00:11:23]. Small “epsilon” errors accumulate over time, so a model trained on static data cannot generalize to real-world driving, where the machine’s own actions shape its subsequent inputs [00:11:50].
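The compounding-error argument can be illustrated with a toy simulation (an illustration only, not Comma.ai’s code; the numbers are arbitrary): a per-step prediction error that looks harmless on held-out data still pushes the car into states the training data never covered, and the error grows from there.

```python
# Toy illustration of compounding "epsilon" error in behavioral cloning
# (not Comma.ai code; parameters are arbitrary).
import random

def simulate(steps=1000, epsilon=0.01):
    """Lateral offset (meters) when each steering prediction is off by ~epsilon."""
    offset = 0.0
    for _ in range(steps):
        # The cloned policy was trained on human data near offset == 0, so its
        # error is assumed to grow as the car drifts into unfamiliar states.
        offset += random.gauss(0.0, epsilon) * (1.0 + abs(offset))
    return offset

# At ~20 predictions per second, 1000 steps is under a minute of driving.
print([round(simulate(), 2) for _ in range(5)])
```

Even with a tiny per-step error, the drift after a minute is a meaningful fraction of a lane width, which is consistent with the observation that low test-set loss did not translate into driving straight.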
Camera vs. Lidar-Based Systems
A significant debate in self-driving technology revolves around sensor reliance [00:06:05]. Hotz firmly believes that humans, who only have two cameras (eyes), are the only system truly capable of Level 5 self-driving, aligning with Elon Musk’s stance against Lidar [00:06:20]. He argues that Lidar is not necessary for self-driving [00:06:48].
Levels of Self-Driving Automation
The six levels of self-driving automation (Level 0 through Level 5) often relate more to liability than actual capability [00:07:11]:
- Level 2: The human driver remains fully liable for decisions [00:07:25].
- Level 3: Human liability applies in certain scenarios [00:07:39].
- Level 4: Human liability is removed in specific areas (e.g., cities) [00:07:43].
- Level 5: The human is never liable [00:07:47], implying full automation where one could sleep in the back seat [00:08:10].
Predictions of full self-driving by 2018-2019 were “total hubris,” exemplified by Google’s early prototypes without steering wheels [00:08:05].
The Human Benchmark
It’s a common misconception that humans are poor drivers [00:08:48]. Most civilized countries see about one fatality per 100 million miles driven, a record no self-driving company has yet matched [00:08:56]. Humans are “absurdly good drivers” [00:09:51]: a typical person has driven to work thousands of times with at most one crash [00:09:57].
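A quick back-of-envelope calculation puts that rate in perspective (the annual-mileage and driving-span figures below are assumptions for illustration, not from the episode):

```python
# Back-of-envelope: what "one fatality per 100 million miles" implies per driver.
fatalities_per_mile = 1 / 100_000_000   # rate cited in the episode
lifetime_miles = 15_000 * 50            # assumed 15,000 miles/year over 50 years of driving
print(lifetime_miles * fatalities_per_mile)  # ~0.0075, i.e. well under a 1% lifetime risk
```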
Hotz dismisses Gary Marcus’s critique that self-driving cars face a “zillion corner cases” [00:12:43]. Human drivers learn from far less data than modern systems are trained on [00:13:08]. The problem is not an inability to handle rare events, but rather the inability of narrow AIs to generalize the way humans do [00:13:27]. Hotz argues that the Uber accident, where the car failed to recognize a pedestrian with a bicycle and bags, was more akin to a classical software bug than a failure of deep learning [00:14:10].
Comma.ai’s Approach to Self-Driving Technology
Comma.ai’s core philosophy is to emulate human driving behavior [00:17:05].
- Corrective Pressure: After initial failures with pure supervised learning, Comma.ai introduced a small amount of “corrective pressure” [00:15:15]. This involved training an algorithm to detect lane lines, find the center, and apply corrective steering torque [00:15:20].
- Evolution from Lane Lines: Initially, the reliance on lane lines was considered an “original sin” as they lack a physics-based definition and human labeling is inconsistent [00:15:37]. Comma.ai eventually removed this dependency, focusing on “where a human would drive the car” on a given road [00:16:04].
- Simulation for Correction: To combat the “behavioral cloning” problem, where errors accumulate, Comma.ai uses simulation [00:17:25]. Unlike Waymo’s hand-coded game-engine simulators, Comma.ai’s “small offset simulator” takes real human driving video and applies geometric perturbations to create varied scenarios [00:19:47]; a schematic sketch of the idea follows this list. This lets the model learn corrective actions and converge, as verified by their “hugging test,” which measures how quickly the car returns to the lane center when offset [00:18:20].
- Data Collection: Comma.ai boasts the second-largest driving dataset globally after Tesla, with 10,000 weekly active users uploading data [00:19:12]. This provides tens of millions of miles of “massively diverse” data from various locations worldwide [00:19:21].
- Installation and Functionality: The Comma 3X device costs $1250 [00:23:06] and takes about 15 minutes to install [00:21:39]. It connects to the car’s existing camera harness behind the rearview mirror, intercepting and improving the stock lane-keep-assist messages [00:21:37]. It selectively blocks or passes through messages, and does not disable emergency braking by default [00:22:17] (see the pass-through sketch after this list). The system handles lateral control, keeping the car centered or where a human would place it, even on unmarked roads [00:24:40]. It allows hours of hands-off driving on interstate highways [00:24:51] and has an experimental mode for city driving (stop signs, lights, turns) [00:25:07].
- No High-Precision Maps: Comma.ai does not use high-resolution mapping like Waymo [00:25:56], believing that relying on centimeter precision on a global scale is “absurd” and non-robust [00:26:22]. They use standard definition maps, similar to what humans use [00:26:09].
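The small-offset idea referenced above can be sketched as follows. This is a schematic illustration under assumed conventions (pure-pursuit-style geometry, a straight recorded path, hypothetical parameter values), not Comma.ai’s implementation, which warps recorded video rather than abstract paths:

```python
import numpy as np

def small_offset_label(human_path, lateral_offset_m=0.3, speed_mps=30.0, horizon_s=2.0):
    """Nudge the ego car sideways off a recorded human path and compute a
    corrective curvature label that rejoins the path within the horizon."""
    lookahead_m = speed_mps * horizon_s
    # Where the human actually was `horizon_s` seconds later (x forward, y lateral, meters).
    target_y = np.interp(lookahead_m, human_path[:, 0], human_path[:, 1])
    target = np.array([lookahead_m, target_y])
    ego = np.array([0.0, lateral_offset_m])   # same spot on the road, shifted sideways
    dx, dy = target - ego
    return 2.0 * dy / (dx**2 + dy**2)         # pure-pursuit-style corrective curvature

# Recorded human path along the lane center.
path = np.stack([np.linspace(0, 100, 50), np.zeros(50)], axis=1)
print(small_offset_label(path))   # negative: steer back toward where the human drove
```

The perturbed frame, paired with the corrective label, gives the network examples of recovering from states a human driver rarely enters, which is exactly what pure behavioral cloning lacks.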
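The message interception described above can also be sketched schematically. The message ID and function below are hypothetical stand-ins, not openpilot code; the point is only that the stock steering command is replaced while everything else, including emergency braking, is forwarded untouched:

```python
# Schematic pass-through filter between the stock camera and the car's CAN bus.
STOCK_STEERING_CMD = 0x2E4   # hypothetical message ID for the stock lane-keep steering command

def forward(msg_id: int, data: bytes, openpilot_engaged: bool):
    """Return the bytes to forward onto the bus, or None to block the message."""
    if msg_id == STOCK_STEERING_CMD and openpilot_engaged:
        return None   # block the stock command; the device sends its own steering message
    return data       # everything else (e.g. automatic emergency braking) passes through
```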
Comparison with Competitors
- Waymo: Hotz characterizes Waymo’s self-driving cars as “fancy remote control cars” or “trackless monorails” [00:26:26]. Their approach, operating in defined, carefully mapped regions, is Level 4 [00:26:54]. Waymo’s vehicles cost around $500,000 each and rely on extensive remote human operators, with cars stopping if the cell phone network goes down [00:29:36]. Hotz criticizes their “hilariously negative unit economics” [00:32:29] and their assumption of a static world without competition [00:32:51].
- Tesla: Both Comma.ai and Tesla train models in data centers and then deploy them to the car for local processing [00:39:14]. Tesla has positive unit economics, selling cars profitably today [00:33:37]. The compute Tesla applies to the real-time driving problem is roughly 100x Comma.ai’s [00:35:27].
- Similarities: Both use camera-only approaches and operate anywhere, not just geo-fenced areas [00:37:39].
- Differences: Tesla treats driving as a “physics problem” from a “modernist perspective,” using rigid maneuvers and displaying virtual 3D cars [00:34:26]. Comma.ai takes a more “holistic” approach, focusing on what a human would do (“just tell me the action, don’t tell me the state”) [00:34:49]. While Tesla has greater high-end capability (e.g., navigating complex turns in its experimental FSD) [00:37:01], Comma.ai claims superiority in usability and “chill” driving [00:36:14]. Tesla’s system can make “sketchy mistakes” such as sudden braking or mis-tracking lanes, leading to jarring experiences [00:36:22]. Comma.ai’s system, with its lower torque limits, degrades more smoothly and less jarringly when overwhelmed [00:36:43], and its failure modes are more human-like [00:38:55].
Legal and Regulatory Environment
Comma.ai operates as a Level 2 system [00:43:00], where the human is always in control and liable [00:43:17]. They self-certify compliance with automotive standards like ISO 26262 [00:42:41]. Key aspects of their regulatory compliance and liability stance include:
- Control: The system limits torque and can always be overridden by two fingers [00:43:30]. The car never becomes uncontrollable [00:43:24].
- Driver Monitoring: Comma.ai has “the best driver monitoring in the world” [00:44:42], using a camera to ensure the driver keeps eyes on the road [00:44:03]. This system is designed to be helpful, not intrusive, alerting drivers only when truly necessary to prevent alert fatigue [00:44:53].
- Liability: If a crash occurs, liability rests with the human driver [00:43:38]. Comma.ai’s stance is that “the human is in control of the car at all times” [00:50:21]. They distinguish between functional safety (where a product malfunction, like brake failure, would be their liability) and judgment calls (where the driver is responsible) [00:53:22]. They have not faced claims for software bugs causing problems, only successfully defended against a patent troll [00:50:52].
- Data Upload: Driving telemetry is opt-out, encouraged as a “common good” to improve the system [00:45:37]. Driver monitoring data (pictures) is not uploaded unless specifically opted in [00:45:28].
Future Vision
Hotz views self-driving cars as a “stepping stone” to general purpose robotics and “artificial life” [00:39:54]. Driving is considered a “big narrow piece of AI” [00:40:14], with lessons learned applicable to other robotics challenges. The driving problem is “low dimensional” (steering and acceleration) compared to the high dimensionality of a human hand [00:46:59].
Comma.ai’s long-term goal is to sell a “Comma Body,” a $25,000 robot companion capable of cooking and cleaning [00:47:34].
Tinygrad: A Foundational Project
Hotz is also the CEO of the tiny corp, the company behind tinygrad, a machine learning framework that competes with TensorFlow, PyTorch, and JAX [00:55:55]. Its key distinction is simplicity, with a codebase of only 5200 lines [00:56:11]. Tinygrad can run models like Stable Diffusion and Llama, and can train various networks [00:56:22]. It supports devices and data types generically, avoiding the “combinatorial explosion” common in other frameworks [00:56:47]. Tinygrad is used in Comma.ai’s Openpilot system to run models on the device [00:57:29]. The long-term goal for tinygrad is to build machine learning ASICs, starting with the software [00:57:11].
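For flavor, here is a minimal sketch of what using tinygrad’s Tensor API looks like (the exact import path and API details vary between versions; this example is illustrative and not taken from the episode):

```python
# Minimal tinygrad sketch: forward pass, autodiff, and readback.
from tinygrad.tensor import Tensor  # recent versions also allow `from tinygrad import Tensor`

x = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
w = Tensor([[0.5], [-0.5]], requires_grad=True)

loss = x.matmul(w).relu().sum()      # operations are recorded lazily, then compiled into kernels
loss.backward()                      # autodiff over the recorded graph
print(loss.numpy(), w.grad.numpy())  # realize the results on the selected device
```

The same small core handles every backend generically, which is what the claim about avoiding a “combinatorial explosion” of device and data-type support refers to.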
Future of Self-Driving Levels
Comma.ai has “no interest in ever going past Level 2” or taking on liability [00:48:39], aiming instead to build software that is “a better driver than a human,” perhaps 10x better [00:48:50]. They believe Level 4 is not viable as a business model, as Level 5 cars will arrive too quickly [00:49:18]. The focus remains on making driving “chill” for users [00:37:10].