Comparison between comma.ai and competitors like Tesla and Waymo

From: jimruttshow8596

comma.ai is a company founded by George Hotz that offers an open-source self-driving car system [00:03:57]. This system aims to enable cars to drive autonomously using an approach that differs significantly from competitors like Waymo and Tesla [00:06:05].

Core Philosophy and Approach

comma.ai’s core philosophy for autonomous driving is based on emulating human driving behavior using only cameras [00:06:41]. The company believes that humans, who have only two cameras (their eyes), are the only truly Level 5-capable driving system [00:06:35]. Their system, known as openpilot, is designed to make driving “chill” [00:37:10].

Instead of relying on high-precision maps or additional sensors like LiDAR, comma.ai focuses on predicting “where a human would drive the car” given the road conditions [00:17:07] [00:26:06]. This approach is more holistic: it aims to output the driving action directly rather than first building a detailed “world model” with precise localization of objects [00:34:49].

A key aspect of comma.ai’s development is its massive, diverse dataset, uploaded by its 10,000 weekly active users [00:19:14] [00:19:21]. According to Hotz, it is the second-largest driving dataset in the world after Tesla’s [00:19:12]. comma.ai also uses a “small offset simulator” that perturbs real human driving video rather than relying on a hand-coded game engine [00:19:48].
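
The episode describes this simulator only at a high level: it perturbs recorded human video instead of rendering a hand-built world. One plausible piece of such a pipeline is producing the training target for a perturbed frame. The sketch below assumes the perturbation is a pure lateral shift of the camera and that the target path rejoins the recorded human path with an exponential decay; the function `corrective_target` and the `rejoin_distance_m` parameter are hypothetical, and the image re-rendering step itself is not shown.

```python
import math
from typing import List, Tuple


def corrective_target(
    human_path: List[Tuple[float, float]],
    offset_m: float,
    rejoin_distance_m: float = 20.0,  # hypothetical decay length
) -> List[Tuple[float, float]]:
    """Target path for a frame whose camera was shifted left by offset_m.

    human_path: (x, y) points in the original ego frame (x forward, y left),
    starting at the car, i.e. (0, 0).
    In the shifted ego frame the recorded human path appears at y - offset_m.
    The target starts at the shifted car (y = 0) and decays back onto it.
    """
    target = []
    for x, y in human_path:
        shifted_human_y = y - offset_m
        # Residual correction fades exponentially with distance ahead.
        correction = offset_m * math.exp(-x / rejoin_distance_m)
        target.append((x, shifted_human_y + correction))
    return target


print(corrective_target([(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)], 0.5))
```

For a straight human path and a 0.5 m shift, the target starts at the car (y = 0) and approaches y = -0.5 m, which is where the recorded human path sits in the shifted frame.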

Comparison with Waymo

Waymo’s approach to self-driving cars is distinct from comma.ai’s:

Sensor Suite: Waymo utilizes LiDAR and other “more penetrative sensing systems” in addition to cameras [00:06:11].
Mapping: Waymo invests heavily in “super high resolution mapping” [00:26:48], aiming for centimeter-level precision [00:26:22]. This is seen as a “super fragile” system by Hotz [00:26:34].
Operating Domain: Waymo operates as a Level 4 system, meaning it functions within defined, carefully mapped regions [00:26:54]. Hotz describes them as “trackless monorails” operating with “virtual rails” [00:28:48].
Remote Control: Waymo and Cruise cars frequently rely on human operators making decisions from a call center [00:30:07] [00:30:11]. Their dependence on cellular networks means they stop if connectivity is lost [00:30:14].
Unit Economics: Waymo has “hilariously negative unit economics,” with robotaxis potentially costing $500,000 each [00:27:16] [00:32:09]. In Hotz’s view, this makes the approach economically non-viable and leads to a “race to the bottom” in the market [00:32:51] [00:32:54].
Miles Driven: As of December 2023, Waymo had driven only 7 million miles in driverless mode [00:30:30] [00:40:26]. In contrast, comma.ai has driven over 100 million miles [00:40:42].

Comparison with Tesla

Tesla’s Full Self-Driving (FSD) system has more in common with comma.ai’s approach than Waymo’s does:

Sensor Suite: Both rely primarily on cameras, with Tesla famously denouncing LiDAR [00:06:43].
Unit Economics: Both Tesla and comma.ai are profitable businesses selling products to consumers today [00:33:40] [00:34:11].
Local Processing: Both systems process data and run their models locally on the car’s device, without relying on real-time internet connectivity for driving decisions [00:30:27] [00:39:08].
Miles Driven: Tesla has logged an estimated 3.3 billion miles in engaged autopilot mode, significantly more than comma.ai’s 100 million miles and Waymo’s 7 million miles [00:40:39] [00:40:42].
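
The mileage figures quoted above differ by orders of magnitude. A quick ratio check using only the numbers from the episode follows; note that the three figures measure different things (engaged Autopilot, comma.ai usage, driverless operation), so the ratios are only rough.

```python
# Mileage figures as quoted in the episode; they measure different things.
tesla_miles = 3.3e9   # estimated miles with Autopilot engaged
comma_miles = 100e6   # miles driven with comma.ai's system
waymo_miles = 7e6     # driverless miles, as of December 2023

tesla_vs_comma = tesla_miles / comma_miles
comma_vs_waymo = comma_miles / waymo_miles
tesla_vs_waymo = tesla_miles / waymo_miles

print(f"Tesla / comma.ai: {tesla_vs_comma:.0f}x")  # Tesla / comma.ai: 33x
print(f"comma.ai / Waymo: {comma_vs_waymo:.1f}x")  # comma.ai / Waymo: 14.3x
print(f"Tesla / Waymo: {tesla_vs_waymo:.0f}x")     # Tesla / Waymo: 471x
```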

Despite these similarities, key differences exist:

Computational Power: Tesla’s system uses about 100x more CPU power than comma.ai’s comma 3X device [00:35:27]. Tesla also trains on 4,000 GPUs compared to comma.ai’s 40 GPUs [00:35:49].
Functional Capability vs. Usability:
- Tesla: Has greater “high-end capabilities” and can navigate complex scenarios like right turns at lights and highway interchanges without disengagement [00:37:01] [00:37:47]. However, its driving experience is often described as “jarring” and prone to “sketchy mistakes” like phantom braking or mis-tracking lanes [00:36:27] [00:37:18]. Tesla’s failure modes are more abrupt, akin to a “powerful optimizer” snapping between local minima [00:38:51].
- comma.ai: While not as feature-rich in complex urban navigation, comma.ai focuses on a “smooth driving is safe driving” principle [00:36:51]. Its system is designed to be “chill” and can drive for hours on highways without intervention [00:24:56]. When overwhelmed, comma.ai’s system becomes “shaky and unsure,” a more human-like failure mode [00:38:16].
Underlying Algorithms: Tesla’s “end-to-end” approach still relies on a “rigid MPC (Model Predictive Control) cost-based planner” [00:38:26]. comma.ai’s latest model directly predicts where a human would drive [00:17:07].
Marketing: Hotz calls Tesla’s “Full Self-Driving” label “nuts” and “overselling,” in contrast to comma.ai’s more cautious positioning [00:52:05].
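
Hotz’s contrast between a “powerful optimizer” that snaps between local minima and a system that becomes “shaky and unsure” can be illustrated with a toy example. This is not Tesla’s planner or comma.ai’s model: it assumes a made-up cost with two equally good lateral paths and compares a hard argmin (which jumps 2 m when a tiny bias flips sign) with a cost-weighted average (which barely moves and sits between the options). The cost function, candidate grid, and temperature are all hypothetical.

```python
import math

# Candidate lateral offsets in metres: -2.00, -1.95, ..., 2.00
CANDIDATES = [i * 0.05 for i in range(-40, 41)]


def cost(y: float, bias: float) -> float:
    """Toy cost with two equally good paths at y = -1 and y = +1.

    A positive `bias` slightly favours +y; a negative one favours -y.
    """
    return min((y - 1.0) ** 2, (y + 1.0) ** 2) - bias * y


def argmin_plan(bias: float) -> float:
    """Hard optimizer: commit to the single lowest-cost candidate."""
    return min(CANDIDATES, key=lambda y: cost(y, bias))


def soft_plan(bias: float, temperature: float = 0.5) -> float:
    """Cost-weighted average over candidates (Boltzmann weights)."""
    weights = [math.exp(-cost(y, bias) / temperature) for y in CANDIDATES]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, CANDIDATES)) / total


for b in (-0.01, 0.01):
    print(f"bias={b:+.2f}  argmin={argmin_plan(b):+.2f}  soft={soft_plan(b):+.3f}")
```

The hard planner’s output is discontinuous in its inputs, so small perception changes can produce abrupt maneuvers; the averaged output changes smoothly but hedges between options, which is one way a system can look hesitant instead of jarring.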

Legal and Regulatory Aspects

According to Hotz, the “levels of self-driving automation” (Level 0 through Level 5) primarily define liability rather than capability [00:07:20].

Level 2: Human is still fully liable for decisions [00:07:25].
Level 3: Human is liable in certain scenarios [00:07:39].
Level 4: Human is not liable in specific operational design domains (e.g., cities) [00:07:43].
Level 5: Human is never liable, representing full automation [00:07:47].
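
The liability-centred reading of the levels above can be written as a small lookup. This is a sketch of Hotz’s framing as summarized here, not a legal or SAE definition; the `Level` enum and `liable_party` function are hypothetical, Levels 0 and 1 are treated as fully human-driven, and treating a Level 4 car outside its operational design domain (ODD) as human-driven is an assumption.

```python
from enum import IntEnum


class Level(IntEnum):
    """Driving-automation levels, as discussed in the episode."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4
    L5 = 5


def liable_party(level: Level, inside_odd: bool = True) -> str:
    """Who is liable, following the liability-centred reading of the levels.

    inside_odd: whether the car is inside the system's operational design domain.
    """
    if level <= Level.L2:
        return "human"
    if level == Level.L3:
        return "depends on scenario"
    if level == Level.L4:
        # Assumption: outside its ODD, a Level 4 car is driven by the human.
        return "system" if inside_odd else "human"
    return "system"


COMMA_MAX_LEVEL = Level.L2  # comma.ai: "no interest in ever going past Level 2"

print(liable_party(COMMA_MAX_LEVEL))             # human
print(liable_party(Level.L4, inside_odd=False))  # human
```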

comma.ai explicitly states it has “no interest in ever going past Level 2” or taking on liability [00:48:39]. They self-certify compliance with automotive standards and ensure their system never makes the car uncontrollable [00:42:43] [00:43:17]. The user is always in control, with the ability to easily override the system [00:43:24]. A driver monitoring camera ensures the user keeps their eyes on the road [00:44:03].
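
The three guarantees described above (the car stays controllable, the driver can always override, and attention is monitored) can be pictured as a gate between the driving model and the steering actuator. The sketch below is hypothetical: the `SafetyGate` class, its units, and every threshold are invented for illustration and do not reflect openpilot’s actual safety code or limits. A real system would also alert the driver and escalate before releasing control, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class SafetyGate:
    """Hypothetical gate between the driving model and the steering actuator."""
    max_torque: float = 1.0          # magnitude limit (normalized units)
    max_delta: float = 0.1           # largest allowed change per step
    override_threshold: float = 0.3  # driver torque treated as an override
    max_distracted_s: float = 3.0    # attention timeout in seconds
    last_cmd: float = 0.0
    distracted_s: float = 0.0

    def step(self, requested: float, driver_torque: float,
             eyes_on_road: bool, dt: float) -> float:
        """Return the steering command actually sent for this control step."""
        # 1. The driver always wins: any firm input cancels the command.
        if abs(driver_torque) > self.override_threshold:
            self.last_cmd = 0.0
            return 0.0
        # 2. Driver monitoring: stop commanding after a sustained lapse.
        self.distracted_s = 0.0 if eyes_on_road else self.distracted_s + dt
        if self.distracted_s > self.max_distracted_s:
            self.last_cmd = 0.0
            return 0.0
        # 3. Clamp magnitude and rate of change so the car stays controllable.
        cmd = max(-self.max_torque, min(self.max_torque, requested))
        cmd = max(self.last_cmd - self.max_delta,
                  min(self.last_cmd + self.max_delta, cmd))
        self.last_cmd = cmd
        return cmd


gate = SafetyGate()
print(gate.step(requested=5.0, driver_torque=0.0, eyes_on_road=True, dt=0.01))  # 0.1
```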

The company’s stance on liability is that “the human is in control of the car at all times” [00:50:21], and therefore, the human is responsible for any incidents [00:50:06]. This contrasts with companies aiming for higher levels of autonomy, where liability shifts away from the driver.

Technical Challenges and Advancements

One of the first challenges comma.ai faced was the “behavioral cloning problem” [00:17:27]. A model trained purely by supervised learning on human driving data fails when it drives in the real world, because small “epsilon errors” accumulate over time and its trajectory diverges [00:11:02]. This happens because the training data was generated by the human’s policy, not the model’s own: once the model drifts, it encounters states that never appeared in training, violating the independent and identically distributed (i.i.d.) assumption of machine learning [00:11:41].
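
A one-dimensional toy makes the accumulation concrete. It assumes a hypothetical per-step error `EPS` and models lateral offset from the lane centre: a cloned policy that only ever saw centred human states has no reason to steer back, so its error grows every step, while the same error combined with a restoring term stays bounded. Both policies and the gain are illustrative assumptions, not comma.ai’s models.

```python
from typing import Callable, List

EPS = 0.01  # hypothetical per-step error of the cloned policy (metres)


def rollout(policy: Callable[[float], float], steps: int, x0: float = 0.0) -> List[float]:
    """Simulate lateral offset x from the lane centre; the policy returns the change in x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + policy(xs[-1]))
    return xs


def cloned_policy(x: float) -> float:
    # Learned only from human states near x = 0, where the right action was ~0,
    # so it reproduces "go straight" plus a small error and ignores x entirely.
    return EPS


def corrective_policy(x: float, gain: float = 0.2) -> float:
    # Same small error, plus a restoring term toward the lane centre.
    return EPS - gain * x


print(f"{rollout(cloned_policy, 1000)[-1]:.2f}")      # 10.00: drift grows every step
print(f"{rollout(corrective_policy, 1000)[-1]:.2f}")  # 0.05: settles at EPS / gain
```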

comma.ai first addressed this by adding a “corrective pressure” toward the lane lines, but later removed it because lane lines are an ambiguous definition of where to drive and are not grounded in physics [00:15:15] [00:16:04]. The core solution is to train in simulation, where the model’s own policy generates the data, so the model learns to correct its own errors and its trajectory converges [00:17:40].
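
Training on states produced by the model’s own policy is the idea behind DAgger-style data aggregation, a standard technique for this problem; the episode does not say comma.ai uses DAgger specifically. The toy below assumes a linear policy, a hypothetical corrective labeller (`expert_label`, standing in for whatever a perturbation-based simulator supplies), and a residual bias `EPS` the model can never remove. Each round rolls out the current learner, labels the states it actually reaches, and refits on all data gathered so far.

```python
from typing import List, Tuple

EPS = 0.01         # hypothetical bias the model can never fully remove
EXPERT_GAIN = 0.5  # hypothetical: label steers back half the offset per step


def expert_label(x: float) -> float:
    """Corrective action for state x (stand-in for a simulator-provided label)."""
    return -EXPERT_GAIN * x


def rollout(w0: float, w1: float, steps: int) -> List[float]:
    """States visited by the linear policy a = w0 + w1 * x, starting at the lane centre."""
    xs = [0.0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + w0 + w1 * x)
    return xs


def fit_linear(data: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares for a = w0 + w1 * x."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    ma = sum(a for _, a in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    if sxx == 0.0:
        return ma, 0.0
    sxa = sum((x - mx) * (a - ma) for x, a in data)
    w1 = sxa / sxx
    return ma - w1 * mx, w1


def train_on_policy(rounds: int = 3, steps: int = 200) -> Tuple[float, float]:
    w0, w1 = EPS, 0.0  # start from the behavioral clone: bias, no correction
    data: List[Tuple[float, float]] = []
    for _ in range(rounds):
        # Key step: label the states the *learner* reaches, not the human's.
        data += [(x, expert_label(x)) for x in rollout(w0, w1, steps)]
        fit0, w1 = fit_linear(data)
        w0 = fit0 + EPS  # the residual model error persists
    return w0, w1


w0, w1 = train_on_policy()
print(f"w1 = {w1:.3f}, final offset = {rollout(w0, w1, 500)[-1]:.3f}")
# w1 = -0.500, final offset = 0.020
```

The learned policy still carries the same bias as the clone, but because it has seen and been corrected on its own off-centre states, the offset settles near `EPS / EXPERT_GAIN` instead of growing without bound.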

Hotz dismisses the common criticism from AI gurus like Gary Marcus that autonomous driving is impossible because of “a zillion corner cases” [00:12:43]. He argues that even rare corner cases appear many times in a dataset of hundreds of millions of miles, and that humans learn to drive with far less data [00:13:00]. The problem is not a lack of “general intelligence” (in his words, a “meaningless term”) but rather the accumulation of errors over time and the current lack of a robust, integrated “world model” of the kind humans possess [00:13:50] [00:14:50].
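
The arithmetic behind the “rare cases are abundant at scale” argument can be sketched with a Poisson model, which assumes events occur independently at a constant rate. The rate and the lifetime mileage below are made-up illustrative numbers; only the 100-million-mile fleet figure comes from the episode.

```python
import math

# Hypothetical numbers for illustration only (except the fleet total).
RATE_PER_MILE = 1 / 10_000_000  # a corner case seen once per 10 million miles
FLEET_MILES = 100_000_000       # roughly comma.ai's quoted total
LIFETIME_MILES = 500_000        # a generous lifetime of human driving


def expected_count(rate_per_mile: float, miles: float) -> float:
    """Mean number of occurrences over the given distance."""
    return rate_per_mile * miles


def p_at_least_once(rate_per_mile: float, miles: float) -> float:
    """Poisson model: probability of seeing the event one or more times."""
    return 1.0 - math.exp(-rate_per_mile * miles)


for label, miles in (("fleet", FLEET_MILES), ("lifetime", LIFETIME_MILES)):
    print(f"{label}: {expected_count(RATE_PER_MILE, miles):.2f} expected, "
          f"P(>=1) = {p_at_least_once(RATE_PER_MILE, miles):.4f}")
```

With these made-up numbers the fleet data contains about ten instances of the case, while a single lifetime of driving has roughly a 5% chance of meeting it even once, which is the asymmetry Hotz points to.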

Recent advances in transformers are enabling new simulator technologies, including comma.ai’s move to a more generic “third paradigm” [00:48:04] [00:54:56]. The company also uses tinygrad, a simplified machine learning framework, to train and deploy models efficiently on a variety of hardware [00:55:58] [00:57:29].

Future Vision

comma.ai views self-driving cars as a “stepping stone” to general purpose robotics and “artificial life” [00:54:49] [00:39:54]. While driving is a “narrow AI” problem [00:40:06], its relative ease of data collection and low dimensionality make it a good starting point for building more complex systems, such as a “robot companion” capable of making sandwiches and cleaning [00:46:25] [00:47:28]. The ultimate goal is to build software that is a “10x better driver than a human” [00:48:50].
