From: jimruttshow8596
The field of self-driving car technology presents a fundamental divergence in approaches: relying solely on cameras or integrating additional sensors like Lidar (Light Detection and Ranging). This distinction often leads to different design philosophies, cost structures, and operational capabilities.
The Camera-Only Approach: Emulating Human Vision
The camera-only approach, championed by figures like Elon Musk and companies such as Tesla and Comma AI, posits that if humans can drive with two eyes, then a system equipped with cameras should also be capable of achieving full autonomy [00:06:20].
Philosophy and Implementation
- Human Emulation: The core idea is to emulate human drivers, who operate with two cameras (eyes) and no Lidar [00:06:20]. This approach focuses on predicting where a human would drive the car given the road conditions [00:17:04].
- Cost-Effectiveness: Proponents argue that cameras are significantly cheaper than Lidar systems [00:27:26]. George Hotz envisions a self-driving system that runs on hardware as cheap as a cell phone, rather than on a $500,000 robo-taxi [00:32:09].
- Data Collection: Comma AI leverages what it describes as the second-largest driving dataset in the world after Tesla’s: tens of millions of miles uploaded by roughly 10,000 weekly active users, spanning diverse conditions worldwide [00:19:12]. This contrasts with Waymo’s concentrated data from specific cities [00:19:30].
- Simulation: Comma AI employs a “small offset simulator” that reprojects human videos with small geometric perturbations, allowing the model to learn corrective pressure (a code sketch follows this list) [00:19:48]. For other vehicles, it replays real recorded behavior, avoiding the complexity of simulating their policies [00:20:18].
- No High-Precision Maps: This method generally avoids reliance on super high-resolution, centimeter-precision maps, deeming them fragile and unnecessary for human-like driving [00:26:22]. Standard navigation maps are considered sufficient [00:26:15].
- Local Processing: Both Tesla and Comma AI prioritize local processing on the car’s hardware once the model is trained, minimizing reliance on constant cloud connectivity [00:39:19].
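To make the small offset simulator concrete, here is a minimal sketch of the geometric side of the idea. It is an illustration of the technique described above, not Comma AI’s actual code: the hypothetical make_perturbed_sample helper assumes the recorded human path is an (N, 2) array of ego-frame waypoints, and the offset range and exponential-decay correction are illustrative choices.

```python
import numpy as np

def make_perturbed_sample(human_path, max_offset_m=0.35, decay=0.92, rng=None):
    """Turn one recorded human path into a perturbed training pair.

    human_path: (N, 2) array of (x, y) ego-frame waypoints driven by a
    human; y is treated as the lateral axis in this sketch.
    Returns the lateral offset applied to the simulated camera pose and
    the corrective target path the model should learn to predict.
    """
    rng = rng or np.random.default_rng()
    offset = rng.uniform(-max_offset_m, max_offset_m)

    # The corrective label starts at the perturbed lateral position and
    # decays exponentially back onto the human path, so the model learns
    # pressure toward the human line rather than pure imitation.
    residual = offset * decay ** np.arange(len(human_path))
    target = np.asarray(human_path, dtype=float).copy()
    target[:, 1] += residual

    # In the real pipeline the recorded video would also be reprojected
    # as if the camera sat at the offset pose; this sketch returns only
    # the geometry for the (perturbed input, corrective label) pair.
    return offset, target
```

Trained on many such pairs, the policy acquires a restoring force: whenever the car sits slightly off the human line, its next prediction points back toward it.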
Challenges and Criticisms
- Epsilon Error Accumulation: A purely supervised approach (behavioral cloning) struggles because small errors compound over time, so the car drifts out of its lane even when loss on a held-out test set is low [00:11:17]. The samples are temporally dependent: the machine’s action at time T shapes the input it sees at time T+1, pushing the car onto states absent from the training data [00:12:21]. The bound shown after this list makes this precise.
- “World Model” and Generalization: Critics such as Gary Marcus argue that camera-only systems, as narrow AIs, struggle with “corner cases” and lack the robust “world model” humans bring to complex situations [00:14:01]. George Hotz counters that humans also generalize from limited data, that corner cases are not the primary problem, and that past accidents trace to specific classical software bugs [00:15:12].
- Lane Line Dependence: Early camera-only systems relied on lane lines, which have no physical definition and can be ambiguous; the resulting hand-tuned corrective-pressure system was eventually removed [00:15:30].
- Safety and Usability: While aiming for smooth driving, some camera-only systems, such as Tesla’s, still make “sketchy mistakes” like phantom braking or misidentifying lanes, a jarring experience for the user [00:36:22]. Comma AI, by contrast, focuses on “making driving chill”: it limits torque and fails in a human-like way, becoming “shaky and unsure” rather than “freaking out” [00:36:51].
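The accumulation argument in the first bullet above has a standard formal counterpart in the imitation-learning literature (the compounding-error bound of Ross and Bagnell). The constants C and C′ below are schematic, and the tie to the small offset simulator is an analogy, not a claim about Comma AI’s actual training objective.

```latex
% If the cloned policy \hat{\pi} errs with probability at most \epsilon per
% step on states the human actually visits, its own mistakes carry it to
% states outside the training distribution, and over a T-step drive the
% cost bound degrades quadratically:
J(\hat{\pi}) \le J(\pi^{*}) + C \, \epsilon \, T^{2}
% Training on states induced by the learner's own (perturbed) trajectories,
% which is the role the small offset simulator plays in spirit, restores
% linear scaling in the horizon:
J(\hat{\pi}) \le J(\pi^{*}) + C' \, \epsilon \, T
```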
The Lidar/Sensor-Heavy Approach: Redundancy and Precision
Companies like Waymo and Cruise typically adopt a more sensor-heavy approach, integrating Lidar, radar, and high-precision maps to create a detailed, redundant view of the environment.
Philosophy and Implementation
- Redundancy and Robustness: The belief is that multiple sensor modalities provide a more robust and reliable perception of the driving environment, particularly in diverse conditions or corner cases that cameras alone might miss.
- High-Precision Mapping: These systems often rely on extensive, highly detailed, centimeter-level maps of their operational domains [00:26:22]. This allows for precise localization and predictable path planning.
- Defined Operational Design Domains (ODDs): Waymo, for instance, operates as a Level 4 system, meaning it functions without human supervision only in specific, carefully mapped regions (e.g., Scottsdale, Arizona) [00:26:54].
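A Level 4 geofence can be pictured as a containment test that runs before the system may engage. The sketch below is a generic ray-casting point-in-polygon check, assuming the operational domain is stored as ordered (lat, lon) vertices; the function names are hypothetical and reflect no vendor’s actual software.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray-casting test: does (lat, lon) fall inside the ODD polygon?

    polygon: ordered list of (lat, lon) vertices bounding the region.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        lat_i, lon_i = polygon[i]
        lat_j, lon_j = polygon[j]
        # Count crossings of a horizontal ray cast east from the point.
        if (lat_i > lat) != (lat_j > lat):
            lon_cross = (lon_j - lon_i) * (lat - lat_i) / (lat_j - lat_i) + lon_i
            if lon < lon_cross:
                inside = not inside
        j = i
    return inside

def may_engage(lat, lon, odd_polygon):
    # A Level 4 stack drives unsupervised only inside its mapped region;
    # outside the polygon it must refuse to engage or hand back control.
    return point_in_polygon(lat, lon, odd_polygon)
```

Everything outside the polygon is, by design, not the product, which is part of what motivates the “trackless monorail” criticism below.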
Challenges and Criticisms
- High Cost: Integrating multiple advanced sensors like Lidar significantly increases the cost of the vehicle. Waymo’s “robo-taxis” are estimated to cost around $500,000 [00:27:20].
- Fragility and Centralization: Such systems can be highly fragile and dependent on external infrastructure. Cruise cars, for example, have been observed to stop when the cell phone network goes down, indicating reliance on remote operators [00:30:11]. Hotz calls this reliance “antithetical to everything I want to see about technology” [00:30:22].
- “Trackless Monorails”: George Hotz likens these systems to “trackless monorails” operating in specific, well-defined “virtual rails” rather than truly autonomous vehicles capable of driving anywhere [00:28:48].
- Economic Viability: The high capital expenditure and limited operational domains raise questions about the long-term economic viability of these models [00:31:38]. The market may eventually become a “race to the bottom” like the scooter market [00:32:44].
- Remote Human Intervention: Despite being touted as fully autonomous, Waymo and Cruise cars still rely on remote human operators, sometimes frequently [00:29:50]; the “drivers” have simply moved out of the car, with two operators per vehicle [00:29:46].
Comparison and Outlook
The debate between camera-only and Lidar-based approaches reflects fundamental differences in development philosophy, cost models, and perceived paths to widespread adoption.
| Feature | Camera-Only (e.g., Comma AI, Tesla) | Lidar/Sensor-Heavy (e.g., Waymo, Cruise) |
| --- | --- | --- |
| Primary sensing | Cameras, emulating human vision | Lidar, radar, and cameras for redundancy |
| Maps | Standard navigation maps | High-precision, centimeter-level maps |
| Operational domain | Aims to drive anywhere a human could | Level 4 only within carefully mapped regions |
| Hardware cost | Low, consumer-grade hardware | High; robo-taxis estimated near $500,000 |
| Connectivity | Local processing once the model is trained | Dependence on networks and remote operators |
| Failure mode | Human-like, corrective; occasional phantom braking | Stops and waits for remote intervention |
While Lidar-based systems emphasize redundancy and precision within controlled environments, camera-only systems aim for a more generalized, human-like capability that can operate anywhere, prioritizing cost-effectiveness and local processing. The ultimate success of either approach will depend on overcoming its respective challenges in safety, usability, and economic viability.
Comparison of Narrow AI and General AI
In the ever-evolving landscape of artificial intelligence, a distinction is drawn between narrow AI and general AI, also known as Artificial General Intelligence (AGI). This differentiation highlights the current capabilities of AI systems versus the theoretical potential for human-like cognitive abilities.
Narrow AI
Narrow AI, prevalent today, refers to AI systems designed and trained for a specific task or a very limited set of tasks [00:40:06]. These systems excel within their defined domain but lack the ability to perform outside of it or generalize their learning to new, unrelated problems [00:10:40].
Examples of narrow AI include:
- Self-driving car systems, which are highly specialized for driving tasks but cannot, for example, make a sandwich [00:40:08].
- Voice assistants (Siri, Alexa).
- Image recognition software.
- Recommendation engines.
Comma AI’s focus on self-driving cars positions it within the realm of narrow AI, albeit a “really big narrow piece of AI” [00:40:14]. The belief is that solving this complex narrow AI problem will yield valuable insights applicable to broader robotic challenges [00:40:16].
General AI (Artificial General Intelligence - AGI)
General AI, or AGI, refers to hypothetical AI systems that possess human-like cognitive abilities, including the capacity to understand, learn, and apply intelligence to any intellectual task that a human being can [00:13:18]. This includes problem-solving, reasoning, planning, abstract thinking, and learning from experience across a wide range of domains.
Key characteristics of AGI would include:
- Generalization: The ability to apply knowledge and skills learned in one context to entirely new and different situations [00:13:29].
- World Model: A comprehensive understanding of the physical and social world, allowing for predictions and complex scenario analysis [00:14:48].
- Adaptability: The capacity to adapt to unforeseen circumstances and learn continuously.
The existence of AGI remains theoretical, and its definition is contentious. George Hotz dismisses the term “general intelligence” as “completely meaningless” [00:13:20], although he acknowledges the concept of an “integrated world model” as the “absolute cutting edge of machine learning today” [00:14:56].
Despite the current limitations of narrow AI, the pursuit of AGI remains a long-term goal for many researchers and companies, including Comma AI, which envisions developing general-purpose robotics like a “robot companion” that can perform various household tasks [00:47:31].