From: jimruttshow8596
The landscape of scientific inquiry is undergoing a significant shift, prompting discussions about a potential “bifurcation” in its methodologies [02:31:00]. This divergence is not primarily between theory and data, but rather between two distinct approaches: the fine-grained paradigms of prediction and the coarse-grained paradigms of understanding [02:48:00].
Historically, physical science benefited from a “lucky conjunction” where fundamental theories were also highly useful [03:05:00]. However, modern science may be reaching a point where, for the same topic, one might need two classes of knowledge: one for understanding and another for utility [03:31:00].
The Rise of Data-Driven Prediction
Recent advances highlight cases in which data-driven approaches have achieved remarkable practical success where traditional, theory-driven methods struggled or failed:
- AlphaFold and Protein Folding: The protein folding problem, once considered the “Fermat’s Last Theorem of computational chemistry,” was largely intractable through theoretical means [04:26:00]. AlphaFold, developed by a relatively small group with massive computational resources, achieved breakthroughs in predicting protein structures while offering “zero theoretical insight” into the underlying mechanisms [04:47:00].
- Transformer Models and Language: Traditional computational linguistics, relying on rules and parsers, made limited progress with human language due to its “only marginally lawful nature” [05:29:00]. In contrast, “simple-minded technology” like the Transformer, combined with brute-force data and computation, has produced “unbelievably powerful language models” [05:38:00], yet these models initially provide “little insight into the mechanisms” [06:00:00].
Origins of Data-Driven Approaches
The origins of data-driven methods reveal an interesting historical trajectory:
- Induction: The concept of induction, as we know it, traces back to David Hume in the 18th century, who believed humans and the world were not rational in a deductive sense, which led him to focus on associations [06:41:00].
- Statistics: Surprisingly, statistics, a mathematical technique for reducing error in measurements, emerged from the highly deductive field of classical physics, particularly celestial mechanics [07:02:00]. The resulting inductive frameworks, developed in the 1920s and 30s by figures like Fisher (who introduced concepts such as sufficiency) and Neyman and Pearson (hypothesis testing), focused on parameter estimation [07:27:00].
- Neural Networks: Modern neural networks have a unique history [08:06:00]. The foundational paper by Warren McCulloch (a neurophysiologist interested in grounding epistemology in neurons) and Walter Pitts (a young genius interested in logic and mathematics) in 1943 was actually about deduction and logic, closer to symbolic AI, not induction or large data sets [09:45:45]. It was only later, in the 1970s and 80s, that these deductive neural networks began to fuse with inductive statistical methods, and then with big data and GPUs in the 1990s, leading to the complex “zoo” of modern AI [10:37:00].
The Phenomenon of Superhuman Models
Modern large language models, like GPT-4 with its estimated 1.3 trillion parameters, operate in a space “way beyond human understanding” [13:00:00]. These are termed “superhuman models” [13:11:00]:
- High-Dimensional Regularities: Complex domains, while seemingly incompressible, possess fundamentally high-dimensional regularities [13:42:00]. Conventional mathematical approaches, which rely on human comprehension and concise expressions, cannot capture these [13:54:00].
- Statistical Uncanny Valley: Contrary to classical statistical theory, adding more parameters can ultimately improve generalization: as parameter counts grow, models first pass through a “statistical uncanny valley” in which out-of-sample performance degrades, before generalization recovers at much larger scale [14:32:00] (a sketch of this effect appears after this list).
- Solving Induction: These models have, in some sense, “solved the problem of induction” by demonstrating reliable predictions for complex phenomena despite the inherent contingency of inductive reasoning [14:53:00].
- Gradient Descent in High Dimensions: The effectiveness of gradient descent in these ultra-high dimensions is a “miracle” [16:03:00]. Even non-differentiable functions can be navigated because, in a space of hundreds of thousands of dimensions, there is almost always some direction along which the loss decreases [15:50:00]. This also eases the local-minima problem in adaptive computation, which is ameliorated in high dimensions, prompting a potential comeback for genetic algorithms [17:32:00].
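To make the “statistical uncanny valley” point concrete, here is a minimal illustrative sketch (not from the episode) of the double-descent effect, using minimum-norm least squares on random Fourier-style features in NumPy; every name in it is invented for the illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # The "true" function we are trying to learn from noisy samples.
    return np.sin(2 * np.pi * x)

n_train = 20
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + 0.1 * rng.standard_normal(n_train)
x_test = np.linspace(-1, 1, 200)
y_test = target(x_test)

def features(x, W, b):
    # p random Fourier-style features: cos(w * x + b).
    return np.cos(np.outer(x, W) + b)

for p in [5, 10, 20, 25, 40, 200, 1000]:
    errs = []
    for _ in range(20):  # average over random feature draws
        W = 3.0 * rng.standard_normal(p)
        b = rng.uniform(0, 2 * np.pi, p)
        Phi_train, Phi_test = features(x_train, W, b), features(x_test, W, b)
        # lstsq returns the minimum-norm solution once the system is underdetermined.
        theta, *_ = np.linalg.lstsq(Phi_train, y_train, rcond=None)
        errs.append(np.mean((Phi_test @ theta - y_test) ** 2))
    print(f"p = {p:5d}   mean test MSE = {np.mean(errs):.3f}")
```

With these settings the test error typically spikes as the feature count approaches the number of training points (around p ≈ 20) and falls again in the heavily overparameterized regime, though the exact numbers depend on the random seed.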
The Challenge of Understanding and Human Learning
Despite their predictive power, current large models face significant limitations and raise questions about understanding:
- Efficiency of Human Learning: Human language acquisition is vastly more efficient than that of LLMs [22:19:00]. LLMs require “hundreds of gigabytes” of training data, compared to a human’s “1.5 megabytes,” a difference of over five orders of magnitude [21:43:00]. This echoes Chomsky’s “poverty of the stimulus” argument [22:28:00], although some argue that the constant instruction and feedback human learners receive is a “gigantic” advantage [22:51:00]. Humans and animals possess “very, very different” algorithms that leverage small amounts of data into rich understanding [24:47:00].
- Lack of Mechanistic Insight: These models, while powerful, often provide little insight into the mechanisms behind their success [06:00:00]. For example, neural networks struggle to infer simple deterministic rules from patterns like Conway’s Game of Life, even when the rules are known to be minimal [26:08:00] (the complete rule set is sketched after this list).
- Arithmetic Deficiency: Despite their complexity, models like GPT-4 (trillions of parameters) are “worse at doing elementary arithmetic” than a 50-year-old HP-35 calculator with 1K ROM [34:10:00]. This highlights their non-sentient nature and their reliance on pattern matching rather than fundamental logical or mathematical reasoning [35:18:00].
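For reference, the “minimal rules” of Conway’s Game of Life fit in a few lines; the sketch below (illustrative, not from the episode) implements the complete update rule in NumPy and runs a glider on a small toroidal board:

```python
import numpy as np

def life_step(grid):
    """One synchronous update of Conway's Game of Life on a toroidal grid.

    The entire rule set: a dead cell becomes alive iff it has exactly three
    live neighbours; a live cell survives iff it has two or three.
    """
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbours == 3)
    survives = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survives).astype(np.uint8)

# A glider: five live cells that translate diagonally forever.
grid = np.zeros((8, 8), dtype=np.uint8)
for r, c in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    grid[r, c] = 1

for _ in range(4):
    print(grid, "\n")
    grid = life_step(grid)
```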
Towards a New Way of Doing Science: Cognitive Synergy and Symbolic Regression
One promising path forward involves cognitive synergy, combining different AI paradigms:
- Integrated Approaches: Bringing together deep learning, genetic algorithms, and symbolic AI (e.g., math machines, provers, solvers) allows for a more holistic approach, enabling systems to combine the perceptual power of deep learning with precise mathematical skills and mechanical simulation [35:36:00].
- Neural Nets as Pre-processors for Parsimonious Science: A novel approach involves using neural networks as data pre-processors for generating concise scientific theories.
- Methodology: Train a neural network as a “surrogate for reality,” then sparsify its connections, and apply a genetic algorithm to perform symbolic regression on the reduced network [30:47:00]. This process can produce algebraic formulas that “encode the regularities” [29:42:00] (a code sketch of the pipeline follows this list).
- Example: Miles Cranmer’s work in cosmology used graph neural networks to infer Newton’s laws of motion or even derive new parsimonious encodings for dark energy behavior from astronomical data [28:32:00].
- Outcome: This method yields two distinct outputs: the large neural network for prediction and a smaller, sparsified model for understanding [30:59:00]. This could be a “whole new way of doing science” [31:08:00].
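A minimal sketch of this pipeline, assuming scikit-learn for the surrogate network and the third-party gplearn library as a stand-in for the genetic-algorithm symbolic-regression step (the sparsification step is omitted, and the synthetic “reality” here is an invented inverse-square law, not Cranmer’s data):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from gplearn.genetic import SymbolicRegressor  # genetic-programming symbolic regression

rng = np.random.default_rng(0)

# 1. "Reality": noisy observations of an inverse-square style law y = m1 * m2 / r**2.
n = 2000
X = rng.uniform(0.5, 2.0, size=(n, 3))            # columns: m1, m2, r
y = X[:, 0] * X[:, 1] / X[:, 2] ** 2 + 0.01 * rng.standard_normal(n)

# 2. Train a neural network as a surrogate for reality.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
surrogate.fit(X, y)

# 3. Query the surrogate on fresh inputs (this is also where one would sparsify
#    the network before distilling it).
X_query = rng.uniform(0.5, 2.0, size=(500, 3))
y_surrogate = surrogate.predict(X_query)

# 4. Symbolic regression on the surrogate's behaviour: a genetic algorithm
#    searches for a short algebraic formula that reproduces it.
sr = SymbolicRegressor(
    population_size=2000,
    generations=20,
    function_set=("add", "sub", "mul", "div"),
    parsimony_coefficient=0.001,
    random_state=0,
)
sr.fit(X_query, y_surrogate)

# 5. Two artefacts remain: the big network (prediction) and a small formula
#    (understanding), ideally something like div(mul(X0, X1), mul(X2, X2)).
print(sr._program)
```

Cranmer’s actual work uses graph neural networks and his PySR library; the sketch above only mirrors the overall surrogate-then-distill structure.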
Science, Technology, and Society’s Role
The relationship between science and technology has historically seen periods where mechanics and tinkering led to advancements before formal scientific understanding emerged, as seen with early heat engines and electricity [43:56:00]. Later, science began to lead technology, as exemplified by electronics and GPS [44:23:00].
Today, the rise of big data and powerful artifacts from data could potentially sideline science, as society might prioritize utilitarian prediction over fundamental understanding [44:49:00]. This dynamic, however, is often cyclical. The question arises: if large language models are the “steam engines of the 21st century,” what new science will they inspire? [46:16:00] Potential areas include new cognitive theories, or even a truly working economic theory that moves beyond current models to predict market behavior [46:48:00].
Constructs and Theorizers
The concept of “constructs” or “schema” is crucial here. Complex systems, unlike purely physical ones, “encode reality” and “encode history” within their structure, acting as “simulacra” or mirrors of their environment [48:06:00]. This means that “every element in a complex system is a theorizer of a sort,” holding a “theory of the world” [49:53:00].
Intriguingly, this connects to the nature of scientific theory itself; a theory of gravity or quantum mechanics is also a “propositional schema of reality” [49:27:00]. Thus, deep neural networks, through training, are essentially building “theories of the phenomenon,” making them complex systems akin to organisms [50:27:00].
Schema, in this context, must be robust, evolvable, and composable [51:06:00]. Research suggests that trained neural networks can develop a form of “internal compositional encoding,” making their representations more akin to the schema or constructs proposed by earlier thinkers [51:51:00].
Creativity, Discovery, and the Role of Constraint
While large language models can produce creative outputs within existing knowledge (e.g., writing movie scripts, comparing philosophical concepts), they are “pure herd” [54:49:00], excelling at synthesizing established knowledge rather than generating true, paradigm-shifting discoveries [55:00:00]. True creativity often involves stepping outside the “herd” and rejecting conventional wisdom [55:17:00].
Scientific revolutions throughout history have often been characterized by the presence of “bandwidth limitation and constraints,” rather than excess power [59:31:00]. Examples include:
- Tycho Brahe and Kepler: Without massive computing power, Brahe relied on Kepler as his “calculator,” leading to Kepler’s laws [59:51:00].
- Newton and Calculus: The development of calculus (“method of fluxions”) was a human’s way of dealing with complicated data sets under computational constraints [01:00:56:00].
- Darwin: Darwin’s theory of evolution, a “simple” and “astounding idea,” emerged from observation and synthesis without “big data” [01:06:31:00].
- Mendeleev’s Periodic Table: Mendeleev developed the periodic table, predicting unknown elements, without understanding atomic structure, relying on the “quasi-harmonic melodic geometric pattern instinct” [01:16:09:00].
This suggests that human thought and creativity thrive under limitations of memory, inference, and “self-imposed constraints” [01:02:04:00]. A future direction for machine learning might involve “hobbling” models or incrementally decaying them to find minimal effective configurations, potentially leading to new scientific breakthroughs [01:02:42:00].
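One concrete way to read “hobbling” (a speculative sketch, not a procedure described in the episode) is iterative magnitude pruning: train a network, zero out a growing fraction of its smallest weights, and see how small the effective model can get before performance collapses. Everything below, from the dataset to the sparsity schedule, is invented for the illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print(f"unpruned R^2: {r2_score(y_te, net.predict(X_te)):.3f}")

# "Hobble" the trained network: zero out an increasing fraction of its
# smallest-magnitude weights and watch how long the predictions survive.
original_coefs = [W.copy() for W in net.coefs_]
for sparsity in (0.5, 0.8, 0.9, 0.95, 0.99):
    for W, W0 in zip(net.coefs_, original_coefs):
        threshold = np.quantile(np.abs(W0), sparsity)
        W[:] = np.where(np.abs(W0) >= threshold, W0, 0.0)
    print(f"sparsity {sparsity:.2f}: R^2 = {r2_score(y_te, net.predict(X_te)):.3f}")
```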
Meta-Occam and Parsimony
Occam’s Razor, the principle of favoring the simplest explanation, is a heuristic pervasive in physical sciences [01:09:36:00]. However, for complex phenomena, applying Occam’s Razor directly often seems impossible due to their high dimensionality [01:10:48:00].
This leads to the concept of Meta-Occam: There are domains where “the parsimony is in the process not in the final object” [01:11:21:00]. While physics has parsimonious theories of simple objects (like atoms), complexity science seeks “parsimonious theories for generating complicated objects” [01:11:30:00]. Both machine learning (via reinforcement learning) and evolution by natural selection share this “Meta-Occam” property: they rely on simple, minimal processes that can generate arbitrary complexity over time [01:11:50:00].
For instance, evolutionary time can be measured by the “assembly index”: physical theory produces low-assembly structures at equilibrium, whereas life and human technology drastically increase assembly through compositional processes, amounting to a “second law-like argument” against the block universe [01:12:41:00].
This is a fundamental difference: physical theory posits “infinite models for minimal objects” (e.g., multiverse to explain fine-tuning constants) [01:14:00], while complexity science searches for “Meta-Occam processes” — simple rule systems with open-ended properties that produce highly complex objects [01:15:08:00].
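As a toy illustration of parsimony living in the process rather than in the object (an illustrative sketch, not an example from the episode), an elementary cellular automaton such as rule 110 is fully specified by eight bits yet is Turing complete and generates endlessly intricate structure from a single seed cell:

```python
import numpy as np

def ca_step(cells, rule=110):
    """One update of an elementary cellular automaton.

    The entire generative "theory" is eight bits: the output assigned to each
    of the 2**3 possible three-cell neighbourhoods.
    """
    rule_table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    left, right = np.roll(cells, 1), np.roll(cells, -1)
    neighbourhood = 4 * left + 2 * cells + right  # index 0..7 per cell
    return rule_table[neighbourhood]

width, steps = 79, 40
cells = np.zeros(width, dtype=np.uint8)
cells[-1] = 1  # a single live cell as the seed

for _ in range(steps):
    print("".join("#" if c else "." for c in cells))
    cells = ca_step(cells)
```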
Existential Risks and the Future of AI
The discussion around existential risk from AI often involves “overheated speculations” that may serve as marketing ploys [01:17:39:00]. While long-term risks like a “paperclip maximizer” AGI are conceivable, they are “not imminent” [01:17:56:00].
More immediate and concerning risks include:
- Misuse of Narrow AI: The development of advanced police states (e.g., China’s use of AI for surveillance) [01:19:03:00].
- Idiocracy Risk: As humans increasingly delegate cognitive tasks to machines, there’s a risk of intellectual devolution and loss of fundamental skills, making society vulnerable to technological collapse [01:19:34:00].
- Acceleration of Game A: Current AI advancements could accelerate existing societal trajectories (“Game A”) towards catastrophic limits, such as ecological collapse, by making resource extraction and consumption more efficient [01:21:20:00].
Historically, humanity has managed risks associated with powerful technologies:
- Genetic Engineering: The Asilomar conference introduced a self-moratorium on recombinant DNA research in the 1970s, and CRISPR remains largely unregulated but has not led to widespread misuse [01:22:50:00].
- Nuclear Weapons: Controls and non-proliferation treaties were established relatively quickly after the atomic bomb’s invention [01:23:24:00].
- Automobile: A 95% reduction in fatalities per mile driven over 100 years was achieved through incremental regulatory interventions (e.g., traffic lights, seatbelts) [01:23:44:00].
The call is for an “empirically informed discussion” based on historical precedent, rather than “science fiction prognostication” [01:24:36:00]. While AI’s rapid advancement (e.g., a future GPT-5 trained on video and thereby inducing physical intuitions) presents a different challenge due to lower costs and easier proliferation [01:25:07:00], the distinction between “thinking and simulated thinking” remains vital, as current models are feed-forward and non-sentient [01:27:51:00].
One potential positive development is the “natural evolutionary reaction” to the “flood of sludge” (spam, fake news) on the internet: the development of info agents [01:28:48:00]. These AI agents, acting as personalized “spam filters,” would curate information, allowing users to buffer themselves from overwhelming digital noise and connect to trusted curators, a “magnificent opportunity” to use AI for information hygiene [01:28:53:00]. While some might respond by simply opting out of the technology, the emergence of such adaptive strategies testifies to humanity’s capacity to domesticate new, even dangerous, technologies, just as it learned to control fire [01:33:51:00].