From: jimruttshow8596

The discussion features David Krakauer, President and William H. Miller Professor of Complex Systems at the Santa Fe Institute (SFI), exploring the evolving relationship between complexity science and machine learning. This conversation expands on themes previously discussed in Jim Rutt’s Episode 10 with Krakauer, which broadly covered complexity science [01:01:31].

Theory-Driven vs. Data-Driven Science

The conversation begins by examining the distinction between theory-driven and data-driven science [01:59:02]. While science has traditionally often started with data collection [02:11:42], Krakauer suggests that current advances point to a “bifurcation” in scientific methodology, not necessarily between theory and data, but between paradigms of prediction and understanding [02:41:42].

Historically, physical science benefited from a “lucky conjunction” where fundamental theory was also very useful for practical applications [03:09:02]. However, current trends suggest a future where two classes of science might exist for any given topic: one for understanding and another for utility [03:36:02].

The Rise of Data-Driven Success

Jim Rutt highlights examples where theory failed, but data-driven approaches triumphed [04:00:02]:

  • AlphaFold: This system achieved unprecedented success in protein folding, a long-standing “Holy Grail” problem in computational chemistry, without providing direct theoretical insight into the mechanisms [05:01:02].
  • Large Language Models (LLMs): Traditional computational linguistics, despite decades of effort, struggled to understand language’s “marginally lawful nature” [05:32:02]. Simple-minded technologies like Transformer networks, combined with brute-force data and computation, have yielded unbelievably powerful language models that initially offered little mechanistic insight [06:03:02].

These powerful models, however, generate vast examples that can then be subjected to induction and abduction, opening new frontiers for understanding [06:10:02].

Historical Context of Neural Networks

Neural networks, contrary to popular perception, did not originate with induction. Their history traces back to the 1940s as deductive frameworks related to logic [10:19:02].

  • David Hume and Induction: The modern framing of induction comes from David Hume in the 18th century, who argued that humans and their world are not rational in the sense of being fully understandable through deduction [06:41:02].
  • Origins of Statistics: Statistics developed as a mathematical technique to reduce measurement error in celestial mechanics, an empirical science [07:02:02]. This inductive approach was initially about parameter estimation [07:55:02].
  • McCulloch and Pitts (1943): The foundational paper on neural networks by Warren McCulloch (neurophysiologist) and Walter Pitts (mathematician) was a “weird conjunction” of Boolean logic and Principia Mathematica. Their work aimed to understand how a brain might reason propositionally, emphasizing deduction and logic [09:56:02].
  • AI Winters: The initial “AI winter” for neural networks stemmed from criticisms by Minsky and Papert in 1969, who highlighted the mathematical limitations of single-layer perceptrons (e.g., their inability to compute non-linearly-separable functions such as XOR, illustrated in the sketch below) and the perceived impossibility of training “deep” networks with too many parameters [12:02:02].
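
To make that limitation concrete, here is a minimal sketch (an illustrative toy, not from the episode) showing a single-layer perceptron failing on XOR while a small two-layer network learns it:

```python
# Toy illustration of the Minsky-Papert point: XOR is not linearly separable,
# so a single-layer perceptron cannot classify it, while a small two-layer
# network can. (Illustrative example only.)
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR truth table

# Single-layer perceptron: no linear boundary separates XOR, so accuracy tops out at 0.75.
linear = Perceptron(max_iter=1000).fit(X, y)
print("perceptron accuracy:", linear.score(X, y))

# Two-layer network: the hidden layer builds a non-linear feature space in which
# XOR becomes separable; this typically reaches accuracy 1.0.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print("two-layer network accuracy:", mlp.score(X, y))
```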

The fusion of statistical parameter estimation and these deductive neural network concepts, combined with “big data” and GPUs in the 1990s, led to the current “complicated zoo” of technical phenomena [10:50:02].

Superhuman Models and High Dimensionality

LLMs like GPT-4, with parameters in the trillions [13:00:02], operate in a space “way beyond human understanding” [13:05:02]. Krakauer refers to these as “superhuman models” [14:49:02].

  • Incompressible Regularities: Complex domains, such as human behavior, have regularities that are fundamentally high-dimensional and incompressible [13:42:02]. Conventional mathematical approaches, which rely on comprehension and concise expression (e.g., “back of an envelope”), cannot handle trillions of parameters [14:00:02].
  • Statistical Uncanny Valley: Contrary to classical statistical theory, model performance can degrade as parameters are added, up to a point (the “uncanny valley”), and then improve again as parameters keep growing [14:35:02]. These “superhuman models” appear to have “solved the problem of induction” that Hume identified, generalizing well even in the face of contingencies [14:53:02].
  • Gradient Descent in High Dimensions: The effectiveness of backpropagation and gradient descent in large models is partly due to the “miracle of ultra-high dimensionality.” In hundreds of thousands of dimensions there is always a direction (gradient) pointing downwards, making navigation possible even with non-differentiable functions (see the sketch after this list) [16:03:02].
  • Agent-Based Modeling: Historically, SFI’s view on agent-based modeling favored parsimony (e.g., 2-3 parameters at most), deeming models with many parameters as “garbage in garbage out” [16:31:02]. The success of deep learning in complex domains suggests a shift where highly concise models may not always be the optimal approach for prediction [16:47:02].
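
As a rough illustration of the high-dimensionality point, the following minimal sketch (an arbitrary toy loss, not anything discussed in the episode) runs plain gradient descent on a rugged but still differentiable, non-convex function in 100,000 dimensions and watches the loss fall:

```python
# Minimal sketch: plain gradient descent on a rugged, non-convex function in
# 100,000 dimensions. Despite the many cosine "bumps", the gradient almost
# always supplies a downhill direction, so the loss keeps decreasing.
# (Illustrative toy; nothing here beyond numpy.)
import numpy as np

rng = np.random.default_rng(0)
d = 100_000                      # dimensionality of the search space
x = rng.uniform(-3, 3, size=d)   # random starting point

def loss(x):
    # Convex bowl plus a non-convex, oscillatory term in every coordinate.
    return 0.5 * np.sum(x ** 2) + np.sum(np.cos(3 * x))

def grad(x):
    # Analytic gradient of the loss above.
    return x - 3 * np.sin(3 * x)

lr = 0.05
for step in range(201):
    if step % 50 == 0:
        print(f"step {step:4d}  loss per dimension = {loss(x) / d:.4f}")
    x -= lr * grad(x)
```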

Complexity Theory and Its Critiques

Complexity theory and deep learning both offer ways to handle incompressible representations [41:21]. While physical systems often have simple rules and symmetries, complex systems are characterized by elaborate rules, broken symmetries, and evolving initial conditions [42:01].

Traditionally, complexity science at SFI has tackled complexity by “taking averages” or focusing on bulk properties, like Geoffrey West’s work on scaling [42:22]. However, this can sometimes “throw the baby out with the bathwater” [42:41]. Neural networks, used “in a kind of mercenary way,” allow for less draconian averages [42:46].

A New Way of Doing Science: Neural Nets as Pre-processors

The ability to process vast amounts of data using neural networks suggests a new scientific methodology:

  1. Train a Neural Net: Use it as a “surrogate for reality” to capture complex patterns [30:48].
  2. Sparsify and Quantize: Reduce the complexity of the trained model, for instance, by removing edges in graph neural nets [29:20].
  3. Symbolic Regression: Apply genetic algorithms to the sparsified model to produce an “algebraic formula” or “equations of motion” that encode the regularities [29:38].
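
A minimal, self-contained sketch of this three-step loop on synthetic data is shown below. It only stands in for the real thing: a small scikit-learn MLP plays the surrogate, crude magnitude pruning does the sparsification, and a tiny basis-selection search substitutes for genuine genetic symbolic regression (tools such as Cranmer’s PySR do this far more seriously). The hidden law, network size, and candidate formulas are all illustrative assumptions.

```python
# Sketch of the pipeline above on synthetic data: (1) fit a neural surrogate,
# (2) sparsify it by pruning small weights, (3) search a small library of
# candidate formulas against the surrogate's predictions. Everything here
# (hidden law, network size, candidate bases) is an illustrative assumption.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic "observations" generated by a hidden law: y = 3*x0*x1 + sin(x2) + noise.
X = rng.uniform(-2, 2, size=(2000, 3))
y = 3 * X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + 0.05 * rng.normal(size=2000)

# Step 1: train a neural net as a "surrogate for reality".
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=3000, random_state=0)
net.fit(X, y)

# Step 2: sparsify by crude magnitude pruning of small weights, in place.
for W in net.coefs_:
    W[np.abs(W) < 0.05] = 0.0

# Step 3: toy symbolic regression. Fit each candidate basis to the surrogate's
# output by least squares and keep the most accurate one. (Real symbolic
# regression, e.g. genetic programming, searches expression space far more broadly.)
y_surrogate = net.predict(X)
candidate_bases = {
    "a*x0*x1 + b*sin(x2)": [lambda X: X[:, 0] * X[:, 1], lambda X: np.sin(X[:, 2])],
    "a*x0 + b*x1 + c*x2":  [lambda X: X[:, 0], lambda X: X[:, 1], lambda X: X[:, 2]],
    "a*x0**2 + b*x2":      [lambda X: X[:, 0] ** 2, lambda X: X[:, 2]],
}
best = None
for name, funcs in candidate_bases.items():
    Phi = np.column_stack([f(X) for f in funcs])
    coef, *_ = np.linalg.lstsq(Phi, y_surrogate, rcond=None)
    mse = float(np.mean((Phi @ coef - y_surrogate) ** 2))
    if best is None or mse < best[0]:
        best = (mse, name, coef)

print("recovered form:", best[1])
print("fitted coefficients:", np.round(best[2], 2))
print("fit error against surrogate:", round(best[0], 4))
```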

This approach allows for maintaining the neural network for prediction while using the simplified model for understanding [30:59]. Miles Cranmer’s work in cosmology, where neural nets are used to infer “Newton’s law for Dark Energy” from astronomical data, is a key example [28:32].

This method can be seen as an advanced form of techniques like principal components analysis, where underlying “lower-dimensional manifolds” are identified to build mechanistic causal theories from complex data [31:21].
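
A small generic illustration of that idea (not from the episode): principal components analysis applied to ten-dimensional measurements that secretly depend on only two latent factors recovers the low-dimensional structure from the variance spectrum alone.

```python
# Toy illustration of finding a "lower-dimensional manifold": 10-dimensional
# observations that are really noisy mixtures of 2 latent factors. PCA's
# explained-variance spectrum reveals the hidden dimensionality.
# (Generic sketch; dimensions and noise level are arbitrary choices.)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 2))        # 2 hidden factors
mixing = rng.normal(size=(2, 10))          # each observed channel mixes the factors
observed = latent @ mixing + 0.1 * rng.normal(size=(1000, 10))

pca = PCA(n_components=10).fit(observed)
print(np.round(pca.explained_variance_ratio_, 3))
# The first two components carry nearly all the variance; the remaining eight
# are noise, so a 2-dimensional description suffices.
```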

Constructs/Schemas in Complex Systems

Complex systems, unlike purely physical ones, encode history and reality [48:06]. A rock contains nothing about its hill, but a microbe or brain contains “simulacra” or “mirrors of reality,” encoding billions of years of adaptive history [48:22].

Complex systems perform “parsimonious encodings” of reality, similar to how scientific theories themselves are schemas of reality [49:23]. This creates a deep correspondence: a complex system can be seen as a “theorizer” of the world, much like a scientific theory [49:53]. Training a deep neural network is, in essence, creating a “theory of the phenomenon” or a “rule system” – it is creating a complex system or an “organism” [50:27].

These internal schemas must be robust, evolvable, and extensible. Research suggests that trained neural networks can develop “internal compositional encoding” within their representations, akin to the schemas and constructs proposed by earlier thinkers [51:20].

Limitations of Deep Learning

Despite their capabilities, deep learning models possess significant limitations:

  • Data Efficiency: LLMs require vastly more data (e.g., five orders of magnitude more) than humans to achieve similar results in tasks like language learning [22:09:02]. Human and animal learning algorithms are far more efficient at leveraging small amounts of examples [23:47:02].
  • Arithmetic Incompetence: Despite trillions of parameters and millions of dollars in training, GPT-4 struggles with basic arithmetic, performing worse than a 50-year-old calculator with 1KB of memory [34:10:02]. This highlights their non-sentient nature [35:10:02] and their inability to internalize functions like humans can [37:51:02].
  • Difficulty with Simple Rules: When given patterns generated by simple deterministic rules (like Conway’s Game of Life or the Mandelbrot set), convolutional neural networks perform terribly at inferring those underlying rules [26:37:02]. This suggests they “encode maximally” where humans would encode minimally [27:19:02].

Cognitive Synergy and Heuristic Induction

A promising path forward involves “cognitive synergy,” combining different AI methods like deep learning, genetic algorithms, symbolic AI, math machines, and solvers [35:36:02]. This allows for leveraging the strengths of deep learning (e.g., perceptual power) alongside precise mathematical and simulation capabilities [36:21:02].

Another “express highway to AGI” is “heuristic induction”: the remarkable ability to find and apply heuristics, often unconsciously [38:28:02]. Deep learning is currently not good at explicitly creating heuristics [39:04:02].

Creativity and Discovery

While LLMs may not be “creative” compared to the next Einstein, they demonstrate significant creativity in compositionality for typical human tasks [55:42]. They can generate clever plot ideas, interesting dialogue, and actions not done before [55:46]. They are “quick and dirty analysts and synthesizers” [57:20], capable of performing analysis and comparison tasks very effectively, but they are “terrible at libraries” and prone to fabrication [56:51].

However, they are unlikely to achieve breakthroughs like Einstein’s visualization of relativity, as they lack geometry or physical simulation capabilities [59:03]. True scientific revolutions, particularly those involving “bottlenecking of the phenomenon into increasingly simple sets of causal relationships and constraints,” may still require human-like abstraction [01:00:22].

The Role of Constraints in Discovery

Scientific revolutions and creative breakthroughs often arise from “bandwidth limitation and constraints” rather than excess power [59:31].

  • Tycho Brahe and Kepler: If Tycho Brahe had massive computing power, he might never have hired Kepler as his “calculator,” thereby missing Kepler’s laws [59:52].
  • Calculus: The development of calculus was a human way of dealing with complicated data sets [01:01:00].
  • Western Music: The diatonic system, despite its mathematical imperfections, is a set of “weird contrived constraints” that has enabled vast musical creativity [01:01:40].

Human thought and creativity are deeply based on the limitations of memory, inference, and self-imposed constraints [01:02:04]. This raises the question of whether adding more computational power will lead to true scientific revolutions or if “hobbling” machines (incrementally decaying them) could reveal insights when they “work just barely good enough” [01:02:33].

Occam’s Razor and Meta-Occam

Occam’s Razor is a heuristic (not a law) that states one should prefer the simpler explanation for a phenomenon, taking the theory with the fewest parameters [01:09:57]. Physical sciences often apply this to produce “beautiful simple things” like Dirac’s theory of relativistic quantum mechanics [01:10:27].
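
As a toy illustration of the heuristic (an illustrative example, not from the episode), comparing polynomial fits of increasing degree on noisy linear data with an information criterion that penalizes parameter count typically selects the simplest adequate model.

```python
# Toy Occam's Razor: fit polynomials of increasing degree to noisy linear data
# and score them with AIC, which trades goodness of fit against parameter count.
# (Illustrative example; the data and degrees are arbitrary choices.)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=50)   # truth: a 2-parameter line

for degree in (1, 3, 5, 9):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    k = degree + 1                               # number of fitted parameters
    n = len(y)
    # Gaussian-likelihood AIC up to an additive constant: n*log(RSS/n) + 2k
    aic = n * np.log(np.sum(residuals ** 2) / n) + 2 * k
    print(f"degree {degree}: AIC = {aic:.1f}")
# The degree-1 fit typically has the lowest AIC even though higher-degree
# polynomials achieve smaller residuals.
```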

However, in the complex world, which is high-dimensional and irreducible, Occam’s Razor doesn’t seem to apply to the final objects [01:10:43]. This leads to the concept of Meta-Occam:

  • Parsimony in Process, Not Object: For complex domains, the parsimony is in the process that generates the object, not the object itself [01:11:21].
  • Evolution by Natural Selection: Darwin’s theory, for instance, is highly parsimonious (statable in a few sentences) yet generates arbitrarily complicated objects like worms or elephants [01:11:00].
  • Machine Learning and Evolution: Machine learning (e.g., reinforcement learning) and evolution by natural selection share this “Meta-Occam” property of selective feedback; in this respect, the two processes are mathematically equivalent [01:11:42].

Therefore, complexity science is the search for “Meta-Occam processes” – relatively simple rule systems with open-ended properties that can produce arbitrary complexity over time [01:15:08].

Cyclical Nature of Science and Technology

The relationship between science and technology is cyclical [01:04:51]. Early mechanics were ahead of scientists in the heat-engine world (e.g., Watt before Carnot), and Edison advanced electricity without deep theoretical understanding [01:43:56]. Later, science led technology (e.g., microelectronics and GPS required prior scientific understanding) [01:44:23].

The rise of “big data and useful artifacts” produced by current AI could potentially sideline science again, as the technological instantiation becomes primary [01:44:47]. The question becomes, if LLMs are the “steam engines of the 21st century,” what will be their “statistical mechanics” [01:46:16]? This new science could lead to breakthroughs like a real, working economic theory not based on “neo-economic BS” [01:47:19].

Future Outlook and Risks

While some express “existential risk” hand-wringing about AI, Rutt views much of it as marketing [01:18:46]. He highlights other, more pressing risks:

  • Misuse of Narrow AI: Examples include surveillance states (e.g., China’s use of AI for tracking individuals) [01:19:03].
  • Idiocracy Risk: As AI becomes more capable, humans may delegate too many intellectual skills, leading to a devolution of human capacity [01:19:32].
  • Accelerating “Game A”: AI could accelerate current societal trajectory (“Game A”) towards ecological and social cliffs by making resource extraction and manufacturing more efficient, potentially reducing the time available to address fundamental problems [01:21:20].

Historically, humanity has managed risks from new technologies (e.g., genetic engineering, nuclear weapons, automobiles) through incremental regulation and learning [01:22:21]. The counter-argument for AI is its unprecedented speed of development and low cost, which makes it qualitatively different from past technologies [01:25:03].

One potential positive development is the “natural evolutionary reaction” to the “flood of sludge” (AI-generated misinformation and spam): the rise of “info agents” that filter and curate information for individuals, creating “constructive networks of mutual curation” [01:28:48]. Krakauer, however, offers a “Paleolithic alternative”: simplifying radically and ignoring much of the digital noise, emphasizing the value of human judgment and direct engagement [01:32:24].

Ultimately, humans have a track record of adapting to dangerous technologies, from fire to language [01:33:40], and the development of digital hygiene practices among younger generations suggests that adaptation will continue [01:33:51].