From: jimruttshow8596

The discussion distinguishes between theory-driven and data-driven science, noting a bifurcation in scientific approaches where fine-grained paradigms emphasize prediction through large models, while coarse-grained paradigms prioritize understanding [00:02:48]. Historically, physical science was a lucky conjunction of these two, where fundamental theory proved useful [00:03:05]. However, there’s a potential shift towards needing two classes of science for the same topic: one for understanding and one for practical use [00:03:31].

The Rise of Data-Driven AI

Modern AI, particularly language models, has achieved remarkable success in areas where traditional theoretical approaches failed [00:04:00]. Examples include AlphaFold’s ability to solve the protein-folding problem, which provided enormous practical value without offering theoretical insight [00:04:04]. Similarly, transformer technologies, combined with brute-force data and computation, have produced extraordinarily powerful language models that, at least initially, offer little insight into their own mechanisms [00:05:38].

Historical Context of Neural Networks

The origins of induction in science can be traced to Hume in the 18th century [00:06:41]. Interestingly, the development of statistics, a mathematical technique for reducing error in measurements, emerged from the deductive empirical sciences like classical physics [00:07:02].

The history of neural networks is not, as commonly perceived, rooted in induction or large data sets [00:08:06]. Early neural networks, such as the model developed by Warren McCulloch and Walter Pitts in 1943, were primarily deductive frameworks focused on logic and propositional reasoning in the brain [00:09:56]. The neural-network “winter” that set in at the end of the 1960s was partly due to criticisms of the mathematical capabilities of perceptrons, particularly their inability to handle non-linearly separable problems such as the XOR function, and the perceived impossibility of training “deep” networks with their vast number of parameters [00:12:02]. The arrival of GPUs and vast computational resources from the late 2000s onward eventually made it possible to train models with on the order of a trillion parameters, such as GPT-4, something previously unimaginable [00:12:47].
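To make the XOR criticism concrete, the sketch below (Python with scikit-learn, a tooling choice assumed here rather than anything cited in the episode) shows that a single-layer perceptron cannot separate XOR, while a network with one small hidden layer typically can.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable: no single hyperplane classifies all four points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# A single-layer perceptron can never do better than 3 of 4 points on XOR.
perceptron = Perceptron(max_iter=1000, tol=None, random_state=0).fit(X, y)
print("perceptron accuracy:", perceptron.score(X, y))   # at most 0.75

# One hidden layer is enough to carve out a non-linear decision boundary.
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh", solver="lbfgs",
                    max_iter=2000, random_state=0).fit(X, y)
print("small MLP accuracy:", mlp.score(X, y))            # typically 1.0
```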

Superhuman Models and Incompressibility

The success of modern neural networks, despite their complexity, suggests that regularities exist even in high-dimensional domains [00:13:45]. These “superhuman models” appear to have “solved the problem of induction”: they keep performing well even as parameters are added far beyond the statistical “uncanny valley” [00:14:32]. In other words, they generalize effectively from data even though their internal workings are so complex as to be “way beyond human understanding” [00:13:07] and “unstructured” [00:28:08].
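The “uncanny valley” remark appears to match what the statistics literature calls double descent. The sketch below uses an assumed toy setup (random ReLU features with a minimum-norm least-squares fit, not anything from the episode); exact numbers vary by run, but test error typically spikes near the interpolation threshold (features roughly equal to training points) and falls again as parameters keep growing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 100, 2000, 20
w_true = rng.normal(size=d)

def make_data(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def test_mse(n_features):
    # Fixed random ReLU features; only the linear readout is fitted.
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    phi = lambda X: np.maximum(X @ W, 0.0)
    # Minimum-norm least squares (the solution gradient descent converges to).
    coef, *_ = np.linalg.lstsq(phi(X_tr), y_tr, rcond=None)
    return np.mean((phi(X_te) @ coef - y_te) ** 2)

for p in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    print(f"{p:>5} random features  ->  test MSE {test_mse(p):8.2f}")
```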

These models are adept at handling “incompressible representations,” which characterize complex systems [00:41:30]. Unlike simple physical systems, complex systems encode history and often involve “elaborate rule systems and broken symmetries and evolving initial conditions” [00:42:04].
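A crude way to see what “incompressible” means in practice (an illustration using Python’s zlib, not an example from the episode): the output of a simple repetitive rule compresses dramatically, while a history-dependent chaotic sequence leaves a generic compressor nothing to exploit.

```python
import zlib

# A highly regular stream: one short rule ("repeat 0x55") accounts for all of it.
periodic = b"\x55" * 100_000

# A history-dependent stream: the logistic map in its chaotic regime, packed to
# bytes. A generic compressor finds no statistical regularity to exploit.
x, bits = 0.123, []
for _ in range(8 * 100_000):
    x = 3.99 * x * (1.0 - x)
    bits.append(1 if x > 0.5 else 0)
chaotic = bytes(sum(bit << k for k, bit in enumerate(bits[i:i + 8]))
                for i in range(0, len(bits), 8))

for name, data in [("periodic", periodic), ("chaotic", chaotic)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name:>8}: compresses to {ratio:.1%} of its original size")
```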

Limitations and Differences from Human Learning

Despite their power, large language models have significant limitations. For instance, GPT-4, with trillions of parameters and millions of dollars in training costs, struggles with basic arithmetic that a 1970s HP-35 calculator, with about a kilobyte of ROM, could perform easily [00:34:10]. This highlights that such models are not “truly intelligent systems” in a general sense [00:37:45].
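For contrast, the machinery behind that calculator comparison is tiny. A minimal sketch of a stack-based evaluator in the HP-35’s reverse-Polish style (illustrative only, not code from the episode) does exact, repeatable arithmetic in a dozen lines:

```python
def rpn(expression: str):
    """Evaluate a reverse-Polish expression such as '12 34 +' deterministically."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for token in expression.split():
        if token in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()

print(rpn("12345 6789 *"))   # 83810205.0, the same answer every time
```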

A critical difference between AI and human learning is efficiency. Human language acquisition has been estimated to require a memory footprint of about 1.5 megabytes, whereas large language models require hundreds of gigabytes of training data, a difference of roughly five orders of magnitude [00:21:40]. This relates to Chomsky’s “poverty of the stimulus” argument for innate language abilities, though feedback during human learning, such as a grimace in response to a grammatical error, provides implicit instruction that is not always present in large datasets [00:22:28]. Humans and animals appear to use algorithms fundamentally different from deep learning, which allow them to learn from far fewer examples [00:23:47].
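The five-orders-of-magnitude figure is straightforward arithmetic on the numbers quoted in the episode; a quick check (taking “hundreds of gigabytes” as roughly 300 GB, an assumed midpoint):

```python
import math

human_bytes = 1.5e6    # ~1.5 MB estimate for acquired linguistic knowledge
model_bytes = 300e9    # "hundreds of gigabytes" of training data, assumed ~300 GB

ratio = model_bytes / human_bytes
print(f"ratio = {ratio:,.0f}x, about {math.log10(ratio):.1f} orders of magnitude")
# ratio = 200,000x, about 5.3 orders of magnitude
```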

Modern language models are “pure herd” [00:54:49]: excellent as reference material for established knowledge, but not inherently “discovery engines” [00:55:03]. They excel at compositionality and at generating creative text within human norms, producing a first draft about as well as a paid Hollywood screenwriter, with far less labor [00:55:46]. They are proficient as “quick and dirty analysts and synthesizers,” capable of comparing complex frameworks such as Marxism, capitalism, and Game B [00:57:20]. However, they lack “geometry,” the ability to “visualize the world geometrically,” which was crucial for breakthroughs like Einstein’s relativity [00:59:05].

Towards a New Science and AI Architectures

The complexity of human language, particularly “human language in the wild,” proved difficult for traditional computational linguistics methods, which relied on “rules upon rules” [00:05:13]. However, transformer technologies, applied to immense amounts of data, allow for new inductive and abductive approaches to analysis [00:06:09].

Combining “symbolic regression” with neural networks offers a new way of doing science [00:30:52]. For example, researchers use graph neural networks to encode particle interactions and then apply genetic algorithms for symbolic regression to “produce formulas” that encode the learned regularities, leading to new parsimonious encodings of phenomena such as dark energy [00:28:57]. This approach separates the prediction function (performed by the large neural net) from the understanding function (derived from the sparsified model) [00:30:59].
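That division of labor can be sketched end to end on a toy problem: fit a flexible model for prediction, then search for a short formula that reproduces its behavior for understanding. The sketch below uses scikit-learn and a brute-force scan over power-law forms as a stand-in for the genetic-algorithm symbolic regression the researchers used (libraries such as PySR implement the real thing); the inverse-square toy law and all settings are assumptions for illustration, and recovery of the exact exponents depends on how well the net fits.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy "observations": pairwise force magnitudes following an inverse-square law.
m1, m2, r = rng.uniform(1, 10, (3, 2000))
force = m1 * m2 / r**2 + 0.01 * rng.normal(size=2000)
X = np.column_stack([m1, m2, r])

# Step 1 (prediction): an opaque but reasonably accurate neural net.
net = MLPRegressor(hidden_layer_sizes=(64, 64), solver="lbfgs",
                   max_iter=5000, random_state=0).fit(X, force)
net_pred = net.predict(X)

# Step 2 (understanding): scan a tiny space of power laws C * m1^a * m2^b * r^c
# for the parsimonious formula that best reproduces the net's behavior.
best = None
for a in (0, 1, 2):
    for b in (0, 1, 2):
        for c in (-2, -1, 0, 1):
            basis = m1**a * m2**b * r**c
            C = (basis @ net_pred) / (basis @ basis)        # least-squares scale
            err = np.mean((C * basis - net_pred) ** 2)
            if best is None or err < best[0]:
                best = (err, C, a, b, c)

err, C, a, b, c = best
print(f"recovered: F ~ {C:.2f} * m1^{a} * m2^{b} * r^{c}   (fit mse {err:.3f})")
```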

The concept of “cognitive synergy” suggests that combining deep learning with other AI architectures and approaches, such as genetic algorithms, symbolic AI, math machines, and solvers, can address the limitations of current models [00:35:36]. This hybrid approach allows the perceptual power of deep learning to be integrated with deep mathematical skills [00:36:21].
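One concrete, minimal form of such synergy is routing: let the statistical model handle open-ended language while exact sub-problems go to a solver. The dispatch sketch below is an assumed illustration (the `language_model` stub and the routing rule are not from the episode):

```python
import ast
import operator as op

# Exact "solver": safely evaluates +, -, *, / expressions via the syntax tree.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def solve_arithmetic(expr: str):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("not plain arithmetic")
    return ev(ast.parse(expr, mode="eval"))

def language_model(prompt: str) -> str:
    # Stand-in for a call to a large language model.
    return f"[LLM draft answer to: {prompt!r}]"

def answer(prompt: str) -> str:
    # Route: exactly solvable questions go to the solver, everything else to the LLM.
    try:
        return str(solve_arithmetic(prompt))
    except (ValueError, SyntaxError):
        return language_model(prompt)

print(answer("123456789 * 987654321"))        # exact, from the solver
print(answer("Compare Marxism and Game B"))   # open-ended, from the model
```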

Meta-Occam’s Razor

Traditional science, rooted in Occam’s Razor, seeks the simplest explanation for a phenomenon [01:09:57]. In complex domains, however, this doesn’t always apply [01:10:50]. The concept of “meta-Occam” proposes that parsimony lies in the process rather than in the final object [01:11:19]. Machine learning and natural selection share this meta-Occam property: they involve simple processes (like reinforcement learning) that can generate arbitrarily complicated objects [01:11:36]. Complexity science, on this view, is the search for these “meta-Occam processes,” relatively simple rule systems with open-ended properties [01:15:08].
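A standard illustration of parsimony living in the process rather than the object: an elementary cellular automaton whose entire rule fits in one byte yet generates structure with no obvious short description. Rule 30 and the grid size below are illustrative choices, not examples from the episode.

```python
# Rule 30: an 8-entry lookup table (one byte of "process") drives everything.
RULE = 30
TABLE = {(a, b, c): (RULE >> (a * 4 + b * 2 + c)) & 1
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

width, steps = 79, 40
row = [0] * width
row[width // 2] = 1                       # a single seed cell

for _ in range(steps):
    print("".join("#" if cell else " " for cell in row))
    row = [TABLE[(row[i - 1], row[i], row[(i + 1) % width])]   # periodic boundary
           for i in range(width)]
```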

AI as a Self-Leveraging Accelerator Posing Existential Risks

While concerns about existential risks from AI are often “undisciplined” [01:17:31], significant risks do exist, even if they are not “imminent” [01:18:02]. Current overheated speculation may serve as marketing to attract more resources for the long-term problems [01:18:46].

Other risks include:

  • Misuse of narrow AI: for example, China’s development of a police state using AI-driven facial recognition and tracking [01:19:03].
  • Idiocracy risk: If humans delegate too many intellectual skills to machines, it could lead to a societal devolution where essential skills are lost [01:19:34].
  • Acceleration of “Game A”: Current AI technologies could accelerate existing unsustainable societal patterns (“Game A”), potentially leading to environmental collapse faster [01:21:23].

Historically, humanity has managed risks from new technologies (e.g., recombinant DNA, nuclear weapons, automobiles) through “small regulatory interventions” and learning to control them incrementally [01:22:21]. The rapid pace of AI development, however, presents a unique challenge, with costs for building models decreasing and technologies escaping confinement [01:26:37].

Despite negatives like the “flood of sludge” (an exponential increase in low-quality content and spam) [01:28:11], AI also offers opportunities. “Info agents” could buffer users from digital noise, filter information, and enable mutual curation networks, transforming how individuals interact with the online world [01:28:50]. This represents a “magnificent opportunity” to use current technologies to improve information hygiene [01:30:36], with AI and humans working together to maintain a more curated information environment.
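At its simplest, an “info agent” could be little more than a filter that scores incoming items against a user’s declared interests and trusted sources. The sketch below is a deliberately naive illustration of that pattern; the scoring rule, threshold, and field names are all assumptions, far cruder than anything discussed in the episode.

```python
from dataclasses import dataclass

@dataclass
class Item:
    source: str
    text: str

def score(item: Item, interests: set, trusted: set) -> float:
    # Crude relevance: keyword overlap, with a boost for sources the user curates.
    words = set(item.text.lower().split())
    relevance = len(words & interests) / (len(interests) or 1)
    return relevance + (0.5 if item.source in trusted else 0.0)

def filter_feed(items, interests, trusted, threshold=0.4):
    # Keep only items that clear the threshold, best first.
    kept = [it for it in items if score(it, interests, trusted) >= threshold]
    return sorted(kept, key=lambda it: score(it, interests, trusted), reverse=True)

interests = {"complexity", "ai", "information", "science"}
trusted = {"colleague"}
feed = [
    Item("colleague", "New paper on complexity science and ai"),
    Item("unknown", "You won a prize! Click now"),
    Item("newsletter", "Information hygiene for the age of ai"),
]
for item in filter_feed(feed, interests, trusted):
    print(f"{item.source:>10}: {item.text}")
```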