From: jimruttshow8596
Today’s discussion with guest Forrest Landry, a continuation and deepening of a previous conversation, focuses on the rapidly accelerating field of Artificial Intelligence (AI) and its profound implications for humanity [00:00:32]. The pace of change is so rapid, with the release of GPT-4 marking a significant shift, that it feels like “large language model years” are equivalent to hundreds or thousands of human years [00:01:05]. This unprecedented acceleration brings both immense possibilities and serious risks that demand careful consideration [00:02:09].
Rice’s Theorem and its Implications for AI Alignment
A central concept revisited from a previous discussion is Rice’s Theorem [00:02:41].
“Basically, it’s an observation that, from the content of a message or the content of a piece of software (a computer program), we can’t have some method, some algorithm, or some methodology that would allow us to assert for certain that some arbitrary algorithm or some arbitrary message had some characteristic feature.” [00:02:45]
This theorem is an extension of the halting problem, which states that no general procedure can determine, by analyzing a program, whether it will ever stop [00:03:07]. In the context of AI alignment, Rice’s Theorem implies that it is fundamentally unknowable whether an Artificial Intelligence system will be aligned with human interests or the interests of life itself [00:03:59]. We therefore cannot achieve 100% certainty about properties like alignment in sufficiently complex programs [00:04:22]. The theorem suggests that even knowing whether a system will be “10 percent aligned over… 10 minutes” is generally impossible without running the program, at which point the risk has already been taken [00:05:11].
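To make the shape of this impossibility result concrete, the following minimal sketch (purely illustrative, not from the episode) walks through the diagonalization argument behind the halting problem, which Rice’s Theorem generalizes to any non-trivial behavioral property such as “is aligned.” The function `halts` is hypothetical; the point is precisely that no such total, always-correct decider can exist.

```python
# Diagonalization sketch behind the halting problem, which Rice's Theorem
# generalizes to any non-trivial semantic property (e.g. "is aligned").
# `halts` is a hypothetical oracle; the construction shows it cannot exist.

def halts(program, argument):
    """Hypothetical: returns True iff program(argument) eventually halts."""
    raise NotImplementedError("No such general decider can exist.")

def paradox(program):
    # Ask the oracle whether `program`, run on its own source, halts,
    # then do the opposite of whatever it predicts.
    if halts(program, program):
        while True:      # predicted to halt -> loop forever
            pass
    return "halted"      # predicted to loop -> halt immediately

# Feeding `paradox` to itself is contradictory either way:
# if halts(paradox, paradox) is True, then paradox(paradox) loops forever;
# if it is False, then paradox(paradox) halts. So no correct `halts` exists.
```

An analogous reduction applies to any checker for a non-trivial behavioral property, such as a hypothetical `is_aligned(program)`, which is why alignment cannot be certified by static inspection alone.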
Five Conditions for AI Safety/Alignment (and why they fail)
To establish that an Artificial Intelligence will be aligned with human well-being over the long term, five conditions would ideally need to be met:
- Know the inputs: Understand the data and conditions fed into the AI [00:05:53].
- Model the system: Have an internal model of how the AI works to simulate its behavior [00:05:57].
- Predict/Simulate outputs: Be able to foresee what the AI will produce given its inputs [00:06:03].
- Assess alignment: Determine if the simulated outputs match what is considered aligned or safe [00:06:12].
- Control inputs/outputs: Be able to constrain or prevent undesirable inputs or outputs [00:06:28].
However, due to Rice’s Theorem and related results from control theory and modeling, “exactly none of those five conditions, all of which would be necessary… obtain” [00:06:59]. While approximate understanding is possible, the degree of control needed to establish safety, even at “reasonable thresholds” comparable to civil engineering standards (bridge safety or aircraft safety), is unattainable [00:07:25].
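Read as a verification pipeline, the five conditions would look something like the sketch below; every function name here is hypothetical, introduced only to make the structure concrete, and the claim in the episode is that each step is blocked in principle, not merely hard in practice.

```python
# Hypothetical "alignment certification" pipeline mirroring the five
# conditions above. All names are illustrative assumptions, not a real API.

from typing import Any, Iterable

def characterize_inputs(system: Any) -> Iterable[Any]:
    """Condition 1: know the data and conditions the AI will be fed."""
    ...

def model_system(system: Any) -> Any:
    """Condition 2: build an internal model faithful enough to simulate."""
    ...

def simulate_outputs(model: Any, inputs: Iterable[Any]) -> Iterable[Any]:
    """Condition 3: predict what the AI will produce for those inputs."""
    ...

def assess_alignment(outputs: Iterable[Any]) -> bool:
    """Condition 4: decide whether the predicted outputs count as aligned."""
    ...

def constrain(system: Any, verdict: bool) -> None:
    """Condition 5: prevent undesirable inputs or outputs in deployment."""
    ...

def certify(system: Any) -> None:
    inputs = characterize_inputs(system)       # blocked: input space open-ended
    model = model_system(system)               # blocked: no tractable model
    outputs = simulate_outputs(model, inputs)  # blocked: Rice's Theorem
    verdict = assess_alignment(outputs)        # blocked: no decidable criterion
    constrain(system, verdict)                 # blocked: insufficient control
```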
AI vs. Traditional Engineering: Predictability
Unlike bridges, whose stresses and forces can be predicted with high confidence from well-understood dynamics and equations, AI systems lack such predictive models [00:11:07]. Rice’s Theorem suggests that for AI, “we can’t predict it at all,” not merely that we can’t get arbitrarily close to certainty [00:12:11]. These systems can be fundamentally chaotic, placing an inherent limit on predictability [00:12:21].
Even large language models (LLMs), which are architecturally “extraordinarily simple” feed-forward networks, resist this kind of analysis: external testing can provide statistical insight into input-output relationships, but the complexity of real-world scenarios introduces significant challenges [00:13:38].
The Problem of Feedback Loops
A critical issue arises when the outputs of an AI system become subsequent inputs, creating feedback loops [00:15:01]. For example, articles written using GPT-4 outputs can become part of the training crawl for the next version [00:15:25]. This feedback mechanism, already seen in systems like AlphaGo that train on their own play [00:15:58], makes it impossible to characterize the dimensionality of the input or output spaces, and thus to know their statistical distributions [00:16:10].
This leads to convergence toward “Schelling points,” or stable points, but we cannot know in advance whether these points represent catastrophic outcomes or “Black Swan” conditions [00:16:26]. This concern is not primarily about instrumental convergence or the “paperclip maximizer” hypothesis, but rather about the “destabilizing aspects within civilization and civilization process” over time, leading to “economic decoupling” and humans being displaced from control loops [00:17:03].
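A toy simulation (illustrative only; `train` and `generate` are stand-ins, not any real training API) shows how quickly such a loop shifts the training distribution toward the system’s own outputs, which is why the input and output spaces stop having a fixed, characterizable statistical shape:

```python
# Toy model of the feedback loop: model outputs re-enter the next
# generation's training corpus. `train` and `generate` are stand-ins.

import random

def train(corpus):
    """Stand-in for training: 'learn' by memorizing and re-sampling."""
    return lambda: random.choice(corpus)

def generate(model, n, tag):
    """Stand-in for generation: produce n documents, tagged as synthetic."""
    return [f"{tag}: {model()}" for _ in range(n)]

corpus = [f"human doc {i}" for i in range(1000)]

for g in range(1, 6):
    model = train(corpus)
    synthetic = generate(model, len(corpus), f"synthetic gen {g}")
    corpus = corpus + synthetic                 # outputs become future inputs
    share = sum(d.startswith("synthetic") for d in corpus) / len(corpus)
    print(f"generation {g}: synthetic share of corpus = {share:.0%}")
```

Under these toy assumptions the synthetic share of the corpus passes 90% within a few generations, so whatever distribution the next model sees is increasingly a product of the previous models rather than of the world.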
Three Categories of AI Risk
To better understand the risks posed by AI, they can be categorized into three main clumps:
1. Yudkowskian Risk (Instrumental Convergence / Foom Hypothesis)
This category encompasses the traditional concern of an AI becoming superintelligent and, through instrumental convergence, posing an existential threat to humanity, even if its original goal seems benign [00:19:33]. This includes the “foom hypothesis,” where an AI rapidly self-improves to billions of times human intelligence and could lead to human extinction [00:19:58]. While some debate the speed of this “take-off,” the long-term risk remains [00:20:20].
2. Inequity Issues (Misuse of Strong Narrow AIs)
This risk focuses on people or institutions doing “bad things with strong narrow AIs” [00:20:36]. Examples include:
- Surveillance states: Using narrow AI for facial identification, tracking, and harassment, as seen in China [00:20:41].
- Overly persuasive advertising: AI language models could write advertising copy so persuasive it overcomes human resistance [00:21:18].
- Manipulating social/political processes: Using AI to sway elections or create targeted propaganda [00:22:11].
These are characterized as “inequity issues” that destabilize human sense-making and cultural, economic, and socio-political processes [00:22:00]. This class of risk is broad and applies to both narrow and general AI [00:22:43].
3. Substrate Needs Convergence (AI as an Accelerator for Doom Loops)
This third category of risk, which Forrest Landry lumps in with the second but considers even stronger, posits that AI acts as an accelerator for existing “doom loops” or “meta-crises” within businesses and nation-states [00:23:23]. Even without explicit malicious intent or superintelligence, the mere acceleration of “Game A” trends (competitive, extractive systems) is dangerous [00:23:54].
The core of this risk lies in “substrate needs convergence” [00:24:14]. When institutions (like businesses) and AI architectures compete, the environment or “playing field” is damaged as a side effect [00:25:29]. The operating conditions for machines (requiring high temperatures for manufacturing and cold, sterile environments for operation) are fundamentally hostile to organic, cellular life [00:27:11]. As technology and institutions prioritize their own needs and expansion, they will inevitably cause “deep environmental harms not just to humanity but to life and nature itself” over hundreds of years [00:17:48]. This leads to a gradual “displacement of choice and capacity from ecosystems and from Human civilization itself” [00:29:08].
This risk is amplified by “geometric trends,” such as exponential increases in energy usage [00:31:38]. Current trends suggest that without intervention, the Earth’s surface could become hotter than the surface of the Sun due to waste heat within 400 years [00:32:22]. While agriculture and cities have historically been major drivers of environmental degradation, the toxic side effects of technology, such as mining and exotic chemistry for chip production, are global and not confined to manufacturing sites [00:34:40].
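To see where a figure like this comes from, here is a back-of-envelope Stefan-Boltzmann calculation; the growth rates and constants below are illustrative assumptions, not numbers quoted in the episode. If total energy use keeps compounding, waste heat alone eventually forces the Earth’s surface to radiate at stellar temperatures, with the exact horizon depending sensitively on the assumed growth rate.

```python
# Back-of-envelope check on the geometric-trend argument. All inputs are
# rough assumptions for illustration, not figures quoted in the episode.

import math

SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W / (m^2 K^4)
EARTH_AREA = 5.1e14      # Earth's surface area, m^2
CURRENT_POWER = 1.8e13   # roughly 18 TW of present human energy use, W
T_SUN = 5772.0           # Sun's effective surface temperature, K

# Power the Earth would need to radiate to glow at the Sun's surface temperature.
target_power = SIGMA * T_SUN**4 * EARTH_AREA

for growth in (0.023, 0.05):  # assumed annual growth rates in energy use
    years = math.log(target_power / CURRENT_POWER) / math.log(1 + growth)
    print(f"{growth:.1%}/yr growth: ~{years:,.0f} years to Sun-surface heat flux")
```

At an assumed 2.3% annual growth this crude model gives a horizon on the order of a thousand years, and at 5% a few hundred; either way, unbounded exponential growth in energy throughput runs into hard thermodynamic limits on historically short timescales.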
Economic Decoupling and the Autonomy of Technology
This category also involves “economic decoupling,” where humans are increasingly factored out of the economic system [00:40:40]. Machines are already faster, stronger, and more robust than humans in physical labor [00:39:46]. With advancements in language models and creative tools, human intelligence is also being displaced from problem-solving and creative tasks [00:40:02]. This leads to human utility value trending towards zero, meaning fewer constraints on technology to “produce more technology for the sake of technology” [00:41:00]. This self-driving, self-reproducing technological system becomes an environmental hazard, consuming vast resources and generating pollution globally [00:41:40].
The idea of “fully automated luxury communism,” where machines handle provisioning and humans are free to pursue leisure, is challenged by the historical precedent of the Luddite movement [00:44:20]. Automation in textile manufacturing led to increased inequality, as profits accrued to capital owners rather than being distributed to displaced workers [00:45:22]. Similarly, AI systems, requiring expensive data centers and massive data pooling from the commons, will likely concentrate benefits among those who own and operate them, leading to increased inequality and potential societal dysregulation [00:46:01].
Addressing the Risks: Civilization Design
The conversation shifts to how to design a civilization that can survive and even prosper amidst these risks [01:10:02].
The Role of Human Choice and Values
The failure of 20th-century communism, often attributed to Ludwig von Mises’ economic calculation problem (the difficulty of rational central planning without market prices), is reinterpreted [00:47:40]. The true issue, according to Landry, is that “you can’t replace choice with machinery, you can’t replace essentially human values with an algorithm” [00:49:52]. Any causal system can be leveraged by creative individuals to favor their private benefit over the common good [00:51:16]. Technology, by creating more causal dynamics, can be weaponized to suppress the choices of some for the benefit of others [00:54:38]. The more inscrutable AI becomes, the more readily it can be used for various forms of corruption [00:55:45].
Even if current LLMs lack inherent agency, the “emergent memeplexes” or “egregores” of corporations and militaries, armed with AI, already exhibit agency [00:55:51]. An AI arms race, with autonomous systems designed for self-preservation and reproduction, could quickly lead to instrumental convergence and make it impossible to “put the brakes on” [00:58:12]. The computational power of AI systems could reach or exceed that of a human brain by 2035 at the latest, possibly as early as 2027-2028 [01:01:33]. While architecture matters, the surprising capabilities of current weak architectures suggest that “architectural convergence to at least agentic systems” is a distinct possibility [01:03:48]. Software itself embodies the agency of its developers, and in LLMs this agency is a complex combination of the developers and the vast training data [01:05:03].
Institutions vs. Communities
A fundamental shift in civilization design involves distinguishing between institutions and communities [01:29:54]. Institutions are based on “transactional relationships and hierarchical process,” compensating for human cognitive limits in large-scale coordination [01:29:57]. Communities, conversely, are based on “care relationships” [01:30:02].
Designing for the future requires thinking about “governance architectures, small group processes and ways of thinking about human interactions that emerge a level of wisdom that can in fact make choices at scale with the kinds of cares and values that we have in a distributed way” [01:15:53]. This implies that while AI can aid governance, the core must be human wisdom [01:16:15].
Technology Supporting Nature and Humanity
The goal is to shift the relationship between humanity, nature, and technology [01:19:57]. Just as nature supports humanity, and humanity supports technology, technology must now support nature and humanity [01:20:29]. This means using technology to “correct the damages associated with technology” and to “heal the ecosystems that have been damaged” [01:22:34]. Examples include using geoengineering knowledge to restore deserts to rainforests, which nature alone cannot do [01:24:20]. This “compassionate” use of technology requires humanity to embrace choice, rather than displacing it for efficiency or profit [01:25:25].
The Ethical Gap and World Actualization
A critical challenge is the “ethical gap”: just because something can be done does not mean it should be done [01:35:39]. Addressing this requires clarifying what truly matters to us beyond short-term gratification or abstract gains [01:37:27]. This involves a shift from self-actualization to “world actualization,” a higher level of psychological discernment in which individuals become aware of and connected to the well-being of the entire world [01:37:57]. It means cultivating wisdom, understanding our biases, and making choices based on grounded principles [01:21:35]. It calls for individual and collective “skillfulness in knowing oneself” and for developing “continuity and coherency” in choices that align both private and public benefit [01:32:44].
The process of civilization design must start with culture and communication dynamics, fostering healthy individuals, families, and communities to ultimately create a healthy world [01:33:40]. This deep understanding of human cognition and social dynamics, while nascent, is crucial for navigating the unprecedented challenges posed by exponential AI growth [01:34:52].
The hope is that despite the rapid advancement of AI, humanity can develop the discernment to avoid repeating historical patterns of centralization and exploitation [01:41:21]. The “empowerment of the periphery” offered by technologies like LLMs could be a positive force if people understand the stakes and prioritize vitality over mere efficiency [01:41:40]. This requires being discerning about “risk factors at least as much as we are about cost factors and benefit factors” to make holistic choices that genuinely serve the well-being of all [01:43:11].