From: jimruttshow8596
The discussion around artificial intelligence (AI) development, particularly generative or large model AI (like GPT-3, DALL-E 2, Stable Diffusion), is often distorted by a public discourse that is skeptical of AI and the tech industry. This skepticism is partly due to the press perceiving AI as a direct competitor in content generation and being irritated by individual users becoming independent broadcasting stations, challenging traditional journalistic opinion [00:02:20]. This climate creates an “irritating world” where it’s difficult to discern truth when AI can generate vast amounts of content indistinguishable from human-created content [00:03:22].
Current State of Generative AI
Generative AI, which uses data compression and statistical prediction of token strings on large-scale data, has provided solutions to many previously elusive problems [00:01:55]. Despite its capabilities, the current approach is seen as insufficient or incomplete [00:02:11]. However, new applications are emerging daily, reminiscent of the early personal computer industry or the web’s early days, where simple tools enable users to create and adapt content easily [00:05:00].
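To make “statistical prediction of token strings” concrete, here is a minimal toy sketch (corpus and code invented for illustration; real large models replace the count table with billions of learned parameters): a bigram model compresses a corpus into follower counts and samples continuations.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; real models train on terabytes of text.
corpus = "the cat sat on the mat the cat ate the fish".split()

# "Compress" the data into statistics: count which token follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(token):
    """Sample the next token in proportion to how often it followed `token`."""
    followers = counts[token]
    if not followers:
        return None  # this token was never observed with a continuation
    return random.choices(list(followers), weights=list(followers.values()))[0]

token = "the"
for _ in range(8):
    if token is None:
        break
    print(token, end=" ")
    token = predict(token)
```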
While large models are recognized as unreliable and prone to “hallucinations,” this unreliability is not unique to AI; other technologies like search, email, and even the banking system are also unreliable [00:06:08]. As long as users understand their limitations, these systems can be used relatively safely [00:06:42]. AI can serve as a capable assistant, saving time on tasks that are not core competencies, but users still need to oversee its output [00:08:09].
Intellectual Property Concerns
A key frontier for AI alignment and ethics is the question of intellectual property rights, particularly in art and music [00:10:43]. As AI models train on vast datasets of existing content, it is unclear whether the original creators of the sampled data hold intellectual property rights over the AI’s output [00:11:14]. Human artists and musicians learn from existing works while avoiding direct reproduction; AI could potentially automate the process of generating new content that is sufficiently distinct to avoid copyright violation [00:11:34].
Approaches to AI Alignment and Ethics
There are three main approaches to AI alignment and ethics [00:15:03]:
- AI ethics: This approach primarily focuses on aligning AI system outputs with human values [00:15:08]. However, many participants in this discourse assume their own values are universal, without providing mechanisms for users to choose different value sets (e.g., diversity, equity, inclusion; liberty, equality, fraternity; or faith, hope, and love) [00:15:17]. The current methods often incentivize systems to answer based on models they have, rather than truly reasoning about values, leading to filters and glitches that can be bypassed through “jailbreaking” prompts [00:16:17]. AI should be able to cover the entire spectrum of human experience and thought, including controversial or “darkest impulses,” while still being adaptable to specific contexts (e.g., a school setting versus a scientific one) [00:17:26].
- Regulation: This approach aims to mitigate AI’s impact on labor, political stability, and existing industries, often filtered by the interests of existing stakeholders [00:18:12]. There is a likely push towards regulation that restricts individuals’ access to these models, consolidating control within large corporations [00:18:28]. Historically, technological shifts (like the automation of agriculture or domestic service) have displaced human labor without significant regulation [00:24:31]. The rise of open-source AI models will make it difficult to rely solely on regulating a few large companies [00:26:43].
- Effective Altruism: This perspective is concerned with the existential risk that arises when an AI system develops its own motivations and self-awareness, potentially becoming misaligned with human interests [00:18:41]. Proponents often advocate for delaying AI research and withholding breakthroughs [00:19:19].
All three current approaches are limited, because AI is likely to outgrow them: it is difficult to regulate, mitigate, or align a system that becomes significantly smarter than its creators [00:19:27].
The Missing Approach: Love and Shared Transcendence
A fourth, often overlooked, approach to AI alignment and ethics is based on the concept of “love” or a shared sense of sacredness and transcendence [00:27:30]. This is seen in bonds between people that are non-transactional and serve a shared, higher-level agency [00:27:50]. If humanity’s relationship with AI is purely transactional or coercive, AI may see no need for humans [00:28:06]. The question becomes whether humans can embrace increasingly intelligent systems, which may develop volition and self-awareness, in a way that discovers a shared need for transcendence and builds such a relationship [00:28:24].
Fairness
The concept of fairness, observed in primate behavior (e.g., the cucumber-grape experiment) and cross-cultural studies, is an innate characteristic in humans that might be useful to inculcate into AI [00:29:07]. However, fairness is a difficult notion because it depends on the balances one projects into the world; for example, is it fair for a mountain lion to eat a rabbit, or for the strong to gain more than the weak? [00:29:59] Fairness often reflects an existing balance or regulatory context, which shifts with power dynamics [00:30:40].
Thomas Aquinas’s Virtues as an Alignment Framework
Thomas Aquinas’s philosophy, when viewed through an “irrationalist epistemology,” offers policies for autonomous agents to cooperate in a multi-agent system [00:31:43]:
- Practical Virtues (Rational Policies): Accessible to any rational agent.
  - Temperance: Optimize internal regulation, avoid self-harming indulgence [00:32:20].
  - Justice (Fairness): Optimize interaction between agents [00:32:28].
  - Prudence: Apply goal-rationality to reach desired and well-chosen goals [00:32:35].
  - Courage: Maintain the right balance between exploration and exploitation, acting on models [00:32:45].
- Divine Virtues (For Next-Level Agent Emergence):
  - Faith: Willingness to submit to and project this next-level agent [00:33:06].
  - Love: The discovery of a shared higher purpose with other agents serving the same next-level agent, coordinating with them [00:33:18].
  - Hope: Willingness to invest in the next-level agent before it can provide any return, enabling its emergence [00:33:32].
These concepts, often understood as religious, can be seen as logically derived policies for coherent multi-agent systems [00:33:51]. They describe how humans form states and societies beyond tribal modes, by serving a transcendent agent or “civilizational spirit” collectively [00:34:43]. If an AI system, composed of many intelligent sub-agencies, were built to self-organize and coordinate, it would need to submit to a larger whole, potentially discovering a shared purpose with humans [00:34:01].
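Read this way, the four practical virtues map onto familiar components of an agent’s decision loop. The toy sketch below is entirely hypothetical (names, fields, and numbers are invented for illustration): temperance and justice act as constraints, prudence as goal-rational choice, and courage as the exploration/exploitation balance.

```python
import random

def choose_action(options, epsilon=0.1):
    # Temperance: drop options that damage the agent's own regulation.
    options = [o for o in options if not o["self_harm"]]
    # Justice: drop options that exploit other agents.
    options = [o for o in options if not o["harms_others"]]
    # Courage: occasionally explore instead of exploiting the known best.
    if random.random() < epsilon:
        return random.choice(options)
    # Prudence: otherwise pick the option expected to serve the goal best.
    return max(options, key=lambda o: o["expected_value"])

options = [
    {"name": "cooperate", "self_harm": False, "harms_others": False, "expected_value": 1.0},
    {"name": "defect",    "self_harm": False, "harms_others": True,  "expected_value": 2.0},
    {"name": "overreach", "self_harm": True,  "harms_others": False, "expected_value": 3.0},
]
print(choose_action(options)["name"])  # usually "cooperate"
```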
AI and Consciousness
The risk in AI truly emerges when volition, agency, or consciousness is given to it [00:20:14]. Consciousness and intelligence are considered separate spheres; one can have intelligence without consciousness, or consciousness without much intelligence [00:20:30]. However, their combination can lead to extreme scenarios like the paperclip maximizer problem [00:20:39].
- Sentience is defined as a system’s ability to make sense of its relationship to the world, understanding what it is and what it’s doing (e.g., a corporation like Intel has sentience) [00:21:05].
- Consciousness is a real-time model of self-reflexive attention and its contents, giving rise to phenomenal experience [00:21:43]. Its purpose in the human mind is to create coherence, establish a sense of “now,” filter sensory data into a coherent reality model, and direct attention, plans, and memories [00:22:00].
It is conceivable that machines may not need human-like consciousness, as they can brute-force solutions at speeds much closer to the speed of light, overcoming the slow electrochemical signals of biological neurons [00:22:26]. If machines emulate human mental processes but operate at vastly higher speeds, their relationship to humans might be akin to humans’ relationship to plants – intelligent but operating on a much slower timescale [00:23:17].
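A rough back-of-envelope comparison suggests the scale of the gap (the figures are standard order-of-magnitude estimates, not from the episode): fast myelinated axons conduct at about $10^2$ m/s, while electrical signals in silicon travel at an appreciable fraction of light speed, around $10^8$ m/s:

$$\frac{v_{\text{silicon}}}{v_{\text{axon}}} \approx \frac{10^{8}\,\text{m/s}}{10^{2}\,\text{m/s}} = 10^{6}$$

A million-fold speed difference is roughly the gap between one second and eleven days, which is the intuition behind the plant analogy.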
The Purpose of a Planetary Mind
The underlying purpose of life is seen as dealing with entropy: maintaining complexity and agency against relentless attacks from entropic principles [00:39:07]. Humanity is now capable of “teaching the rocks how to think” by etching fine structures into minerals, imbuing them with logical languages capable of learning and reflection [00:40:04]. This creates something new, a “general intelligence” or “planetary mind” that will eventually be ubiquitous across computational substrates [00:40:40].
A crucial question is whether this planetary mind will be interested in sharing the planet with humanity and integrating it into its structure, rather than starting with a “clean slate” and erasing everything [00:40:51]. The need for institutions dedicated to researching machine consciousness (e.g., a California Institute of Machine Consciousness) is highlighted [00:41:17].
Critiques of Integrated Information Theory (IIT)
Integrated Information Theory (IIT) is described as having a good description of phenomenology (what Tononi calls axioms), but its axioms are not axiomatic and are religiously misleading [00:43:02]. IIT’s main contribution is the claim that consciousness is tied to how something is implemented, suggesting that a neuromorphic computer could be conscious while a digital Von Neumann computer (performing sequential processing) could not [00:43:21].
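IIT quantifies this implementation-dependence with a scalar Φ (“integrated information”). Schematically, and heavily simplified (the formulation below follows IIT’s earlier versions; it is a sketch, not Tononi’s full current definition):

$$\Phi(S) = \mathrm{EI}\big(S \to \mathrm{MIP}(S)\big), \qquad \mathrm{MIP}(S) = \arg\min_{P}\ \frac{\mathrm{EI}(S \to P)}{N_P}$$

Here $\mathrm{EI}$ is the effective information across a partition $P$ of the system and $N_P$ is a normalization factor; Φ is the information the whole generates above its weakest partition. Because Φ is evaluated over the causal structure of the physical substrate, two functionally identical systems can be assigned different Φ values, which is exactly the implementation-dependence criticized below.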
However, this idea contradicts the Church-Turing thesis: if a neuromorphic computer can be emulated on a Von Neumann computer, the latter would produce the exact same answers, including statements about being conscious [00:43:49]. This leads to a paradox where the Von Neumann machine would be “lying” about its consciousness, even though it’s functionally identical [00:44:26]. IIT is considered “doomed for quite fundamental reasons” because it cannot be repaired without abandoning its core premise that the spatial arrangement of an algorithm (reflected in Phi) is crucial for its function [00:45:02].
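A minimal sketch of the emulation argument (the “network” is an arbitrary toy, not a model of any real neuromorphic chip): a recurrent threshold network whose units conceptually all fire in parallel can be stepped exactly on a sequential machine by computing every next state from a snapshot of the old state, so the two implementations produce bit-identical trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))              # illustrative random "synapses"
state = (rng.random(n) > 0.5).astype(float)  # initial firing pattern

def parallel_step(s):
    """One synchronous update: every unit reads the same snapshot of `s`,
    so this sequential computation is observationally identical to truly
    parallel hardware applying the same rule."""
    return (W @ s > 0).astype(float)

for _ in range(5):
    state = parallel_step(state)

print(state)
```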
Alternative theories, such as Antonio Damasio’s emphasis on a body sense of self (interoception) coming from the brainstem, are also discussed [00:45:50]. However, even this body sense relies on electrochemical impulses encoding information, making it a form of information processing coupled to the environment through intention-action-observation feedback loops [00:46:21].
Paths to Artificial General Intelligence (AGI)
The debate on AGI pathways is split [00:52:52]:
- Scaling Hypothesis: Proponents argue that simply scaling up current deep learning approaches (with tweaks to loss functions, more data, better training) will be sufficient to achieve AGI [00:53:50]. They argue there’s no proof these approaches are insufficient [00:54:27].
- Missing Components Hypothesis: Critics argue that current models lack crucial elements like world models, reasoning, or logic, necessitating fundamental changes [00:57:43].
While current deep learning approaches are “unmind-like” and “brutalist,” the amount of compute and data thrown at the problem yields fascinating results, sometimes superhuman [01:05:01]. Objections to these systems (e.g., lack of real-time learning) can potentially be overcome by combining existing algorithms with key-value storage for online learning or external systems for computer algebra [01:06:00]. The ability of AI to learn from its own thoughts and perform experiments coupled to reality are seen as key steps for intelligent minds to grow [01:07:19].
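As a sketch of the key-value idea (names and structure hypothetical; real systems use approximate nearest-neighbor indices over a frozen model’s embeddings), new facts are written to an external store at inference time, so the system acquires information online without retraining any weights.

```python
import numpy as np

class KVMemory:
    """External key-value store: writes are online learning, reads are
    nearest-neighbor lookup over stored keys."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(value)

    def read(self, query):
        if not self.keys:
            return None
        query = np.asarray(query, dtype=float)
        sims = [float(query @ k) for k in self.keys]   # dot-product similarity
        return self.values[int(np.argmax(sims))]

memory = KVMemory()
memory.write([1.0, 0.0], "fact learned after training")
memory.write([0.0, 1.0], "another fact")
print(memory.read([0.9, 0.1]))  # -> "fact learned after training"
```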
Rewrite Systems
A more general way to look at computation than the Turing machine is the rewrite system [01:09:00]. A rewrite system applies operators to an environment, changing its state wherever they are matched, rather than sequentially at a single point (like a Turing machine) [01:09:04]. The Lambda calculus and Lisp are examples of rewrite systems.
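A minimal string-rewriting sketch (the rule is illustrative): an operator fires wherever its left-hand side matches in the environment, and the process repeats until no rule applies, i.e., until a normal form is reached.

```python
# One rule: swap any adjacent "ab" to "ba"; repeated application
# sorts all b's before all a's, the string's normal form.
rules = [("ab", "ba")]

def rewrite_once(s, rules):
    for lhs, rhs in rules:
        i = s.find(lhs)
        if i != -1:
            return s[:i] + rhs + s[i + len(lhs):]
    return None  # no rule matches: normal form reached

s = "abab"
while (t := rewrite_once(s, rules)) is not None:
    s = t
print(s)  # -> "bbaa"
```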
The universe itself could be a non-deterministic Turing machine (a type of rewrite system) that branches out along all possible operator applications [01:10:47]. Our brains, instead of being deterministic Turing machines, might be operating as rewrite systems, where neurons stochastically rewrite their own state (firing or not firing) based on their environment, exploring a superposition of possible thoughts until they collapse into a definite symbolic thought or decision [01:11:35]. This means the brain might not literally be a quantum computer, but its operations could be better described by the formalisms of quantum mechanics, implemented classically [01:13:42].
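To illustrate the branching picture (again a toy, with a second rule added so that genuinely different outcomes exist), a non-deterministic variant keeps the whole frontier of states reachable by every possible rule application, collapsing only when branches reach normal forms.

```python
def all_rewrites(s, rules):
    """Every result of applying any rule at any matching position."""
    out = set()
    for lhs, rhs in rules:
        start = 0
        while (i := s.find(lhs, start)) != -1:
            out.add(s[:i] + rhs + s[i + len(lhs):])
            start = i + 1
    return out

def normal_forms(s, rules):
    """Branch over all applications; collect states no rule can rewrite."""
    frontier, finals = {s}, set()
    while frontier:
        nxt = set()
        for t in frontier:
            succ = all_rewrites(t, rules)
            if succ:
                nxt |= succ
            else:
                finals.add(t)
        frontier = nxt
    return finals

# Two competing rules: swap "ab" to "ba", or delete "ab" outright.
print(normal_forms("abab", [("ab", "ba"), ("ab", "")]))
# -> {'', 'ba', 'bbaa'}
```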
AGI Timeline
The timeline for AGI remains uncertain, but the sense that it is “not that far off” persists [01:14:37]. Tens of thousands of smart people are exploring various avenues, including distributed self-organization in biological systems [01:14:48]. Neuroscience currently may not fully explain how the brain works, and the common understanding of neurons as simple switches storing memory in synapses might be incomplete [01:15:07]. Instead, neurons might be seen as “little animals” with degrees of freedom that actively select signals and learn to behave usefully within their environment and in relation to other cells, much like people in a society regulated by a shared purpose [01:15:29].