From: jimruttshow8596

Melanie Mitchell, a professor at the Santa Fe Institute, whose research focuses on conceptual abstraction, analogy making, and visual recognition in artificial intelligence systems, discusses the nuances of AI understanding compared to human cognition [00:00:34]. Her latest book, “Artificial Intelligence: A Guide for Thinking Humans,” delves into these topics [01:11:00].

Limitations of AI on Standardized Tests

Mitchell highlights the limitations of current AI models, such as GPT-3.5 and GPT-4, when assessed with standardized tests [02:57:00]. While GPT-4 has shown significantly better performance on various standardized exams than its predecessors [02:24:00], her initial concerns about how GPT-3.5 performed remain relevant [02:55:00].

Key issues with using these tests for AI include:

  • Training Data Influence: It’s difficult to ascertain whether the AI genuinely understands concepts or is merely “memorizing or compressing” similar questions encountered in its vast training data [04:00:00] [04:32:00]. Unlike these models, humans taking such tests are not assumed to have memorized all of Wikipedia or large swaths of GitHub code [03:31:00] [03:39:00].
  • Prompt Sensitivity: AI models are highly sensitive to how prompts are phrased [05:32:00]. In one experiment, a question that GPT-3.5 had initially answered at an A+ level produced a poor response once Mitchell slightly reworded it [05:15:00]. This raises questions about the AI’s “understanding of the underlying concepts” [05:47:00]; a minimal way to probe this kind of sensitivity is sketched just after this list.
  • Lack of Transparency: OpenAI’s lack of transparency regarding the exact material used for GPT-4’s tests and restricted access to the model hinders independent scientific probing and verification [06:21:00] [06:30:00].
  • Extrapolation Challenges: Assumptions that hold for human test-takers, such as the ability to extrapolate from test performance to real-world understanding, may not apply to large language models [03:13:00] [03:50:00].
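One simple way to probe the prompt-sensitivity issue above is to ask the same question in several paraphrased forms and check whether the answers agree. The sketch below is my illustration, not something from the conversation: the model call is stubbed out with a placeholder `ask_model` function, which in practice you would replace with a request to whatever LLM API you are testing.

```python
# Minimal sketch: probe an LLM's prompt sensitivity by asking the same question
# in several paraphrased forms and comparing the answers it gives back.
# `ask_model` is a placeholder; swap in a call to whatever chat API you use.

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an OpenAI or local-model request)."""
    canned = {
        "What is the boiling point of water at sea level, in Celsius?": "100 degrees Celsius.",
        "At sea level, water boils at what temperature in Celsius?": "100 degrees Celsius.",
        "In Celsius, at what temperature does water boil when at sea level?": "Roughly 90 degrees.",
    }
    return canned.get(prompt, "(no answer)")

paraphrases = [
    "What is the boiling point of water at sea level, in Celsius?",
    "At sea level, water boils at what temperature in Celsius?",
    "In Celsius, at what temperature does water boil when at sea level?",
]

answers = {p: ask_model(p) for p in paraphrases}
for prompt, answer in answers.items():
    print(f"Q: {prompt}\nA: {answer}\n")

# If semantically identical prompts yield inconsistent answers, that is evidence
# the model's performance reflects surface form rather than the underlying concept.
print("Consistent across paraphrases:", len(set(answers.values())) == 1)
```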

Despite these concerns, GPT-4 demonstrated a significant improvement over GPT-3.5: when asked about prominent guests on the Jim Rutt Show, it answered 9 out of 10 questions correctly, whereas GPT-3.5 got only 2 out of 10 right [13:17:00].

Measuring Intelligence

When given a vocabulary IQ test (VIQT), GPT-3.5 achieved an IQ of 119 [02:17:00]. While this seems impressive, it’s a task ideally suited for a language model trained on vast amounts of text [02:30:00]. The correlation between vocabulary knowledge and general intelligence, which is assumed in human IQ tests, may not hold true for AI systems [02:51:00].

The Debate on AI Understanding and Consciousness

Melanie Mitchell, along with David Krakauer, co-authored a paper titled “The Debate Over Understanding in AI’s Large Language Models” [02:07:00]. This paper summarizes the two main sides of the AI understanding debate:

  • AI Understands: Proponents argue that large language models understand human language and potentially the world in a similar way to humans, with some even suggesting they may be conscious [02:44:00] [02:51:00].
  • Stochastic Parrots: Opponents contend that these systems merely parrot language, computing the probability of the next word without true understanding [03:17:17]; the toy model sketched after this list illustrates that next-word-probability framing.
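To make the “probability of the next word” framing concrete, here is a toy bigram model (my illustration, not anything discussed on the show): it counts which word follows which in a tiny corpus and turns those counts into a next-word distribution. Real LLMs compute such distributions with transformers over subword tokens and billions of learned weights, but in both cases the output is a probability distribution over what comes next.

```python
# Toy illustration of "predicting the next word from statistics": a bigram model
# counts which word follows which in a small corpus and converts the counts
# into next-word probabilities.

from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count how often each word follows each preceding word.
follow_counts: dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def next_word_distribution(prev: str) -> dict[str, float]:
    counts = follow_counts[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(next_word_distribution("the"))
# -> approximately {'cat': 0.33, 'mat': 0.17, 'dog': 0.33, 'rug': 0.17}
```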

The core issue is that the term “understanding” itself is not well-understood in the context of AI [03:18:00]. This ambiguity is forcing researchers to “refine and clarify our understanding of what these mental terms mean” [03:49:00].

Distinctions Between Human and AI Cognition

Compression vs. Vast Memory

Humans possess an innate desire to “compress” information, forming “lower dimensional representations” or models of the world [03:09:00] [03:14:00]. This is partly due to the constraint of a small working memory [03:53:00]. AI models, with context windows of up to 32,000 tokens (GPT-4), do not face the same evolutionary pressure to build these compressed models [03:59:00] [03:49:00]. This difference may lead to human understanding being “more generalizable” [03:29:00].

Emotions, Long-Term Memory, and Caring

AI models lack a human-like “long-term memory” or episodic memory [03:21:00]. While they store information in billions of weights, they do not possess a memory of past interactions or experiences that form a sense of self [03:33:00].

Humans are a social species, and emotions play a crucial role in social interactions and decision-making [04:35:00]. Antonio Damasio’s work highlights that individuals with damaged emotional machinery struggle with even simple decisions, suggesting that emotions or intuition serve as a “hack to get around the combinatoric explosion of inference” [03:57:00] [04:00:00] [04:22:00].

Philosopher Margaret Boden suggested that “AI won’t take over the world because it doesn’t care” [04:07:00]. While AlphaGo can play Go better than any human, it does not “care” in an emotional sense, which is a key differentiator from human motivation [04:24:00] [04:28:00].

Hallucinations and Truth

Unlike humans, who generally know when they are lying, AI models do not have a model of what is true or untrue [04:50:00]. Their “hallucinations” are linguistically indistinguishable from true statements because the models rely purely on statistics, making falsehoods difficult to detect without external verification [04:41:00] [04:56:00].

The Future of AI and Human Understanding

The development of LLMs is forcing a re-evaluation of what intelligence, cognition, and understanding truly mean [03:38:00]. There is a need for “a new, better science of intelligence” to make sense of these systems [04:49:00].

Key areas for future research and development include:

  • Improved Assessments: Developing better assessments that can truly predict AI abilities in real-world tasks, beyond traditional standardized tests [04:55:00]. Efforts like the Stanford “Holistic Evaluation of Language Models” (HELM) aim to address this [05:08:00].
  • Open Source Models: Projects like the joint venture between Stability AI and EleutherAI promise to provide open-source models, data sets, and software, enabling more rigorous and transparent scientific research into AI capabilities [07:33:00] [08:13:00].
  • Integrating Memory and Intentional Mechanisms: Exploring ways to build hierarchies of memory and intentional mechanisms around LLMs, treating the LLM itself as analogous to the unconscious processes in human language production [03:52:00] [03:57:00]. A minimal sketch of such an external memory wrapper follows this list.
  • Multimodality: The integration of various data types (text, images, video) in AI models could allow them to develop more intuitive physics models, similar to how humans learn from physical experience [04:38:00] [04:59:00].
  • Understanding Emergence: Investigating “phase changes” or emergent properties that occur in LLMs at certain scales, where new capabilities appear that were not present in smaller models [05:17:00].
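As a rough illustration of the external-memory idea above (my own sketch, not a design from the conversation), the following Python wraps a stubbed LLM call in a simple episodic memory: past exchanges are stored, the most relevant ones are retrieved by crude word overlap, and they are prepended to each new prompt. The names (`EpisodicMemory`, `MemoryAugmentedAgent`, `call_llm`) and the retrieval heuristic are all assumptions made for the sake of the example.

```python
# Minimal sketch of an external episodic memory layered around an LLM.
# The LLM call is a stub; the memory store and word-overlap retrieval are
# illustrative choices only.

from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"(model response to: {prompt[:60]}...)"

@dataclass
class EpisodicMemory:
    episodes: list[str] = field(default_factory=list)

    def remember(self, text: str) -> None:
        self.episodes.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Score past episodes by crude word overlap with the query.
        q = set(query.lower().split())
        scored = sorted(
            self.episodes,
            key=lambda ep: len(q & set(ep.lower().split())),
            reverse=True,
        )
        return scored[:k]

@dataclass
class MemoryAugmentedAgent:
    memory: EpisodicMemory = field(default_factory=EpisodicMemory)

    def respond(self, user_message: str) -> str:
        relevant = self.memory.recall(user_message)
        context = "\n".join(f"Previously: {ep}" for ep in relevant)
        prompt = f"{context}\nUser: {user_message}\nAssistant:"
        reply = call_llm(prompt)
        # Store the exchange so later turns can draw on it.
        self.memory.remember(f"User said: {user_message} | Assistant said: {reply}")
        return reply

agent = MemoryAugmentedAgent()
print(agent.respond("My favorite complexity topic is emergence."))
print(agent.respond("What topic did I say I liked?"))
```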

Despite the rapid advancements, there’s significant disagreement among experts regarding how these systems work and their true capabilities [05:35:00]. This highlights how little we still understand about the AI we have created [05:48:00].