The discussion surrounding artificial intelligence (AI), particularly large language models (LLMs) like GPT, has brought to the forefront intense debates about whether these systems truly possess understanding, or even consciousness or sentience [00:10:06]. Melanie Mitchell, a professor at the Santa Fe Institute and author of “Artificial Intelligence: A Guide for Thinking Humans,” points out that the word “understanding” is itself poorly defined in this context, and that these systems place significant stress on the term [00:31:15].
Assessing AI Performance on Standardized Tests
Melanie Mitchell’s essay, “Did ChatGPT Really Pass Graduate-Level Exams?”, initially questioned the performance of ChatGPT 3.5 [00:01:36]. GPT-4 has since shown significantly better results on a range of standardized exams, with AP English a notable exception [00:02:36], but Mitchell’s core concerns remain relevant [00:02:58].
Limitations of Current Testing Methods
When humans take standardized exams, assumptions are made about their cognition, such as not having memorized entire datasets like Wikipedia [00:16:22]. Key issues with testing LLMs include:
- Training Data Exposure: Whether the questions or similar ones appeared in the model’s vast training data [00:04:00].
- Sensitivity to Prompts: LLMs are highly sensitive to prompt wording. An identical problem presented in a slightly different scenario can lead to poor performance, raising questions about underlying conceptual understanding [00:05:16]. Jim Rutt’s experiment, in which GPT-3.5 produced vastly different results for paraphrased versions of the same prompt, exemplifies this [00:11:01] (see the sketch after this list).
- Hallucinations: LLMs often generate incorrect information, or “hallucinate,” with the same confidence as correct answers [00:10:01]. Unlike humans, who typically know when they are lying, LLMs produce hallucinations that are not linguistically distinguishable from their correct statements, making them difficult to detect [00:36:23].
- Lack of Transparency: Companies like OpenAI provide limited access to models like GPT-4 and lack transparency regarding tested materials, making scientific validation difficult [00:06:05]. This has led to calls for OpenAI to be renamed “Closed AI” [00:06:44].
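To make the prompt-sensitivity concern concrete, the sketch below poses one underlying problem in several paraphrased surface forms and compares the answers. It is an illustrative probe in the spirit of Rutt’s experiment, not his actual methodology; the model choice, the `ask` helper, and the example question are assumptions.

```python
# Minimal probe for prompt sensitivity: pose the same problem under several
# paraphrased surface forms and compare the answers. Illustrative sketch only;
# the model name and client usage are assumptions, not the show's methodology.
from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY is set

client = OpenAI()

def ask(prompt: str) -> str:
    """Send one prompt at temperature 0 and return the model's reply text."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

# Three paraphrases of one underlying question; a model with a stable concept
# of the problem should give equivalent answers to all of them.
paraphrases = [
    "A train leaves at 3 pm traveling 60 mph. How far has it gone by 5 pm?",
    "If a locomotive departs at 15:00 moving at 60 miles per hour, what distance does it cover by 17:00?",
    "Traveling at a constant 60 mph from 3 o'clock in the afternoon, how many miles are covered by 5 o'clock?",
]

for p in paraphrases:
    print(f"{p}\n  -> {ask(p)}\n")
```

If the answers diverge across paraphrases, that is evidence the model is leaning on surface form rather than a stable underlying concept.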
The Problem of Extrapolation
Melanie Mitchell questions whether performance on these tests extrapolates to real-world capabilities for LLMs in the same way it does for humans [00:17:10]. For example, GPT-3.5 scored an IQ of 119 on a vocabulary-based test, an “ideal task for a language model,” yet still did not ace it [00:22:19].
The Nature of Understanding
The paper “The Debate Over Understanding in AI’s Large Language Models,” co-authored by Melanie Mitchell and David Krakauer, delves into the definition of understanding in the context of AI [00:29:15].
Different Views on Understanding
There are two main perspectives:
- “Pro-Understanding” Side: Proponents argue that LLMs can understand human language similarly to humans and might even be conscious [00:29:40]. They suggest that training on language, which represents the world, enables world understanding [00:30:04].
- “Stochastic Parrot” Side: Opponents argue that LLMs merely “parrot” trained language by computing the probability of the next word, without genuine understanding [00:30:15].
Human vs. AI Cognition
Human cognition, as explored by cognitive science and neuroscience, approaches understanding differently from LLMs [00:30:59]. Key distinctions include:
- Compression: Humans have an innate desire to compress complex information into lower-dimensional representations or concepts, like Newton’s laws, due to limited working memory [00:35:05]. LLMs, with their massive context windows (e.g., GPT-4’s 32k tokens), lack this evolutionary pressure for compression [00:36:03].
- Long-Term Memory: Humans possess long-term memory, including episodic memory, which contributes to a sense of self and informs interactions [00:38:21]. LLMs, while storing information in their billions of parameters, lack this kind of experiential memory [00:38:55].
- Emotions: Emotions play a crucial role in human decision-making and social interaction [00:39:26]. As philosopher Margaret Boden suggested, AI “won’t take over the world because it doesn’t care” [00:41:04]. AlphaGo, while superior to humans at Go, did not “care” about winning in an emotional sense [00:41:28].
- Physical Grounding: LLMs lack grounding in bodily sensations and physical experience, which are fundamental to human understanding of language and the world [00:45:34]. Whether language alone is rich enough to convey intuitive physics or psychology models remains an empirical question [00:46:31].
Redefining Intelligence and Understanding
The emergence of LLMs is forcing a re-evaluation and clarification of terms like intelligence, consciousness, and understanding [00:31:45]. The debate suggests a need for a “pluralistic” view of understanding, acknowledging different kinds of intelligences that may have different forms of understanding [00:43:35]. AlphaFold, for instance, predicts protein structure better than humans but without a mechanistic physics model [00:42:52].
Current Challenges and Future Directions
The field of AI is moving at an “unbelievably fast” pace, with rapid advancements making earlier observations quickly outdated [00:01:52].
Scientific Study and Transparency
A major challenge is the lack of a clear view of what these systems can and cannot do [00:09:10]. Commercial entities like OpenAI are not “open,” hindering scientific research by not providing access to their models or detailed testing materials [00:06:05].
Fortunately, projects like EleutherAI and Stability AI are working on open-source models, releasing software and data sets specifically intended to support scientific research [00:07:35]. Other companies, such as Hugging Face and Meta, have also open-sourced some of their language model software [00:08:33].
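For researchers who need full access to weights and training data, here is a minimal sketch of loading one of EleutherAI’s openly released checkpoints through the Hugging Face transformers library; the specific checkpoint and generation settings are illustrative choices, not recommendations from the episode.

```python
# Load an openly released EleutherAI checkpoint and generate a completion.
# Sketch only: the checkpoint name and generation settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neo-1.3B"  # weights and training data are openly available
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The debate over whether language models understand", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because the model runs locally, researchers can inspect its weights, control its training data exposure, and rerun evaluations reproducibly, which is exactly what closed commercial APIs make difficult.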
New Assessment Methodologies
There’s a critical need to develop new assessments that can accurately predict LLMs’ abilities in real-world tasks [00:25:00]. The “Holistic Evaluation of Language Models” (HELM) project from Stanford is one such initiative [00:25:06]. This presents an opportunity for new businesses, akin to a “College Board for LLMs” [00:25:58].
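As a toy illustration of what a more robust assessment might look like, the sketch below only credits a model with an item if it answers every paraphrase of that item correctly. The data format and the `model_answer` callable are assumptions for illustration; this is not HELM’s actual interface.

```python
# Toy robustness scorer in the spirit of HELM-style evaluation: an item only
# counts as solved if every paraphrase of it is answered correctly.
from typing import Callable

def robustness_score(items: list[dict], model_answer: Callable[[str], str]) -> float:
    """Fraction of items answered correctly under *all* of their paraphrases."""
    solved = 0
    for item in items:
        answers = [model_answer(p).strip().lower() for p in item["paraphrases"]]
        if all(a == item["gold"].lower() for a in answers):
            solved += 1
    return solved / len(items)

# Example item: one underlying fact, several surface forms, one gold answer.
items = [
    {
        "paraphrases": [
            "What is 7 times 8?",
            "Multiply seven by eight and give the result.",
        ],
        "gold": "56",
    },
]
```

Plugging in a real `model_answer` (for instance, the `ask` helper from the earlier sketch) yields a single robustness number per test suite, which is closer to the kind of predictive assessment Mitchell argues is needed.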
Bridging the Gap to Human-like Cognition
Future research might involve:
- External Memory Hierarchies: Jim Rutt plans to experiment with building external memory hierarchies and intentional mechanisms that coerce LLMs into acting more like the unconscious language-generation processes in humans [00:37:45] (see the sketch after this list).
- Multimodality: Integrating other sensory modalities (e.g., vision) with language could potentially help LLMs develop basic intuitive physics models [00:47:38].
- Online Learning: Currently, LLMs are primarily static feed-forward networks [00:49:42]. The addition of true online learning, allowing them to adapt to the world as they encounter it, would be a significant step [00:49:52].
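A minimal sketch of what such an external memory wrapper might look like is shown below: prior exchanges are stored outside the model, the most relevant ones are retrieved, and they are prepended to each new prompt. The word-overlap retrieval and the generic `llm` callable are simplifying assumptions; a real system would more likely use embeddings and a vector store.

```python
# Minimal sketch of an external memory wrapped around a stateless LLM call:
# store prior exchanges outside the model, retrieve the most relevant ones,
# and prepend them to each new prompt. Illustrative assumptions throughout.
from typing import Callable

class ExternalMemory:
    def __init__(self, llm: Callable[[str], str], top_k: int = 3):
        self.llm = llm                    # any text-in, text-out model call
        self.top_k = top_k
        self.entries: list[str] = []      # long-term store kept outside the model

    def _retrieve(self, query: str) -> list[str]:
        """Rank stored entries by crude word overlap with the query."""
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[: self.top_k]

    def chat(self, user_msg: str) -> str:
        context = "\n".join(self._retrieve(user_msg))
        prompt = f"Relevant earlier exchanges:\n{context}\n\nUser: {user_msg}\nAssistant:"
        reply = self.llm(prompt)
        self.entries.append(f"User: {user_msg} | Assistant: {reply}")  # write back
        return reply
```

The design point is that persistence lives entirely in the wrapper, not the network: the LLM remains a stateless function, while the surrounding scaffolding supplies something loosely analogous to episodic memory.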
Emergence and Complexity
The concept of “more is different” applies to LLMs: larger models exhibit “phase changes” in which interesting emergent properties appear that were not present at smaller scales [00:51:17]. For EleutherAI’s models, this threshold appears to be around 10 gigabytes of model size [00:51:31]. This represents a significant challenge and opportunity for complex systems science [00:52:22].
The debate over AI’s understanding and consciousness is ongoing, dynamic, and rapidly evolving. It’s pushing humanity to clarify its own definitions of intelligence and understanding, making it a fascinating time for both AI researchers and the public [00:43:58].