From: jimruttshow8596

Mind, Consciousness, and Attention

Attention plays a crucial role in the emergence of mind and consciousness. The core feature of consciousness is the ability to remember what was paid attention to [00:18:33]. If an individual does not remember paying attention to something, they are not conscious of it in hindsight [00:18:36]. This implies that conscious information needs to be integrated into a common protocol for later access, suggesting a localized integration of previously distributed information in the brain [00:18:46].

The concept of attention is gaining prominence in machine learning paradigms [00:19:05]. While not identical, the “attention” mechanisms in newer neural network architectures are analogous to attentional learning in the human mind [00:19:21].
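To make the analogy concrete, here is a minimal sketch of the scaled dot-product attention used in transformer architectures (the mechanism family behind GPT-3). The plain-NumPy implementation and variable names are illustrative, not any particular library’s API:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as used in transformers.

    Q, K, V: (seq_len, d_k) arrays of queries, keys, and values.
    Each output position is a weighted mix of all values, with
    weights given by how well its query matches every key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # attention-weighted values

# Toy example: 4 tokens with 8-dimensional embeddings, attending to themselves.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention
print(out.shape)  # (4, 8)
```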

Attention and Learning Efficiency

Human and animal learning differs significantly from current artificial neural networks, particularly in efficiency. Humans can learn remarkably fast if they can pay attention [00:22:15]. The challenge lies in learning how to pay attention in the right way [00:22:18]. An example is a tennis coach who can teach someone to play decently within half an hour by precisely directing the student’s attention to the critical aspects of their behavior, which lets the student update their performance quickly [00:22:24]. Without knowing what to pay attention to, a system must “brute force” the problem, which is very wasteful [00:22:53].
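A toy illustration of the difference (not from the conversation; the linear-regression setup and all numbers are invented for the example): a learner whose attention is directed to the one feature that matters generalizes far better from the same data than one that must weigh every feature:

```python
import numpy as np

rng = np.random.default_rng(1)
n_features = 500                 # high-dimensional input; only feature 0 matters

def make_data(n):
    X = rng.normal(size=(n, n_features))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)
    return X, y

X_train, y_train = make_data(200)
X_test, y_test = make_data(1000)

def test_mse(cols):
    """Fit least squares on the selected columns; return held-out error."""
    w, *_ = np.linalg.lstsq(X_train[:, cols], y_train, rcond=None)
    return np.mean((X_test[:, cols] @ w - y_test) ** 2)

# "Coached" learner: attention directed to the single relevant feature.
print("attended   :", test_mse([0]))
# "Brute force" learner: must weigh all 500 features from the same 200 examples,
# wasting capacity on irrelevant dimensions and fitting noise.
print("brute force:", test_mse(list(range(n_features))))
```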

From a data-compression perspective, human minds identify structure over entire sentences, unlike simple local statistics (such as n-gram models), whose predictive power decays when related words are separated by intervening material [00:24:42]. Human learning also resembles curriculum learning: children start with short, simple sentences so they can maintain focus [00:25:01].
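A small sketch of why local statistics lose that predictive power (the toy corpus is invented for illustration): a bigram model sees only the adjacent word, so the distant subject that actually determines the verb form is invisible to it:

```python
from collections import Counter, defaultdict

# Toy corpus with a long-range dependency: the verb agrees with the
# subject at the start of the sentence, not with the adjacent noun.
corpus = [
    "the dog that chased the cats barks",
    "the dogs that chased the cats bark",
]

# Bigram statistics: predict the next word from the previous word only.
bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

# The local context "cats" is uninformative: the intervening words hide
# the subject that actually determines the verb form.
print(bigrams["cats"])  # Counter({'barks': 1, 'bark': 1}) -- a coin flip
```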

The human attentional algorithm is described as a “hack” to overcome the combinatorial explosion of options in high-dimensional problems that organisms face [00:34:28].

Attention in GPT-3 and Future Systems

GPT-3, based on the transformer architecture, made significant strides in language processing by computing statistics over all tokens within a working-memory window and finding relationships among them [00:26:10]. However, it still has limitations:

  • Fixed Working Memory Window: GPT-3 has a fixed working-memory window of 2048 adjacent tokens, so it cannot comprehend large images or video, or relate the early parts of a book to later parts; it can keep only about two pages in memory at a time (see the sketch after this list) [00:26:21]. Humans, by contrast, can construct the contents of their working memory with far more degrees of freedom, even if its raw capacity is smaller [01:19:01]. For the model, this limitation amounts to “massive retrograde amnesia” [01:16:39].
  • Online Learning: GPT-3 is an offline learning system: it stops learning once its training phase ends (its training data runs only through October 2019) [01:19:14]. A truly human-like agent needs to learn constantly and track reality in real time [01:19:39].
  • Relevance Realization: GPT-3 does not inherently care about relevance [01:20:05]. Its “relevance sensation” is derived from the fact that humans only write down things that are relevant to them [01:20:08]. For a system interacting with the world and processing rich sensory data, a motivational system is needed to focus on the most promising parts of the model [01:20:46]. This ability to assign relevance to learning and meta-learning is crucial [01:21:01].
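The fixed-window limitation is easy to picture in code. A minimal sketch, using a plain token list as a stand-in for GPT-3’s byte-pair-encoded tokens:

```python
def visible_context(tokens, position, window=2048):
    """Return the tokens a fixed-window model can see when predicting
    the token at `position`: only the `window` most recent tokens.
    Everything earlier has fallen out of memory entirely."""
    start = max(0, position - window)
    return tokens[start:position]

# A stand-in "book" of 100,000 tokens (plain integers serve as
# placeholders for byte-pair units here).
book = list(range(100_000))

ctx = visible_context(book, position=90_000)
print(len(ctx), ctx[0])  # 2048 87952 -- the first ~88k tokens are gone
```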

Future advancements in artificial intelligence and cognitive science will likely focus on:

  1. Larger and Dynamic Attentional Windows: Extending attention so the system can actively change and construct its working-memory contexts, allowing for dynamic contextualization and rewriting of information (see the sketch after this list) [01:17:40] [01:17:47].
  2. Multi-modal Representation: Moving from language-centric models to multi-modal representations that are agnostic to the type of information they represent [01:18:00].
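What a “constructed” working-memory context would look like is not specified in the conversation; one plausible reading, sketched below under that assumption, is retrieval by relevance rather than recency (the function, embeddings, and similarity measure are all illustrative):

```python
import numpy as np

def construct_context(memory, query, budget=8):
    """One plausible reading of a "constructed" working memory:
    instead of keeping the most *recent* items, retrieve the items
    most *relevant* to the current query and build the context from
    those. `memory` is a list of (embedding, text) pairs."""
    scores = [float(emb @ query) for emb, _ in memory]
    best = np.argsort(scores)[-budget:][::-1]   # top-`budget` by similarity
    return [memory[i][1] for i in best]

rng = np.random.default_rng(2)
memory = [(rng.normal(size=16), f"note {i}") for i in range(1000)]
query = rng.normal(size=16)
print(construct_context(memory, query, budget=4))
```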

Attention and Perception

Perception is more akin to deep learning systems than to linguistic or symbolic systems [01:08:08]. The “realness” of reality, as experienced by a conscious agent, is a model property: a label the mind attaches to certain parameter dimensions [01:09:12]. The label signifies that the experienced patterns are predictive of future sensory input [01:09:40].
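Read operationally, that label can be rendered as a score for predictive success. A toy sketch, not from the conversation, with an invented scoring function:

```python
import numpy as np

def realness(pattern_prediction, sensory_stream):
    """Toy rendering of "realness" as a model property: score a pattern
    by how well it predicts the sensory stream (a bounded transform of
    the mean squared prediction error)."""
    err = np.mean((pattern_prediction - sensory_stream) ** 2)
    return 1.0 / (1.0 + err)

rng = np.random.default_rng(3)
t = np.linspace(0, 10, 200)
sensed = np.sin(t) + rng.normal(scale=0.05, size=t.size)  # noisy input

print(realness(np.sin(t), sensed))         # high: pattern tracks the input
print(realness(np.zeros_like(t), sensed))  # low: pattern is not predictive
```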

To have an enlightened relationship with reality, it is necessary to realize that perception, including the perception of self and of one’s relationship to the universe, is a representation [01:11:42]. Individuals need to pay attention to this representation, and to attention itself, to understand how their attentional system constructs reality [01:12:02]. Most people do not do this unless their attention processes stop working well or they face an existential crisis [01:12:23].