From: mk_thisisit
Deep learning, a subfield of machine learning, has seen significant advancements and experienced periods of both enthusiastic growth and dormancy. It is fundamentally about teaching systems to capture the structure of input data and learn abstract representations, which has proven crucial for progress in AI [00:24:45].
Historical Waves of Development
The progress of deep learning has been discontinuous, with two significant waves of interest and a period of decline [00:34:55].
First Wave (Late 1980s - Mid 1990s)
The first wave of excitement came in the late 1980s, when researchers began achieving good results with multi-layer neural networks on image recognition, particularly for simple images such as handwritten characters [00:03:19]. These results suggested a complete change in the approach to pattern recognition, and possibly a path toward computer vision and more general intelligence [00:03:42].
However, interest died down around the mid-1990s because the techniques required large amounts of data, which were hard to obtain before the internet became widespread, as well as expensive computers. Applications were therefore limited to specific areas such as handwriting and speech recognition [00:04:08].
Resurgence and Explosion of Interest (2000s - 2013 onwards)
Interest in deep learning gradually grew in the 2000s, leading to a significant “explosion” around 2013 [00:04:42]. This year was pivotal as the research world realized that deep learning “works really well” and could be applied across many different fields [00:04:53]. Since then, its development has accelerated at a dizzying pace, with another turning point occurring in 2015 [00:05:02].
A 2015 review article on deep learning, co-authored with Nobel Prize winner Geoffrey Hinton, has been cited in nearly 100,000 scientific publications, making it one of the most frequently cited papers [00:01:13]. This article was more of a “manifesto” or review, aimed at popularizing a new set of effective techniques and offering tips for future development [00:02:22].
Understanding Machine Learning Paradigms
There are three basic paradigms in machine learning:
- Supervised Learning: The most classic approach, where a system is trained by being given correct answers (e.g., showing an image and labeling it “table”). The system adjusts its parameters to match expected results [00:08:51].
- Reinforcement Learning: The system receives feedback on whether a result was good or bad, rather than explicit correct answers. This mimics human learning (e.g., learning to ride a bike by trial and error). While effective for games like chess or Go, it is “extremely ineffective” in the real world due to the need for millions of trials [00:09:51]. For example, training a self-driving car purely with reinforcement learning would involve thousands of crashes [00:10:35].
- Self-supervised Learning: This method has driven recent advances in natural language understanding and chatbots. The system learns the structure of input data (e.g., text) by being trained to predict missing words or the next word in a sequence. This is the principle behind large language models (LLMs) [00:10:55].
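Self-supervised next-word prediction can be sketched without any neural network: the training pairs are cut directly from raw text, so no human labelling is needed. In this minimal sketch a word-count table stands in for the Transformer an actual LLM would use; the toy corpus and all function names are hypothetical.

```python
from collections import Counter, defaultdict

def make_next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) training pairs.

    No human labels are needed: each target is simply the word that
    follows its context in the original text.
    """
    words = text.lower().split()
    pairs = []
    for i in range(context_size, len(words)):
        context = tuple(words[i - context_size:i])
        pairs.append((context, words[i]))
    return pairs

def train_count_model(pairs):
    """Toy stand-in for a language model: count which word follows each context."""
    counts = defaultdict(Counter)
    for context, target in pairs:
        counts[context][target] += 1
    return counts

def predict_next(model, context):
    """Return the most frequent continuation seen in training, or None if unseen."""
    followers = model.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the cat sat on the mat . the cat sat on the sofa . the dog sat on the mat ."
pairs = make_next_word_pairs(corpus, context_size=2)
model = train_count_model(pairs)

print(pairs[:3])
print(predict_next(model, ["on", "the"]))
```

The contrast with supervised learning is that nobody wrote the targets: "mat" is the answer for the context "on the" only because it follows that context most often in the text itself.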
Limitations of Current AI and Deep Learning
Despite impressive progress, current AI systems, particularly those based on deep learning, have significant limitations [00:00:00]:
- Lack of Physical World Understanding: They do not understand the physical world, unlike humans and animals [00:00:13].
- No Permanent Memory: They lack permanent memory [00:00:15].
- Inability to Reason or Plan: Current systems cannot truly reason or plan, which are key features of intelligent behavior [00:05:52]. Their “reasoning” is often a primitive search in token space, generating many sequences and selecting the best, which is expensive and not how humans think [00:26:51].
- Difficulty with Continuous Data: While language is discrete (finite number of words), the physical world is continuous and much harder to understand and predict [00:13:53]. Predicting exact future frames in video recordings is impossible due to too many unpredictable details [00:47:49].
- Moravec’s Paradox: This paradox highlights that what is easy for humans (e.g., physical tasks like manipulating objects, walking) is hard for computers, while what is hard for humans (e.g., playing chess, solving mathematical puzzles) is easy for computers [00:14:50].
- Information Volume: The amount of visual information a child absorbs in their first four years of life is comparable to the training data of the largest language models [00:18:11]. This suggests that training systems solely on text will never achieve human-level AI [00:18:45].
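The “generate many sequences and select the best” style of reasoning listed among these limitations can be sketched as best-of-N sampling. Everything here is a hypothetical toy: random token choices stand in for an LLM's sampler, and a hand-written scoring function stands in for whatever verifier or reward model a real system might use. The point is the cost structure: every extra candidate costs a full generation pass.

```python
import random

VOCAB = ["2", "+", "3", "=", "5", "6"]  # hypothetical toy vocabulary
TARGET = ["2", "+", "3", "=", "5"]      # what the hypothetical verifier rewards

def sample_sequence(rng, length=5):
    """Stand-in for an LLM sampling one candidate answer token by token."""
    return [rng.choice(VOCAB) for _ in range(length)]

def score(tokens):
    """Hypothetical verifier: number of positions matching the target answer."""
    return sum(a == b for a, b in zip(tokens, TARGET))

def best_of_n(n, seed=0):
    """Generate n candidates and keep the highest-scoring one.

    Returns (best_sequence, its_score, generation_passes_paid_for).
    Cost grows linearly with n, since each candidate is generated in full.
    """
    rng = random.Random(seed)
    candidates = [sample_sequence(rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, score(best), n

for n in (1, 10, 1000):
    best, s, cost = best_of_n(n)
    print(f"n={n:5d}  best score={s}/5  generation passes={cost}")
```

With a fixed seed, larger n searches a superset of the smaller runs' candidates, so the best score can only stay the same or improve, but only by paying for more generations.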
Current AI “Stupidity”
"Currently, AI systems are very stupid in many ways, and we let ourselves be fooled into considering them intelligent because they can manipulate language very well. But they do not understand the physical world, and they do not have permanent memory like we have." [00:00:00]
Future Directions and Innovations
Designing New AI Systems
Current research aims to design a new type of AI system, still based on deep learning, that will:
- Function in the physical world [00:06:14].
- Have permanent memory [00:06:16].
- Be able to reason and plan [00:06:18].
Such systems might experience emotions like excitement or joy, linked to predicting successful goal achievement, but not anger or jealousy, as these would not be permanently built into them [00:06:25].
Consciousness
The concept of consciousness remains undefined and lacks measurable indicators, making it difficult to ascertain in machines [00:07:27]. Some experts believe the question itself might be ill-posed [00:23:00].
Joint Embedding Predictive Architecture (JEPA)
JEPA is a macro-architecture: an arrangement of different modules, which may themselves include Transformers. It is an alternative to current large language models, which rely solely on autoregressive, decoder-only Transformer architectures [00:45:27].
The main idea behind JEPA is to train systems to learn an abstract representation of input data and then make predictions within that representation space, rather than trying to predict every detail in the original input space [00:48:40]. This approach addresses the problem of unpredictability in high-dimensional continuous data like video, where discrete text-based prediction methods fail [00:47:47].
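The idea of predicting in representation space rather than in input space can be illustrated with a toy NumPy sketch. This is not Meta's JEPA code: the encoders are random linear maps followed by tanh, all sizes are hypothetical, and only the predictor is trained. The target y is a noisy view of the context x, standing in for a hidden or future part of a signal. The target encoder is a frozen copy of the context encoder, loosely mimicking the stop-gradient, moving-average target encoder that published variants such as I-JEPA use to keep representations from collapsing.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB, N = 64, 8, 256  # raw input size, embedding size, batch size (hypothetical)

# Toy data: the target y is a noisy view of the context x, standing in for
# "the hidden or future part of a signal, given the visible part".
x = rng.normal(size=(N, D_IN))
y = x + 0.5 * rng.normal(size=(N, D_IN))

# Encoders map raw inputs to an abstract representation (random linear map + tanh).
W_ctx = rng.normal(size=(D_IN, D_EMB)) / np.sqrt(D_IN)  # context encoder
W_tgt = W_ctx.copy()  # frozen copy: the target side receives no gradient here

def encode(inputs, weights):
    return np.tanh(inputs @ weights)

s_x = encode(x, W_ctx)  # representation of the context
s_y = encode(y, W_tgt)  # representation of the target, treated as a constant

# The predictor works entirely in representation space (8 dims, not 64).
W_pred = np.zeros((D_EMB, D_EMB))

def jepa_loss(W_pred):
    """Mean squared error between predicted and actual target embeddings."""
    diff = s_x @ W_pred - s_y
    return float(np.mean(diff ** 2))

lr = 2.0
history = [jepa_loss(W_pred)]
for _ in range(200):
    diff = s_x @ W_pred - s_y
    grad = 2.0 * s_x.T @ diff / diff.size  # gradient of the mean squared error w.r.t. W_pred
    W_pred -= lr * grad
    history.append(jepa_loss(W_pred))

print(f"embedding-space loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Because the loss compares 8-dimensional embeddings rather than 64 raw values, the predictor is not asked to reproduce every raw value of y. In a real system both encoders are trained as well, which is exactly when some mechanism against collapse becomes necessary.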
Convolutional Neural Networks (CNNs)
A significant invention in deep learning is the Convolutional Neural Network (CNN), developed in 1988 [00:50:07]. Inspired by the visual cortex, CNNs are designed to process natural signals like images, video, sound, and speech [00:50:13]. They are widely used today in applications such as:
- Driver assistance systems (automatic braking) [00:50:30]
- Speech recognition [00:51:20]
- Image recognition (e.g., identifying plant species from a photo) [00:51:29]
- Handwriting and character recognition (e.g., reading postal codes, checks) [00:50:59]
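The core operation of a CNN layer can be sketched in a few lines of NumPy: a small kernel slides over the image, and the same weights are reused at every position (weight sharing). The 3x3 vertical-edge kernel below is hand-picked for illustration; in a real CNN such kernels are learned from data, and many layers of convolution, nonlinearity, and pooling are stacked.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers.

    Technically this is cross-correlation (the kernel is not flipped),
    which is the convention in most deep learning libraries.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same small set of weights is applied at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(z):
    return np.maximum(z, 0.0)

# A 5x5 image: dark left part, bright right part (a vertical edge).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Hand-picked vertical-edge detector; in a trained CNN these weights are learned.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = relu(conv2d(image, kernel))
print(feature_map)
```

The resulting 3x3 feature map is large where the kernel's window straddles the dark-to-bright transition and zero over the uniformly bright region, which is the kind of local pattern detector that the first layers of a CNN learn.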
Open Research and Collaboration
Open research and open-source software are crucial for accelerating progress in AI [00:36:34]. When research is published and code is open, the entire world benefits, leading to faster development and broader contributions from a global community [00:37:01]. This collaborative approach fosters innovation and ensures that no single institution holds a monopoly on good ideas [00:38:10].
An example is PyTorch, an open-source deep learning framework used by almost the entire AI industry for research and development [00:39:37].
Challenges of AI Integration and Robotics
The “coming decade will be the decade of robotics” [00:42:42], driven by advances in AI. While current industrial robots excel at repetitive, simple tasks in controlled environments [00:31:36], more adaptive robots require AI systems that can understand the physical world, possess permanent memory, and reason and plan [00:33:29]. Many robotics companies are betting on rapid AI advances in the next 3-5 years to make humanoid robots smart enough to handle the complexities of the real world [00:34:10].
Predictions about achieving Level 5 autonomous driving (fully autonomous) within a few years have been consistently wrong [00:32:18]. This illustrates the ongoing challenge of enabling AI to learn as effectively as humans and animals in complex, real-world conditions [00:16:54].
Societal Impact and Economic Implications
In the future, billions of people are expected to use AI assistants daily through smart devices such as glasses and smartphones [00:42:51]. This will require a “huge computing infrastructure”, because running LLMs and other AI systems is expensive and demands significant computing power [00:43:15]. Most large investments in AI infrastructure go toward “inference costs” (running systems for users) rather than training [00:44:06].
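Why inference rather than training can dominate spending is easiest to see with a back-of-the-envelope cost model: training is paid once, while inference is paid on every query. The sketch below is illustrative only; every number in it (training cost, user count, queries per day, cost per query) is a hypothetical placeholder in arbitrary cost units, not a figure from the interview.

```python
def inference_cost(users, queries_per_user_per_day, cost_per_query, days):
    """Inference is paid on every query, so it grows with usage and time."""
    return users * queries_per_user_per_day * cost_per_query * days

def days_until_inference_matches_training(training_cost, users,
                                          queries_per_user_per_day, cost_per_query):
    """Training is a one-off cost; return how many days of serving it takes
    for cumulative inference spending to equal it."""
    daily = inference_cost(users, queries_per_user_per_day, cost_per_query, days=1)
    return training_cost / daily

# Hypothetical placeholders in arbitrary cost units (not real figures).
TRAINING_COST = 1e8
USERS = 1e9
QUERIES_PER_USER_PER_DAY = 5
COST_PER_QUERY = 1e-3

days = days_until_inference_matches_training(
    TRAINING_COST, USERS, QUERIES_PER_USER_PER_DAY, COST_PER_QUERY)
yearly_inference = inference_cost(USERS, QUERIES_PER_USER_PER_DAY, COST_PER_QUERY, days=365)

print(f"inference matches training cost after ~{days:.0f} days")
print(f"one year of inference is ~{yearly_inference / TRAINING_COST:.1f}x the training cost")
```

Whatever the real values are, the structure is the same: the training term is fixed, while the inference term scales with the number of users and how often they query the system.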
Europe is seen as having a crucial role in the global AI race, particularly in implementing regulations [00:51:55]. It also possesses significant advantages due to its talent pool of programmers, mathematicians, physicists, computer scientists, and engineers, many of whom are leading scientists in the AI field globally [00:53:05].
Applications of Deep Learning in Healthcare
Deep learning methods show great promise in medical applications, particularly in image-based diagnosis [00:57:17]. For example, they are already used in breast cancer diagnosis from mammograms [00:57:22]. The ambition extends beyond diagnosis to integrating measurements directly into treatment protocols [00:58:21].