From: 3blue1brown
The process of a neural network “learning” refers to how the computer identifies appropriate settings for its numerous internal parameters to solve a given problem [00:12:31]. This process involves adjusting the thousands of weights and biases within the network [00:12:31].
Training a Neural Network
When a neural network is trained to recognize handwritten digits, it is fed an image, and the 784 neurons of the input layer are activated based on the brightness of each pixel [00:05:03]. This pattern of activations then propagates through the network, causing specific patterns in subsequent layers, ultimately leading to a pattern in the output layer [00:05:11]. The brightest neuron in the output layer indicates the network’s prediction for the digit [00:05:22].
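This propagation step can be sketched in a few lines of NumPy. The architecture (784 inputs, two hidden layers of 16 neurons, 10 outputs) and the sigmoid squashing function are assumptions based on the video's example network; the parameters here are random stand-ins, not trained values.

```python
import numpy as np

def sigmoid(z):
    """Squash a weighted sum into an activation between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(image, layers):
    """Propagate pixel activations through the network.

    `layers` is a list of (weights, biases) pairs, one per layer."""
    a = image.flatten()  # 28x28 pixel brightnesses -> 784 input activations
    for W, b in layers:
        a = sigmoid(W @ a + b)  # each layer's pattern determines the next
    return a

# Hypothetical 784-16-16-10 network with random (untrained) parameters
rng = np.random.default_rng(0)
sizes = [784, 16, 16, 10]
layers = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]

image = rng.random((28, 28))         # stand-in for a handwritten digit image
output = forward(image, layers)      # pattern of 10 output activations
prediction = int(np.argmax(output))  # the "brightest" output neuron
```

With random parameters the prediction is meaningless, but the data flow is the same one a trained network uses: activations in, ten output activations out, largest one wins.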
The specific method for training neural networks, and how activations in one layer influence the next, involve intricate mathematical computations that constitute the “heart of the network as an information processing mechanism” [00:04:39].
The Role of Weights and Biases
A typical neural network designed for digit recognition can have approximately 13,000 total weights and biases [00:12:18]. These serve as “knobs and dials” that can be adjusted to alter the network’s behavior [00:12:23].
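The count of roughly 13,000 can be checked directly, assuming the 784-16-16-10 layout of the video's example network: every connection between adjacent layers carries one weight, and every non-input neuron has one bias.

```python
# Parameter count for the assumed 784-16-16-10 architecture
sizes = [784, 16, 16, 10]

# One weight per connection between each pair of adjacent layers
weights = sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))

# One bias per neuron in every layer after the input
biases = sum(sizes[1:])

total = weights + biases
print(weights, biases, total)  # 12960 42 13002
```

Almost all of the parameters are weights; the 784-to-16 connections into the first hidden layer alone account for 12,544 of them.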
The weights associated with connections between neurons determine what specific patterns a neuron will detect [00:12:23]. For example, by setting positive weights in a specific region and negative weights in surrounding pixels, a neuron can be made to pick up on edges [00:09:42]. The bias for a neuron controls how high the weighted sum of inputs needs to be before the neuron becomes significantly active [00:11:23].
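A tiny example makes the edge-detection idea concrete. The weight values and the sigmoid activation below are illustrative choices, not values from a trained network: a positive weight on a center pixel and negative weights on its neighbors makes the weighted sum large only when the center is bright and the surroundings are dark, and the bias sets how large that sum must be before the neuron fires.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron_activation(pixels, weights, bias):
    # Weighted sum of inputs, shifted by the bias, squashed to (0, 1)
    return sigmoid(np.dot(weights, pixels) + bias)

# Toy edge detector on a 3-pixel column: positive weight in the middle,
# negative weights on the neighbors (values chosen for illustration)
weights = np.array([-1.0, 2.0, -1.0])
bias = -0.5  # the weighted sum must exceed 0.5 before activation passes 0.5

bright_edge = np.array([0.0, 1.0, 0.0])  # bright stripe between dark pixels
flat_patch = np.array([1.0, 1.0, 1.0])   # uniformly bright region, no edge

print(neuron_activation(bright_edge, weights, bias))  # high: edge detected
print(neuron_activation(flat_patch, weights, bias))   # low: no edge
```

Making the bias more negative raises the threshold, so the neuron only activates on stronger, more clear-cut edges.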
Hope for Layered Learning
In an ideal scenario, each neuron in the hidden layers might learn to correspond to specific subcomponents of digits [00:06:07]. For instance:
- Neurons in an early hidden layer could learn to recognize small edges [00:07:15].
- Neurons in a subsequent hidden layer might learn to recognize more complex patterns, such as loops or long lines, by combining these smaller edge detections [00:06:53].
- The final output layer would then identify digits based on the combination of these recognized subcomponents [00:06:32].
This hierarchical approach suggests that a neural network can break a complex recognition task into layers of abstraction, much as parsing speech involves distinguishing sounds, then syllables, words, and phrases [00:08:00]. Whether a trained network actually performs this kind of hierarchical decomposition is a question revisited once the training process itself is understood [00:07:40].