🎵 Music Playing
Welcome! Today, we're learning about Deep Neural Networks: a cool way computers learn!
What is a Neural Network?
Imagine a brain made of tiny switches called neurons. These neurons work together to make smart decisions!
Input Layer
This is where we give the network information, like pictures or numbers.
Hidden Layers
These layers are like magic helpers that figure out patterns!
- More neurons = better learning
- Too many neurons = the network can memorize instead of generalize (overfitting)
Output Layer
This is where the network gives us answers!
Building a Deep Neural Network in PyTorch
We can build a deep neural network using PyTorch, a library that helps computers learn.
Layers of Our Network
1. First Hidden Layer: has H1 neurons.
2. Second Hidden Layer: has H2 neurons.
3. Output Layer: decides the final answer!
How Does It Work?
1. Start with an input (x).
2. Pass it through each layer: apply math functions (like sigmoid, tanh, or ReLU). These help the network understand better!
3. Get the final answer!
Different Activation Functions
Activation functions help the network think better! (A small sketch follows this list.)
- Sigmoid → Good for small problems
- Tanh → Works better for deeper networks
- ReLU → Super strong for big tasks!
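As a quick, illustrative sketch (the input values are arbitrary), here is what each activation does to the same tensor in PyTorch:

import torch

z = torch.linspace(-3.0, 3.0, steps=7)   # some example pre-activation values

print(torch.sigmoid(z))   # squashes values into (0, 1)
print(torch.tanh(z))      # squashes values into (-1, 1), centered at 0
print(torch.relu(z))      # keeps positive values, zeroes out negatives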
Example: Recognizing Handwritten Numbers
We train the network on MNIST, a dataset of handwritten digits (a network sketch follows this list).
- Input: 784 pixels (28x28 images)
- Hidden Layers: 50 neurons each
- Output: 10 neurons (digits 0-9)
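One possible sketch of the 784 → 50 → 50 → 10 network described above (the class name DeepNet and the choice of ReLU are illustrative, not necessarily the lab's exact code):

import torch
import torch.nn as nn

class DeepNet(nn.Module):
    def __init__(self):
        super(DeepNet, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 50)   # input layer: 784 pixels -> 50 neurons
        self.fc2 = nn.Linear(50, 50)        # second hidden layer: 50 neurons
        self.fc3 = nn.Linear(50, 10)        # output layer: 10 neurons (digits 0-9)

    def forward(self, x):
        x = torch.relu(self.fc1(x))         # activation after each hidden layer
        x = torch.relu(self.fc2(x))
        return self.fc3(x)                  # raw class scores for the loss function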
Training the Network
We use Stochastic Gradient Descent (SGD) to teach the network! (A training-loop sketch follows this list.)
- Loss Function: Helps the network learn from its mistakes.
- Validation Accuracy: Checks how well the network is doing!
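A minimal training-loop sketch along these lines, assuming DeepNet from the sketch above and that train_loader and validation_loader are MNIST DataLoaders (both names are placeholders):

import torch

model = DeepNet()
criterion = torch.nn.CrossEntropyLoss()                      # loss function: learn from mistakes
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)      # Stochastic Gradient Descent

for epoch in range(10):
    model.train()
    for x, y in train_loader:                                # assumed DataLoader of images/labels
        optimizer.zero_grad()
        loss = criterion(model(x.view(-1, 28 * 28)), y)      # flatten 28x28 images to 784 pixels
        loss.backward()
        optimizer.step()

    model.eval()                                             # validation accuracy check
    correct = 0
    with torch.no_grad():
        for x, y in validation_loader:
            correct += (model(x.view(-1, 28 * 28)).argmax(dim=1) == y).sum().item()
    print(epoch, correct / len(validation_loader.dataset))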
What We Learned
✅ Deep Neural Networks have many hidden layers.
✅ Different activation functions help improve performance.
✅ Adding more layers lets the network learn more complex patterns!
Great job! Now, let's build and train our own deep neural networks!
🎵 Music Playing
Welcome! Today, we'll learn how to build a deep neural network in PyTorch using nn.ModuleList.
Why Use nn.ModuleList?
Instead of adding layers one by one (which takes a long time), we can automate the process!
Building the Neural Network
We create a list called layers:
- First item: input size (e.g., 2 features).
- Second item: neurons in the first hidden layer (e.g., 3).
- Third item: neurons in the second hidden layer (e.g., 4).
- Fourth item: output size (number of classes, e.g., 3).
Constructing the Network
Step 1: Create Layers
- We loop through the list, taking two elements at a time:
  - First element: input size
  - Second element: output size (number of neurons)
Step 2: Connecting Layers
- First hidden layer → input size = 2, neurons = 3
- Second hidden layer → input size = 3, neurons = 4
- Output layer → input size = 4, output size = 3
Forward Function
We pass data through the network (see the sketch after these steps):
1. Apply a linear transformation at each layer.
2. Apply the activation function (ReLU) to help the network learn.
3. For the last layer, apply only the linear transformation (since it's a classification task, the raw scores go straight to the loss function).
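A minimal sketch of this construction for the list [2, 3, 4, 3] described above (the class name Net is illustrative):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, layers):
        super(Net, self).__init__()
        self.hidden = nn.ModuleList()
        # take the list two elements at a time: (input size, output size)
        for input_size, output_size in zip(layers, layers[1:]):
            self.hidden.append(nn.Linear(input_size, output_size))

    def forward(self, x):
        last = len(self.hidden) - 1
        for i, linear in enumerate(self.hidden):
            if i < last:
                x = torch.relu(linear(x))   # linear transformation + ReLU
            else:
                x = linear(x)               # last layer: linear only (class scores)
        return x

model = Net([2, 3, 4, 3])   # 2 inputs, hidden layers of 3 and 4 neurons, 3 classes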
Training the Network
The training process is similar to before! We:
- Use a dataset
- Try different combinations of neurons and layers (see the sketch below)
- See which setup gives the best performance!
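For example, a hedged sketch of how a few configurations might be compared (the specific layer lists are made up, and the training call is left out):

# each list is [input size, hidden layer sizes..., number of classes]
configurations = [
    [2, 3, 3],       # one hidden layer with 3 neurons
    [2, 3, 4, 3],    # two hidden layers (3 and 4 neurons)
    [2, 6, 6, 3],    # two wider hidden layers
]

for layers in configurations:
    model = Net(layers)   # the nn.ModuleList sketch above
    # train the model here and record its validation accuracy (training code omitted)
    print(layers, sum(p.numel() for p in model.parameters()), "parameters")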
Awesome! Now, let's explore ways to make these networks even better!
🎵 Music Playing
Welcome! Today, we're learning about weight initialization in neural networks!
Why Does Weight Initialization Matter?
If we don't choose good starting weights, our neural network won't learn properly!
Sometimes, all neurons in a layer get the same weights, which causes problems: they all compute the same output and receive the same updates.
How PyTorch Handles Weights
PyTorch automatically picks starting weights, but we can also set them ourselves!
Let's see what happens when we:
- Set all weights to 1 and the bias to 0 → ❌ Bad idea!
- Randomly choose weights from a uniform distribution → ✅ Better!
The Problem with Random Weights
We use a uniform distribution (random values between -1 and 1). But:
- Too small? → weights don't change much, so learning is slow
- Too large? → vanishing gradient problem
What's a Vanishing Gradient?
If the weights are too big, the activations get too large and saturate, so the gradient shrinks toward zero.
That means the network stops learning!
Fixing the Problem
Solution: Scale Weights Based on the Number of Neurons
We scale the weight range by the number of input neurons a layer has (a small sketch follows this list):
- 2 neurons? → scale by 1/2
- 4 neurons? → scale by 1/4
- 100 neurons? → scale by 1/100
This prevents the vanishing gradient issue!
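A small sketch of that scaling applied by hand to one linear layer (the layer sizes are arbitrary):

import torch.nn as nn

linear = nn.Linear(100, 10)     # a layer fed by 100 input neurons
n_in = linear.in_features       # = 100

# draw weights from a uniform range scaled by 1 / (number of input neurons)
linear.weight.data.uniform_(-1.0 / n_in, 1.0 / n_in)
linear.bias.data.fill_(0.0)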
Different Weight Initialization Methods
1. Default PyTorch Method
- PyTorch automatically picks a uniform range for each linear layer:
  - Lower bound: -1 / sqrt(L_in)
  - Upper bound: +1 / sqrt(L_in)
  (where L_in is the number of input features of the layer)
2. Xavier Initialization
- Best for tanh activation
- Uses the number of input and output neurons
- We apply xavier_uniform_() to set the weights
3. He Initialization
- Best for ReLU activation
- Uses the He initialization method
- In PyTorch we apply kaiming_uniform_() (PyTorch's name for He initialization) to set the weights (see the sketch below)
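A minimal sketch of applying each method to the same layer via torch.nn.init (layer sizes arbitrary; the pairings follow the list above):

import torch.nn as nn

layer = nn.Linear(50, 50)

# 1. Default: PyTorch already initialized `layer` when it was created.

# 2. Xavier (Glorot) uniform initialization, usually paired with tanh
nn.init.xavier_uniform_(layer.weight)

# 3. He (Kaiming) uniform initialization, usually paired with ReLU
nn.init.kaiming_uniform_(layer.weight, nonlinearity='relu')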
Which One is Best?
We compare:
✅ PyTorch Default
✅ Xavier Method (tanh)
✅ He Method (ReLU)
The Xavier and He methods help the network learn faster!
Great job! Now, let's try different weight initializations and see what works best!
🎵 Music Playing
Welcome! Today, we're learning about Gradient Descent with Momentum!
What's the Problem?
Sometimes, when training a neural network, the model can get stuck:
- Saddle Points → flat areas where learning stops
- Local Minima → not the best solution, but we get trapped
What is Momentum?
Momentum helps the model keep moving even when it gets stuck!
It's like rolling a ball downhill:
- Gradient (force) → tells us where to go
- Momentum (mass) → helps us keep moving even on flat surfaces
How Does It Work?
Step 1: Compute the Velocity
- New velocity (v_{k+1}) = momentum term (ρ) × old velocity (v_k) + gradient at the current weight (w_k)
- The momentum term ρ controls how much of the past velocity we keep.
Step 2: Update the Weights
- New weight (w_{k+1}) = old weight (w_k) - learning rate × velocity (v_{k+1})
The bigger the momentum, the harder it is to stop moving! A short sketch of these two updates follows.
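A tiny, self-contained sketch of those two updates on a single weight (the loss L(w) = w², the learning rate, and the momentum value are arbitrary choices):

# minimize L(w) = w**2 with gradient descent plus momentum, by hand
w, v = 5.0, 0.0        # initial weight and velocity
lr, rho = 0.1, 0.5     # learning rate and momentum term

for k in range(20):
    grad = 2 * w       # gradient of L(w) = w**2 at the current weight
    v = rho * v + grad # Step 1: new velocity keeps part of the old velocity
    w = w - lr * v     # Step 2: weight update uses the velocity

print(w)               # w ends up close to the minimum at 0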
Why Does It Help?
Saddle Points
- Without Momentum → the model stops moving in flat areas ❌
- With Momentum → it keeps moving past the flat spots ✅
Local Minima
- Without Momentum → gets stuck in a bad spot ❌
- With Momentum → pushes through and finds a better solution! ✅
Picking the Right Momentum
- Too Small? → the model gets stuck
- Too Large? → the model overshoots the best answer
- Best Choice? → we test different values and pick what works!
Using Momentum in PyTorch
Just add the momentum value to the optimizer:
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
In the lab, we test different momentum values on a dataset and see how they affect learning (a small comparison sketch follows).
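A hedged sketch of how such a comparison could be organized (the momentum values and the Net class from the earlier sketch are placeholders for whatever the lab uses):

import torch

momentum_values = [0.0, 0.1, 0.5, 0.9]

for rho in momentum_values:
    model = Net([2, 3, 4, 3])   # the nn.ModuleList sketch from earlier
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=rho)
    # train with this optimizer and record the loss / validation accuracy here
    print("momentum =", rho, "(training code omitted)")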
Great job! Now, let's experiment with momentum and see how it helps our model!
🎵 Music Playing
Welcome! Today, we're learning about Batch Normalization!
What's the Problem?
When training a neural network, the activations (outputs) of each layer can vary a lot, making learning slower and less stable.
Batch Normalization fixes this by:
✅ Making activations more consistent
✅ Helping the network learn faster
✅ Reducing problems like vanishing gradients
How Does Batch Normalization Work?
Step 1: Normalize Each Mini-Batch
For each neuron in a layer:
1. Compute the mean and standard deviation of its activations over the mini-batch.
2. Normalize the outputs using:
[ z' = \frac{z - \text{mean}}{\text{std dev} + \epsilon} ]
(We add a small value ε to avoid division by zero.)
Step 2: Scale and Shift
- Instead of leaving the activations at mean 0 and standard deviation 1, we scale and shift them:
[ z'' = \gamma \cdot z' + \beta ]
- γ (scale) and β (shift) are learned during training! (A small numeric sketch follows.)
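A small numeric sketch of those two steps done by hand on one mini-batch, compared against nn.BatchNorm1d (the numbers are arbitrary):

import torch
import torch.nn as nn

x = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 12.0]])   # mini-batch: 3 samples, 2 neurons

# Step 1: normalize each neuron (column) with the batch mean and std
mean = x.mean(dim=0)
std = x.std(dim=0, unbiased=False)   # batch norm uses the biased (population) std here
eps = 1e-5
z_prime = (x - mean) / (std + eps)

# Step 2: scale and shift (gamma starts at 1 and beta at 0; both are learned later)
gamma, beta = torch.ones(2), torch.zeros(2)
z_double_prime = gamma * z_prime + beta

bn = nn.BatchNorm1d(2)               # the same computation as a PyTorch layer
print(z_double_prime)
print(bn(x))                         # nearly identical to the hand-computed values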
Example: Normalizing Activations
- First Mini-Batch (X1) → compute the mean & std for each neuron, normalize, then scale & shift
- Second Mini-Batch (X2) → repeat for the new batch!
- Next Layer → apply batch normalization again!
Prediction Time
- During training, we compute the mean & std for each mini-batch.
- During testing, we use the population (running) mean & std instead.
Using Batch Normalization in PyTorch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 3)   # First layer (10 inputs, 3 neurons)
        self.bn1 = nn.BatchNorm1d(3)  # Batch Norm for first layer
        self.fc2 = nn.Linear(3, 4)    # Second layer (3 inputs, 4 neurons)
        self.bn2 = nn.BatchNorm1d(4)  # Batch Norm for second layer

    def forward(self, x):
        x = self.bn1(self.fc1(x))     # Apply Batch Norm
        x = self.bn2(self.fc2(x))     # Apply Batch Norm again
        return x

- Training? Set the model to train mode: model.train()
- Predicting? Use evaluation mode: model.eval() (a short usage sketch follows)
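A brief usage sketch of switching between the two modes (the input batches are random tensors, purely for illustration):

import torch

model = NeuralNetwork()
x = torch.randn(8, 10)        # a mini-batch of 8 samples with 10 features

model.train()                 # training mode: normalize with this batch's mean & std
out = model(x)

model.eval()                  # evaluation mode: use the running (population) statistics
with torch.no_grad():
    pred = model(torch.randn(1, 10))   # a single sample now works, too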
Why Does Batch Normalization Work?
✅ Helps Gradient Descent Work Better
- Normalized data = smoother loss function
- Gradients point in the right direction = faster learning!
✅ Reduces the Vanishing Gradient Problem
- Sigmoid & tanh activations suffer from small gradients
- Normalization keeps activations in a good range
✅ Allows Higher Learning Rates
- Networks can train faster without getting unstable
✅ Reduces the Need for Dropout
- Some studies show Batch Norm can replace Dropout
Great job! Now, let's try batch normalization in our own models!