
🎵 Music Playing

👋 Welcome! Today, we're learning about Deep Neural Networks, a cool way computers learn! 🧠💡

🤖 What is a Neural Network?

Imagine a brain made of tiny switches called neurons. These neurons work together to make smart decisions!

🟢 Input Layer

This is where we give the network information, like pictures or numbers.

🔵 Hidden Layers

These layers are like magic helpers that figure out patterns!

  • More neurons = the network can learn more complex patterns 🤓
  • Too many neurons = the network may just memorize the training data (overfitting) 😵

🔴 Output Layer

This is where the network gives us answers! 🏆


๐Ÿ— Building a Deep Neural Network in PyTorch

We can build a deep neural network using PyTorch, a tool that helps computers learn. ๐Ÿ–ฅ๏ธ

๐Ÿ›  Layers of Our Network

1๏ธโƒฃ First Hidden Layer: Has H1 neurons.
2๏ธโƒฃ Second Hidden Layer: Has H2 neurons.
3๏ธโƒฃ Output Layer: Decides the final answer! ๐ŸŽฏ


๐Ÿ”„ How Does It Work?

1๏ธโƒฃ Start with an input (x).
2๏ธโƒฃ Pass through each layer:

  • Apply math functions (like sigmoid, tanh, or ReLU).
  • These help the network understand better! ๐Ÿงฉ
    3๏ธโƒฃ Get the final answer! โœ…

🎨 Different Activation Functions

Activation functions help the network think better! 🧠

  • Sigmoid → Good for small, shallow networks, but its gradients shrink in deep ones 🤏
  • Tanh → Zero-centered, so it works better for deeper networks 🌊
  • ReLU → Super strong for big, deep tasks! 🚀
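
To get a feel for the difference, here is a tiny sketch (not from the lab) that applies each activation to the same values:

import torch

z = torch.linspace(-3.0, 3.0, steps=7)
print(torch.sigmoid(z))  # squashes values into (0, 1)
print(torch.tanh(z))     # squashes values into (-1, 1), centered at 0
print(torch.relu(z))     # keeps positive values, zeroes out negatives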

🔢 Example: Recognizing Handwritten Numbers

We train the network with MNIST, a dataset of handwritten numbers. 📝🔢

  • Input: 784 pixels (28x28 images) 📸
  • Hidden Layers: 50 neurons each 🤖
  • Output: 10 neurons (digits 0-9) 🔟

🚀 Training the Network

We use Stochastic Gradient Descent (SGD) to teach the network! 📚

  • Loss Function: Helps the network learn from mistakes. ❌➡✅
  • Validation Accuracy: Checks how well the network is doing! 🎯
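
Putting the pieces together, a training sketch for the 784-50-50-10 network could look like the following; the batch sizes, learning rate, epoch count, and use of torchvision to load MNIST are illustrative assumptions rather than the lab's exact settings.

import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms

# Load MNIST (assumed setup; paths and batch sizes are example choices)
train_data = dsets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
val_data = dsets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(train_data, batch_size=100, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data, batch_size=1000)

# 784 inputs -> two hidden layers of 50 neurons -> 10 outputs
model = nn.Sequential(nn.Linear(784, 50), nn.ReLU(), nn.Linear(50, 50), nn.ReLU(), nn.Linear(50, 10))

criterion = nn.CrossEntropyLoss()                         # the loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent

for epoch in range(10):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x.view(-1, 784)), y)       # learn from mistakes
        loss.backward()
        optimizer.step()

    model.eval()                                          # check validation accuracy
    correct = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x.view(-1, 784)).argmax(dim=1) == y).sum().item()
    print(f"epoch {epoch}: validation accuracy = {correct / len(val_data):.3f}")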

๐Ÿ† What We Learned

โœ… Deep Neural Networks have many hidden layers.
โœ… Different activation functions help improve performance.
โœ… The more layers we add, the smarter the network becomes! ๐Ÿ’ก


๐ŸŽ‰ Great job! Now, let's build and train our own deep neural networks! ๐Ÿ—๏ธ๐Ÿค–โœจ

🎵 Music Playing

👋 Welcome! Today, we'll learn how to build a deep neural network in PyTorch using nn.ModuleList. 🧠💡

🤖 Why Use nn.ModuleList?

Instead of adding layers one by one (which takes a long time ⏳), we can automate the process! 🚀


🏗 Building the Neural Network

We create a list called layers 📋:

  • First item: Input size (e.g., 2 features).
  • Second item: Neurons in the first hidden layer (e.g., 3).
  • Third item: Neurons in the second hidden layer (e.g., 4).
  • Fourth item: Output size (number of classes, e.g., 3).

🔄 Constructing the Network

🔹 Step 1: Create Layers

  • We loop through the list, taking two elements at a time:
    • First element: Input size 🎯
    • Second element: Output size (number of neurons) 🧩

🔹 Step 2: Connecting Layers

  • First hidden layer → Input size = 2, Neurons = 3
  • Second hidden layer → Input size = 3, Neurons = 4
  • Output layer → Input size = 4, Output size = 3

⚡ Forward Function

We pass data through the network (see the sketch after this list):
1️⃣ Apply linear transformation to each layer → Makes calculations 🧮
2️⃣ Apply activation function (ReLU) → Helps the network learn 📈
3️⃣ For the last layer, we only apply linear transformation (since it's a classification task 🎯).


🎯 Training the Network

The training process is similar to before! We:

  • Use a dataset 📊
  • Try different combinations of neurons and layers 🤖
  • See which setup gives the best performance (see the sketch below)! 🏆
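
For example, you might loop over a few candidate layer lists and train one model per configuration; the candidate shapes below are made-up examples, and Net is the nn.ModuleList class sketched above.

# Hypothetical layer configurations to compare (not the lab's exact values)
candidate_layers = [
    [2, 3, 3],       # one hidden layer with 3 neurons
    [2, 3, 4, 3],    # two hidden layers with 3 and 4 neurons
    [2, 10, 10, 3],  # two wider hidden layers
]
for layers in candidate_layers:
    model = Net(layers)
    # ...train each model as before and compare validation accuracy...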

🎉 Awesome! Now, let's explore ways to make these networks even better! 🚀

🎵 Music Playing

👋 Welcome! Today, we're learning about weight initialization in Neural Networks! 🧠⚡

🤔 Why Does Weight Initialization Matter?

If we don't choose good starting weights, our neural network won't learn properly! 🚨
Sometimes, all neurons in a layer get the same weights, which causes problems.


🚀 How PyTorch Handles Weights

PyTorch automatically picks starting weights, but we can also set them ourselves! 🔧
Let's see what happens when we:

  • Set all weights to 1 and bias to 0 → ❌ Bad idea!
  • Randomly choose weights from a uniform distribution → ✅ Better!

🔄 The Problem with Random Weights

We use a uniform distribution (random values between -1 and 1). But:

  • Too small? → Weights don't change much 🤏
  • Too large? → Vanishing gradient problem 😵

📉 What's a Vanishing Gradient?

If weights are too big, activations get too large, and the gradient shrinks to zero.
That means the network stops learning! 🚫


🛠 Fixing the Problem

🎯 Solution: Scale Weights Based on Neurons

We scale the weight range based on how many input neurons (L_in) the layer has:

  • 2 neurons? → Scale by 1/2
  • 4 neurons? → Scale by 1/4
  • 100 neurons? → Scale by 1/100

This prevents the vanishing gradient issue! ✅
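
A quick sketch of that idea for a single nn.Linear layer; the layer sizes are example values, and the range is scaled by the number of input neurons (L_in).

import torch.nn as nn

linear = nn.Linear(100, 10)   # a layer with L_in = 100 input neurons
L_in = linear.in_features

# Draw starting weights uniformly from a range scaled by 1 / L_in
nn.init.uniform_(linear.weight, -1.0 / L_in, 1.0 / L_in)
nn.init.zeros_(linear.bias)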


🔬 Different Weight Initialization Methods

🏗 1. Default PyTorch Method

  • PyTorch automatically picks a range:
    • Lower bound: -1 / sqrt(L_in)
    • Upper bound: +1 / sqrt(L_in)

🔵 2. Xavier Initialization

  • Best for tanh activation
  • Uses the number of input and output neurons
  • We apply xavier_uniform_() to set the weights

🔴 3. He Initialization

  • Best for ReLU activation
  • Uses the He initialization method
  • We apply kaiming_uniform_() (PyTorch's name for He initialization) to set the weights
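
Here is a minimal sketch applying both initializers to the weights of two example layers (the layer sizes and the nonlinearity argument are illustrative):

import torch.nn as nn

tanh_layer = nn.Linear(50, 50)
relu_layer = nn.Linear(50, 10)

nn.init.xavier_uniform_(tanh_layer.weight)                        # Xavier: pairs well with tanh
nn.init.kaiming_uniform_(relu_layer.weight, nonlinearity='relu')  # He (Kaiming): pairs well with ReLU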

๐Ÿ† Which One is Best?

We compare:
โœ… PyTorch Default
โœ… Xavier Method (tanh)
โœ… He Method (ReLU)

The Xavier and He methods help the network learn faster! ๐Ÿš€


๐ŸŽ‰ Great job! Now, letโ€™s try different weight initializations and see what works best! ๐Ÿ—๏ธ๐Ÿ”ฌ

🎵 Music Playing

👋 Welcome! Today, we're learning about Gradient Descent with Momentum! 🚀🔄

🤔 What's the Problem?

Sometimes, when training a neural network, the model can get stuck:

  • Saddle Points → Flat areas where learning stops 🏔️
  • Local Minima → Not the best solution, but we get trapped 😞

🏃‍♂️ What is Momentum?

Momentum helps the model keep moving even when it gets stuck! 💨
It's like rolling a ball downhill:

  • Gradient (Force) → Tells us where to go 🏀
  • Momentum (Mass) → Helps us keep moving even on flat surfaces ⚡

🔄 How Does It Work?

🔹 Step 1: Compute Velocity

  • New velocity (v_k+1) = Momentum term (ρ) * Old velocity (v_k) + Gradient at the current weight
  • The momentum term (ρ) controls how much we keep from the past.

🔹 Step 2: Update Weights

  • New weight (w_k+1) = Old weight (w_k) - Learning rate * New velocity (v_k+1)

The bigger the momentum, the harder it is to stop moving! 🏃‍♂️💨
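
Written out by hand, the two steps look like this toy 1-D sketch (the loss, starting point, learning rate, and momentum value are all made-up illustrations):

# Toy example: minimize the loss w**2, whose gradient is 2*w
def gradient(w):
    return 2 * w

w = 5.0     # starting weight
v = 0.0     # starting velocity
lr = 0.1    # learning rate
rho = 0.5   # momentum term

for step in range(20):
    v = rho * v + gradient(w)  # Step 1: compute the new velocity
    w = w - lr * v             # Step 2: update the weight with the velocity
print(w)                       # w has rolled close to the minimum at 0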


โš ๏ธ Why Does It Help?

๐Ÿ”๏ธ Saddle Points

  • Without Momentum โž Model stops moving in flat areas โŒ
  • With Momentum โž Keeps moving past the flat spots โœ…

โฌ‡ Local Minima

  • Without Momentum โž Gets stuck in a bad spot ๐Ÿ˜–
  • With Momentum โž Pushes through and finds a better solution! ๐ŸŽฏ

๐Ÿ† Picking the Right Momentum

  • Too Small? โž Model gets stuck ๐Ÿ˜•
  • Too Large? โž Model overshoots the best answer ๐Ÿš€
  • Best Choice? โž We test different values and pick what works! ๐Ÿ”ฌ

🛠 Using Momentum in PyTorch

Just add the momentum value to the optimizer!

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

In the lab, we test different momentum values on a dataset and see how they affect learning! 📊
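
One way to sketch that experiment: build one optimizer per momentum value and train a fresh copy of the model each time (the momentum values and the placeholder model below are assumptions, not the lab's exact setup).

import torch
import torch.nn as nn

loss_curves = {}
for momentum in [0.0, 0.1, 0.3, 0.5]:
    model = nn.Linear(2, 1)      # placeholder model; swap in the network from the lab
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=momentum)
    loss_curves[momentum] = []   # record the loss per epoch for this momentum setting
    # ...run the usual training loop here, appending each epoch's loss...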


🎉 Great job! Now, let's experiment with momentum and see how it helps our model! 🏗️⚡

🎵 Music Playing

👋 Welcome! Today, we're learning about Batch Normalization! 🚀🔄

🤔 What's the Problem?

When training a neural network, the activations (outputs) can vary a lot, making learning slower and unstable. 😖
Batch Normalization fixes this by:
✅ Making activations more consistent
✅ Helping the network learn faster
✅ Reducing problems like vanishing gradients


🔄 How Does Batch Normalization Work?

🏗 Step 1: Normalize Each Mini-Batch

For each neuron in a layer:
1️⃣ Compute the mean and standard deviation of its activations. 📊
2️⃣ Normalize the outputs using:
\[ z' = \frac{z - \text{mean}}{\text{std dev} + \epsilon} \]
(We add a small value ε to avoid division by zero.)

🏗 Step 2: Scale and Shift

  • Instead of leaving activations at mean 0 and standard deviation 1, we scale and shift them:
    \[ z'' = \gamma \cdot z' + \beta \]
  • γ (scale) and β (shift) are learned during training! 🏋️‍♂️
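
Here is a small sketch of those two steps done by hand on one mini-batch; the tensor sizes and ε value are illustrative, and in practice nn.BatchNorm1d handles this for you.

import torch

x = torch.randn(100, 3)   # a mini-batch: 100 samples, 3 neurons
eps = 1e-5

mean = x.mean(dim=0)      # Step 1: per-neuron mean over the mini-batch
std = x.std(dim=0)        # Step 1: per-neuron standard deviation
z_prime = (x - mean) / (std + eps)

gamma = torch.ones(3)     # Step 2: learnable scale (starts at 1)
beta = torch.zeros(3)     # Step 2: learnable shift (starts at 0)
z_double_prime = gamma * z_prime + beta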

🔬 Example: Normalizing Activations

  • First Mini-Batch (X1) → Compute mean & std for each neuron, normalize, then scale & shift
  • Second Mini-Batch (X2) → Repeat for the new batch! ♻
  • Next Layer → Apply batch normalization again! 🔄

🏆 Prediction Time

  • During training, we compute the mean & std for each batch.
  • During testing, we use the population mean & std instead. 📊

🛠 Using Batch Normalization in PyTorch

import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(10, 3)  # First layer (10 inputs, 3 neurons)
        self.bn1 = nn.BatchNorm1d(3) # Batch Norm for first layer
        self.fc2 = nn.Linear(3, 4)   # Second layer (3 inputs, 4 neurons)
        self.bn2 = nn.BatchNorm1d(4) # Batch Norm for second layer

    def forward(self, x):
        x = self.bn1(self.fc1(x))  # Apply Batch Norm
        x = self.bn2(self.fc2(x))  # Apply Batch Norm again
        return x

  • Training? Set the model to train mode 🏋️‍♂️
    model.train()

  • Predicting? Use evaluation mode 📈
    model.eval()

🚀 Why Does Batch Normalization Work?

✅ Helps Gradient Descent Work Better

  • Normalized data = smoother loss function 🎯
  • Gradients point in the right direction = Faster learning! 🚀

✅ Reduces Vanishing Gradient Problem

  • Sigmoid & Tanh activations suffer from small gradients 😢
  • Normalization keeps activations in a good range 📊

✅ Allows Higher Learning Rates

  • Networks can train faster without getting unstable ⏩

✅ Reduces Need for Dropout

  • Some studies show Batch Norm can replace Dropout 🤯

🎉 Great job! Now, let's try batch normalization in our own models! 🏗️📈