---
tags:
  - autoencoder
  - image-colorization
  - pytorch
  - pytorch_model_hub_mixin
license: apache-2.0
datasets:
  - flwrlabs/celeba
language:
  - en
metrics:
  - mse
pipeline_tag: image-to-image
---

# Model Colorization Autoencoder

## Model Description

This autoencoder model is designed for image colorization. It takes grayscale images as input and outputs colorized versions of those images. The model architecture consists of an encoder-decoder structure, where the encoder compresses the input image into a latent representation, and the decoder reconstructs the image in color.

## Architecture

- **Encoder:** Three convolutional layers, each followed by max pooling, a ReLU activation, and batch normalization, ending with a flattening layer and a fully connected layer that produces the latent vector.
- **Decoder:** Mirrors the encoder, using a linear layer followed by transposed convolutional layers with ReLU activations and batch normalization. The final layer outputs a color image through a sigmoid activation.

The architecture details are as follows:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super(ModelColorization, self).__init__()
        # Encoder: 1x360x360 grayscale input -> 4000-dim latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        # Decoder: latent vector -> 3x360x360 RGB image in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
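The three stride-2 pooling stages reduce a 360×360 input to 45×45 feature maps (hence the `16 * 45 * 45` flattened size), and the three stride-2 transposed convolutions restore the original resolution. A quick shape check, assuming a single 360×360 grayscale input:

```python
import torch

model = ModelColorization()
model.eval()

# Dummy batch: one single-channel 360x360 grayscale image
x = torch.randn(1, 1, 360, 360)
with torch.no_grad():
    y = model(x)

print(y.shape)  # torch.Size([1, 3, 360, 360]) -- RGB output in [0, 1]
```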

## Training Details

The model was trained with PyTorch for 5 epochs. The training and validation losses observed during training were:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0063        | 0.0042          |
| 2     | 0.0036        | 0.0035          |
| 3     | 0.0032        | 0.0032          |
| 4     | 0.0030        | 0.0030          |
| 5     | 0.0029        | 0.0030          |

Both losses decreased steadily over the first four epochs, with the validation loss leveling off at 0.0030 by epoch 5.
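The training script is not included in this card, so the following is only a minimal sketch of one training epoch. It assumes MSE loss (consistent with the card's `mse` metric), the Adam optimizer, and a DataLoader yielding grayscale/RGB pairs derived from CelebA; the optimizer, learning rate, and data pipeline are assumptions, not confirmed details.

```python
import torch
import torch.nn as nn

# Hypothetical setup: the optimizer and learning rate are assumptions,
# not confirmed training details from the model card.
model = ModelColorization()
criterion = nn.MSELoss()  # matches the card's reported metric
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_one_epoch(loader):
    """loader yields (grayscale, color) pairs:
    grayscale (B, 1, 360, 360) and color (B, 3, 360, 360), both in [0, 1]."""
    model.train()
    running_loss = 0.0
    for gray, color in loader:
        optimizer.zero_grad()
        pred = model(gray)
        loss = criterion(pred, color)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * gray.size(0)
    return running_loss / len(loader.dataset)
```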

## Usage

You can load the model from the Hugging Face Hub. Because the model is published via `PyTorchModelHubMixin` rather than as a `transformers` model, it is loaded through the model class itself:

```bash
# Ensure you have the necessary dependencies installed
pip install torch huggingface_hub
```

```python
# ModelColorization is the class defined in the Architecture section above
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
```
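For inference, an image needs to be converted to a single-channel tensor at the resolution the architecture expects (360×360, given the `16 * 45 * 45` flattened size). A minimal sketch using `torchvision` transforms, where `"input.jpg"` and `"colorized.jpg"` are placeholder paths:

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((360, 360)),
    transforms.ToTensor(),  # scales pixel values to [0, 1]
])

# "input.jpg" is a placeholder path for any image to colorize
gray = preprocess(Image.open("input.jpg")).unsqueeze(0)  # (1, 1, 360, 360)

model.eval()
with torch.no_grad():
    colorized = model(gray).squeeze(0)  # (3, 360, 360), values in [0, 1]

transforms.ToPILImage()(colorized).save("colorized.jpg")
```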