opetrova committed on
Commit f122086
1 Parent(s): 1bcddf2

Create a model card

Files changed (1): README.md ADDED (+98 -0)
---
language: en
license: mit
library_name: PyTorch
tags:
- computer vision
- GAN
datasets:
- multi-pie
---

Face Frontalization is a generative computer vision task in which the model takes as input a photo of a person's head captured at an angle between -90 and 90 degrees and produces an image of what that person's frontal (i.e. 0-degree) view might look like. The present model was first released in [this repository](https://github.com/scaleway/frontalization) by [Scaleway](https://www.scaleway.com/), a European cloud provider originating from France. It has previously been discussed in a [Scaleway blog post](https://blog.scaleway.com/gpu-instances-using-deep-learning-to-obtain-frontal-rendering-of-facial-images/) and presented at [the DataXDay conference in Paris](https://www.youtube.com/watch?v=aL7rhJz8mAI). The model's GAN architecture was inspired by [the work of R. Huang et al.](https://arxiv.org/abs/1704.04086)

# Model description

The Face Frontalization model is the Generator part of a [GAN](https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf) that was trained in a supervised fashion on profile-frontal image pairs. The Discriminator was based on a fairly standard [DCGAN](https://arxiv.org/abs/1511.06434) architecture, where the input is a 128x128x3 image that is processed through multiple convolutional layers and classified as either Real or Fake. The Generator had to be modified to fit the supervised learning scenario: it consists of convolutional layers (the Encoder) that compress the input image into a 512-dimensional hidden representation, which is then fed into a Decoder made up of deconvolutional layers that produces the output image. For more details on the model's architecture, see [this blog post](https://blog.scaleway.com/gpu-instances-using-deep-learning-to-obtain-frontal-rendering-of-facial-images/).
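
For illustration, here is a minimal sketch of what such an encoder-decoder Generator can look like in PyTorch. The layer counts and channel sizes below are illustrative assumptions, not the exact released architecture (for that, see the *network* package shipped with the model):

```
import torch.nn as nn

class FrontalizationGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: 128x128x3 image -> 512-dimensional hidden representation
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),     # -> 64x64
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),   # -> 32x32
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # -> 16x16
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, 4, stride=2, padding=1),  # -> 8x8
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 512, 8),                       # -> 1x1 hidden vector
        )
        # Decoder: hidden representation -> 128x128x3 frontal image
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 512, 8),                        # -> 8x8
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),   # -> 16x16
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),   # -> 32x32
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),    # -> 64x64
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),      # -> 128x128
            nn.Tanh(),  # output values in [-1, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))
```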

# Intended uses & limitations

The present Face Frontalization model was not intended to represent the state of the art for this machine learning task. Instead, the goals were:

1. to demonstrate the benefits of using a GAN for supervised machine learning tasks (whereas the original GAN is an unsupervised generative algorithm; see [this conference talk](https://www.youtube.com/watch?v=aL7rhJz8mAI) for more details);
2. to show how a complex generative computer vision project can be accomplished on a [Scaleway cloud RENDER-S instance](https://www.scaleway.com/en/gpu-instances/) in about a day.

# How to use

The Face Frontalization model is a saved PyTorch model that can be loaded provided the included *network* package is present in the working directory. It takes in 3-channel color images resized to 128x128 pixels, in the form of [N, 3, 128, 128] tensors (where N is the batch size). Ideally, the input images should be closely-cropped photos of faces taken in good lighting conditions. Here is how the model can be used for inference with a *gradio* image widget, e.g. in a Jupyter notebook:

```
import gradio as gr
import torch
from torchvision import transforms

import warnings
warnings.filterwarnings('ignore')

# Load the saved Frontalization Generator model
saved_model = torch.load("./generator_v0.pt", map_location=torch.device('cpu'))

def frontalize(image):
    # Convert the test image to a [1, 3, 128, 128]-shaped torch tensor,
    # as required by the frontalization model
    preprocess = transforms.Compose([transforms.ToPILImage(),
                                     transforms.Resize(size=(128, 128)),
                                     transforms.ToTensor()])
    input_tensor = torch.unsqueeze(preprocess(image), 0)

    # Use the saved model to generate an output whose values lie in [-1, 1],
    # then rescale it to [0, 1] before it is displayed
    with torch.no_grad():
        generated_image = saved_model(input_tensor.float())
    generated_image = generated_image.squeeze().permute(1, 2, 0).numpy()
    generated_image = (generated_image + 1.0) / 2.0

    return generated_image

iface = gr.Interface(frontalize, gr.inputs.Image(type="numpy"), "image")
iface.launch()
```
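
If you would rather test the model on a single image file without the *gradio* widget, something along these lines should also work (a sketch; `profile.jpg` is a placeholder path for your own test photo):

```
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Load a test photo and run it through the frontalize() function defined above
image = np.array(Image.open("profile.jpg").convert("RGB"))
result = frontalize(image)

# Show the input photo and the generated frontal view side by side
fig, axes = plt.subplots(1, 2)
axes[0].imshow(image)
axes[0].set_title("Input")
axes[1].imshow(result)
axes[1].set_title("Frontalized")
for ax in axes:
    ax.axis("off")
plt.show()
```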

# Limitations and bias

As mentioned in the **Intended uses & limitations** section, the present model's performance is not intended to compete with the state of the art. Additionally, because the training data contained a disproportionately high number of images of Caucasian and Asian males in their 20s, the model does not perform as well on images of people outside this limited demographic.

# Training data

The present model was trained on [the CMU Multi-PIE Face Database](https://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html), which is available commercially. The input images were closely cropped to include the face of a person photographed at an angle between -90 and 90 degrees. The target frontal images were cropped and aligned so that the center of the person's left eye was at the same relative position in all of them. Precise alignment of the target images turned out to play a key role in the training of the model.

# Training procedure

The model was trained in a manner similar to that of a regular unsupervised [GAN](https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf), except that in addition to the binary cross-entropy (BCE) loss for the Discriminator, a pixelwise loss function was introduced for the Generator (see [the blog post](https://blog.scaleway.com/gpu-instances-using-deep-learning-to-obtain-frontal-rendering-of-facial-images/) for details). The exact weights given to the L1 and L2 pixelwise losses, as well as to the BCE (GAN) loss, were as follows:

```
L1_factor = 1
L2_factor = 1
GAN_factor = 0.001
```
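
To make the loss weighting concrete, here is a minimal sketch of how the combined Generator objective can be assembled from these factors. The names `netD`, `generated`, and `target` are assumptions for illustration, not the verbatim training script:

```
import torch
import torch.nn as nn

bce = nn.BCELoss()   # adversarial (GAN) loss
l1 = nn.L1Loss()     # pixelwise L1 loss
l2 = nn.MSELoss()    # pixelwise L2 loss

def generator_loss(netD, generated, target,
                   L1_factor=1, L2_factor=1, GAN_factor=0.001):
    # Adversarial term: the Generator is rewarded when the
    # Discriminator classifies its output as Real (label 1)
    pred = netD(generated)
    gan_loss = bce(pred, torch.ones_like(pred))
    # Pixelwise terms: compare the generated image to the ground-truth frontal view
    return (L1_factor * l1(generated, target)
            + L2_factor * l2(generated, target)
            + GAN_factor * gan_loss)
```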

The model was trained for 18 epochs with a training batch size of 30. The following optimizers were used for the Discriminator and the Generator:
```
from torch import optim

optimizerD = optim.Adam(netD.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999), eps=1e-8)
```
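
For completeness, one alternating update in such a training loop might look roughly as follows (again a sketch, reusing the `generator_loss` helper defined above; `profile` and `front` stand for a batch of input profile images and their ground-truth frontal targets):

```
# One training iteration (sketch): profile -> generated frontal view
generated = netG(profile)

# Discriminator step: push predictions for real targets toward 1,
# and predictions for (detached) generated images toward 0
optimizerD.zero_grad()
pred_real = netD(front)
pred_fake = netD(generated.detach())
d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
         bce(pred_fake, torch.zeros_like(pred_fake))
d_loss.backward()
optimizerD.step()

# Generator step: adversarial term plus the weighted pixelwise terms
optimizerG.zero_grad()
g_loss = generator_loss(netD, generated, front)
g_loss.backward()
optimizerG.step()
```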

# Evaluation results

GANs are notoriously difficult to train: the losses for the Discriminator and the Generator often fail to converge even when the results look highly realistic to the human eye. The pixelwise loss on the test images is also a poor indicator of the model's performance, because any variation in lighting between the real target photo and the generated image can result in a deceptively high discrepancy between the two. The best remaining evaluation method is manual inspection of the generated results. We have found that the present model performs reasonably well on the test data from the CMU Multi-PIE Face Database (naturally, all photos of the individuals included in the test set were removed from training):

![test examples](https://github.com/scaleway/frontalization/raw/master/pretrained/test-Pie.jpg)

(Top row: inputs; middle row: model outputs; bottom row: ground truth images)