Community Computer Vision Course documentation

Generative Adversarial Networks


Introduction

Generative Adversarial Networks (GANs) are a class of deep learning models introduced by Ian Goodfellow and his colleagues in 2014. The core idea behind GANs is to train a generator network to produce data that is indistinguishable from real data, while simultaneously training a discriminator network to differentiate between real and generated data.

  • Architecture overview: GANs consist of two main components: the generator and the discriminator.
  • Generator: The generator takes random noise z as input and generates synthetic data samples. Its goal is to create data that is realistic enough to deceive the discriminator.
  • Discriminator: The discriminator, akin to a detective, evaluates whether a given sample is real (from the actual dataset) or fake (generated by the generator). Its objective is to become increasingly accurate in distinguishing between real and generated samples.

A common analogy found online is that of an art forger (the generator) who tries to produce convincing fake paintings, and an art critic or investigator (the discriminator) who tries to detect the forgeries.

Figure: GAN architecture overview (from Lilian Weng's blog on GANs)
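The two components above can be sketched in PyTorch. This is a minimal illustration, not a production architecture: the latent dimension, layer sizes, and the use of simple MLPs on flattened 28x28 images are all assumptions made for brevity.

```python
import torch
import torch.nn as nn

LATENT_DIM = 100    # size of the random noise vector z (assumed)
DATA_DIM = 28 * 28  # flattened image size (assumed)

class Generator(nn.Module):
    """Maps a noise vector z to a synthetic data sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, DATA_DIM),
            nn.Tanh(),  # outputs in [-1, 1], matching images normalized to that range
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores a sample with the probability that it is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(DATA_DIM, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),  # probability in (0, 1)
        )

    def forward(self, x):
        return self.net(x)

# The generator turns noise into fake samples; the discriminator scores them.
z = torch.randn(16, LATENT_DIM)
fake = Generator()(z)          # shape: (16, 784)
score = Discriminator()(fake)  # shape: (16, 1)
```

In practice, convolutional architectures (as in DCGAN) replace these MLPs for image data, but the generator-in, discriminator-out structure is the same.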

GANs vs VAEs

GANs and VAEs are both popular generative models in machine learning, but they have different strengths and weaknesses. Whether one is “better” depends on the specific task and requirements. Here’s a breakdown of their strengths and weaknesses.

  • Image Generation:
    • GANs:
      • Strengths: Generate higher quality images, especially for complex data with sharp details and realistic textures.
      • Weaknesses: Can be more difficult to train and prone to instability.
      • Example: A GAN-generated image of a bedroom is likely to be indistinguishable from a real one, while a VAE-generated bedroom might appear blurry or have unrealistic lighting (see the GAN-generated bedrooms in Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015).
    • VAEs:
      • Strengths: Easier to train and more stable than GANs.
      • Weaknesses: May generate blurry, less detailed images with unrealistic features.
  • Other Tasks:
    • GANs:
      • Strengths: Can be used for tasks like super-resolution and image-to-image translation.
      • Weaknesses: May not be the best choice for tasks that require a smooth transition between data points.
    • VAEs:
      • Strengths: Widely used for tasks like image denoising and anomaly detection.
      • Weaknesses: May not be as effective as GANs for tasks that require high-quality image generation.

Here’s a table summarizing the key differences:

| Feature | GANs | VAEs |
| --- | --- | --- |
| Image quality | Higher | Lower |
| Ease of training | More difficult | Easier |
| Stability | Less stable | More stable |
| Applications | Image generation, super-resolution, image-to-image translation | Image denoising, anomaly detection, signal analysis |

Ultimately, the best choice depends on one’s specific needs and priorities. If one needs high-quality images for tasks like generating realistic faces or landscapes, then a GAN might be the better choice. However, if one needs a model that is easier to train and more stable, then a VAE might be a better option.

Training GANs

Training GANs involves a unique adversarial process where the generator and discriminator play a cat-and-mouse game.

  • Adversarial Training Process: The generator and discriminator are trained simultaneously. The generator aims to produce data that is indistinguishable from real data, while the discriminator strives to improve its ability to differentiate between real and fake samples.
  • Objective Function: The training process is guided by a min-max objective that optimizes both the generator and the discriminator. The generator aims to minimize the probability of the discriminator correctly classifying generated samples as fake, while the discriminator seeks to maximize this probability. This objective function is represented as:

    $$\min_G \max_D L(D, G) = \mathbb{E}_{x \sim p_{r}(x)} [\log D(x)] + \mathbb{E}_{x \sim p_g(x)} [\log(1 - D(x))]$$

    Here, the discriminator tries to maximize this loss function whereas the generator tries to minimize it, hence the adversarial nature.
  • Iterative Improvement: As training progresses, the generator becomes adept at producing realistic samples, and the discriminator becomes more discerning. This adversarial loop continues until the generator generates data that is virtually indistinguishable from real data.
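The adversarial loop above can be sketched as alternating gradient steps in PyTorch. This is a simplified illustration, assuming small MLP networks and a dummy data batch; the generator step uses the common "non-saturating" variant (maximize log D(G(z)) rather than minimize log(1 - D(G(z)))), which follows the same min-max logic but gives stronger gradients early in training.

```python
import torch
import torch.nn as nn

LATENT_DIM, DATA_DIM = 100, 784  # assumed sizes for illustration

G = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                  nn.Linear(256, DATA_DIM), nn.Tanh())
D = nn.Sequential(nn.Linear(DATA_DIM, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def training_step(real_batch):
    batch_size = real_batch.size(0)
    real_labels = torch.ones(batch_size, 1)
    fake_labels = torch.zeros(batch_size, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))),
    # i.e. minimize the BCE of real samples vs. 1 and fake samples vs. 0.
    z = torch.randn(batch_size, LATENT_DIM)
    fake = G(z).detach()  # detach: no generator gradients on this step
    d_loss = bce(D(real_batch), real_labels) + bce(D(fake), fake_labels)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: label fresh fakes as "real" so that fooling the
    # discriminator drives the loss down (only opt_g updates its parameters).
    z = torch.randn(batch_size, LATENT_DIM)
    g_loss = bce(D(G(z)), real_labels)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

# One adversarial step on a dummy batch standing in for real data:
d_l, g_l = training_step(torch.randn(8, DATA_DIM))
```

In a real training run this step is repeated over many epochs of real data, with the two losses tracked to monitor the balance between generator and discriminator.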

References:

  1. Lilian Weng’s Awesome Blog on GANs
  2. GAN — What is Generative Adversarial Networks
  3. What are the fundamental differences between VAE and GAN for image generation?
  4. Issues with GAN and VAE models
  5. VAE Vs. GAN For Image Generation
  6. Diffusion Models vs. GANs vs. VAEs: Comparison of Deep Generative Models