The model uses a U-Net whose input and output have the same dimensions. It progressively downsamples and then upsamples its input image, with skip connections between layers of the same resolution. The architecture is a simplified version of the DDPM architecture: it consists of convolutional residual blocks and has no attention layers. The network takes two inputs: the noisy images and the variances of their noise components. The noise variances are encoded using sinusoidal embeddings.
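The sinusoidal encoding of the noise variance can be sketched as follows. This is a minimal pure-Python illustration, not the model's actual code: the function name `sinusoidal_embedding`, the embedding size of 32, and the minimum frequency of 1.0 are assumptions made for the example. Only the maximum frequency of 1000.0 comes from the hyperparameters of this model.

```python
import math


def sinusoidal_embedding(noise_variance, embedding_dims=32,
                         min_frequency=1.0, max_frequency=1000.0):
    """Encode a scalar noise variance as sine and cosine features.

    embedding_dims and min_frequency are assumed values for illustration;
    max_frequency matches the "embedding max frequency" hyperparameter.
    """
    half = embedding_dims // 2
    log_min = math.log(min_frequency)
    log_max = math.log(max_frequency)
    # Frequencies spaced evenly on a log scale between min and max.
    frequencies = [
        math.exp(log_min + (log_max - log_min) * i / (half - 1))
        for i in range(half)
    ]
    angles = [2.0 * math.pi * f * noise_variance for f in frequencies]
    # First half holds sines, second half holds cosines.
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


embedding = sinusoidal_embedding(0.5)
print(len(embedding))
```

Log-spaced frequencies let the network distinguish both small and large differences in noise level from a single scalar input.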
The model is intended for educational purposes, as a simple example of denoising diffusion generative models. It has modest compute requirements and achieves reasonable natural image generation performance.
The model is trained on the Oxford Flowers 102 dataset, a diverse natural image dataset containing around 8,000 images of flowers. Since the official splits are imbalanced (most of the images are in the test split), new random splits were created (80% train, 20% validation) for training the model. Center crops were used for preprocessing.
The model is trained to denoise noisy images, and can generate images by iteratively denoising pure Gaussian noise.
|Hyperparameter|Value|
|---|---|
|dataset repetitions per epoch|5|
|min signal rate|0.02|
|max signal rate|0.95|
|embedding max frequency|1000.0|
|block widths|32, 64, 96, 128|
|exponential moving average|0.999|
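To illustrate how the min and max signal rates might be used, here is a minimal sketch that assumes a cosine diffusion schedule, in which the signal and noise rates are the cosine and sine of an angle interpolated over the diffusion time. The schedule type and the function name `diffusion_schedule` are assumptions for the example; only the two rate values come from the table above.

```python
import math

MIN_SIGNAL_RATE = 0.02  # from the hyperparameter table
MAX_SIGNAL_RATE = 0.95  # from the hyperparameter table


def diffusion_schedule(diffusion_time):
    """Return (noise_rate, signal_rate) for a time in [0, 1].

    Assumes a cosine schedule: time 0 gives the max signal rate,
    time 1 gives the min signal rate.
    """
    start_angle = math.acos(MAX_SIGNAL_RATE)
    end_angle = math.acos(MIN_SIGNAL_RATE)
    angle = start_angle + diffusion_time * (end_angle - start_angle)
    # cos^2 + sin^2 = 1, so the squared rates always sum to one.
    return math.sin(angle), math.cos(angle)


noise_rate, signal_rate = diffusion_schedule(0.5)
# A noisy image would then be: signal_rate * image + noise_rate * noise
print(noise_rate, signal_rate)
```

Under this assumption, the squared noise rate is the noise variance that the network receives as its second input, and the squared rates sum to one at every step.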