# Overview

These are latent diffusion transformer models trained from scratch on 100k 256×256 images.
The checkpoint 278k-full_state_dict.pth was trained for about 500 epochs and is well into overfitting on the 100k training images.

The two checkpoints at 300k and 395k steps were further trained on a Midjourney dataset of 600k images, for 9.4 epochs (300k steps) and 50 epochs (395k steps) respectively, at a constant LR of 5e-5.
The additional training on the MJ dataset took ~8 hours on a 4090 with batch size 256.
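The quoted epoch counts can be sanity-checked from the step counts, assuming the MJ fine-tuning continued from the 278k base checkpoint at batch size 256 (both figures below are assumptions taken from the paragraph above, not from the training logs):

```python
# Sanity check: epochs = (extra steps * batch size) / dataset size.
batch_size = 256
dataset_size = 600_000   # Midjourney images
base_steps = 278_000     # step count of the base checkpoint (assumed start of MJ training)

for total_steps in (300_000, 395_000):
    extra_steps = total_steps - base_steps
    epochs = extra_steps * batch_size / dataset_size
    print(total_steps, round(epochs, 1))  # → 300000 9.4, then 395000 49.9 (~50)
```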

The models are the same as in the Google Colabs below: embed_dim=512, n_layers=8, total parameters = 30,507,328 (~30M).
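The `*full_state_dict.pth` files are plain PyTorch state dicts, so they can be inspected without the training code. A minimal sketch of the save/load round trip and a parameter count taken straight from the state dict (the toy model here is a stand-in, not the real architecture, which is defined in the linked GitHub repo):

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy stand-in; the real ~30M-parameter model (embed_dim=512, n_layers=8)
# lives in the transformer_latent_diffusion repo.
model = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))

# A "full state dict" checkpoint stores every parameter tensor by name.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pth")
torch.save(model.state_dict(), path)

# Reload on CPU and restore into a freshly constructed model.
state_dict = torch.load(path, map_location="cpu")
model.load_state_dict(state_dict)

# Parameter count from the state dict alone; the real checkpoints
# should total 30,507,328 counted this way.
n_params = sum(t.numel() for t in state_dict.values())
print(n_params)  # 2 * (512*512 + 512) = 525312 for this toy model
```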

# Run the Models in Colab
https://colab.research.google.com/drive/10yORcKXT40DLvZSceOJ1Hi5z_p5r-bOs?usp=sharing

# Colab Training Notebook
https://colab.research.google.com/drive/1sKk0usxEF4bmdCDcNQJQNMt4l9qBOeAM?usp=sharing

# GitHub Repo
This repo contains the original training code:
https://github.com/apapiu/transformer_latent_diffusion

# Datasets used
https://huggingface.co/apapiu/small_ldt/tree/main

# Other
See this Reddit post by u/spring_m (huggingface.co/apapiu) for more information:
https://www.reddit.com/r/MachineLearning/comments/198eiv1/p_small_latent_diffusion_transformer_from_scratch/