# Overview
These are latent diffusion transformer models trained from scratch on 100k 256x256 images.
The checkpoint `278k-full_state_dict.pth` was trained for about 500 epochs and is well into overfitting on the 100k training images.
The 300k-step and 395k-step checkpoints were further trained on a Midjourney dataset of 600k images at a constant learning rate of 5e-5, for 9.4 epochs (300k steps) and 50 epochs (395k steps) respectively.
The additional training on the Midjourney dataset took ~8 hours on an RTX 4090 with batch size 256.
The models are the same as in the Google Colabs below: embed_dim=512, n_layers=8, 30,507,328 total parameters (~30M).
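As a quick sanity check on the parameter count above, you can sum the tensor sizes in a checkpoint's state dict directly. This is a minimal sketch using only PyTorch; the commented-out lines assume the checkpoint file is available locally, and note that a "full" state dict may also contain optimizer tensors, in which case the total will exceed the model's 30,507,328 parameters:

```python
import torch
import torch.nn as nn

def count_parameters(state_dict):
    """Sum the element counts of every tensor in a state dict."""
    return sum(t.numel() for t in state_dict.values())

# For the real checkpoint (file assumed to be downloaded locally):
#   sd = torch.load("278k-full_state_dict.pth", map_location="cpu")
#   print(count_parameters(sd))

# Tiny stand-in module so this snippet runs on its own; the 512 width
# mirrors the embed_dim above but this is not the actual architecture.
toy = nn.Linear(512, 512)  # 512*512 weights + 512 biases
print(count_parameters(toy.state_dict()))  # 262656
```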
# Run the Models in Colab
https://colab.research.google.com/drive/10yORcKXT40DLvZSceOJ1Hi5z_p5r-bOs?usp=sharing
# Colab Training Notebook
https://colab.research.google.com/drive/1sKk0usxEF4bmdCDcNQJQNMt4l9qBOeAM?usp=sharing
# Github Repo
This repo contains the original training code:
https://github.com/apapiu/transformer_latent_diffusion
# Datasets used
https://huggingface.co/apapiu/small_ldt/tree/main
# Other
See this Reddit post by u/spring_m (huggingface.co/apapiu) for more information:
https://www.reddit.com/r/MachineLearning/comments/198eiv1/p_small_latent_diffusion_transformer_from_scratch/