nousr committed on
Commit
446e9d5
1 Parent(s): f3bcd49

shorten readme and link to longer one

Files changed (1)
  1. README.md +6 -43
README.md CHANGED
@@ -4,16 +4,16 @@ license: mit
# A Text-Conditioned Diffusion-Prior

## Training Details
- Training details can be found [here](https://wandb.ai/nousr_laion/1B%20Prior/reports/Distributed-Training-of-the-Prior--VmlldzoyMDkxMDQ5?accessToken=md54qpjikfxhf366iv64rxv94d47z05iojh28335fz6qlov11vlq313z63z42h3m)
+
+ [Updated Reports Coming]
+
## Source Code
- Models are diffusion trainers from https://github.com/lucidrains/DALLE2-pytorch
+ Models are diffusion prior trainers from https://github.com/lucidrains/DALLE2-pytorch

## Community: LAION
Join Us!: https://discord.gg/uPMftTmrvS

---
- # Diffusion Prior
- This readme serves as an introduction to the diffusion prior.

## Intro

@@ -155,43 +155,6 @@ Training the prior is a relatively straightforward process thanks to the Trainer

To train the prior, it is highly recommended to use precomputed embeddings for the images. To obtain these for a custom dataset, you can leverage [img2dataset](https://github.com/rom1504/img2dataset) to pull images from a list of URLs and [clip_retrieval](https://github.com/rom1504/clip-retrieval#clip-inference) for generating the actual embeddings that can be used in the prior's dataloader.
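
As a rough illustration of that precompute step (not taken from this repo's docs), the commands below use the two tools linked above; `myimglist.txt`, `images`, and `embeddings` are placeholder names, and exact flags may differ between tool versions.

```bash
# Download the images referenced by a plain-text list of URLs (one URL per line).
img2dataset --url_list=myimglist.txt --output_folder=images --thumbnail_size=256

# Compute CLIP embeddings for the downloaded images; these feed the prior's dataloader.
clip-retrieval inference --input_dataset images --output_folder embeddings
```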

- ## Configuration
-
- The configuration file allows you to easily track and reproduce experiments. It is a simple JSON file that specifies the architecture, dataset, and training parameters. For more information and specifics, please see the configuration README.
-
- ## Distributed Training
-
- If you would like to train in a distributed manner, we have opted to leverage Hugging Face's Accelerate library. Accelerate makes it extremely simple to distribute work across multiple GPUs and nodes. All that is required of you is to follow the simple CLI configuration tool ([more information here](https://huggingface.co/docs/accelerate/accelerator)).
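
As a sketch of that one-time setup (this is the standard Accelerate CLI, nothing specific to this repo):

```bash
# Answer the interactive prompts once (machines, GPUs, mixed precision, etc.);
# the saved answers are reused by every subsequent `accelerate launch`.
accelerate config

# Optionally print the environment and configuration Accelerate will use.
accelerate env
```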
-
- ## Evaluation
-
- There are a variety of metrics available to you when training the prior. You can read a brief description of each in the table below:
- | Metric | Description | Comments |
- | --- | --- | --- |
- | Online Model Validation | The validation loss associated with your online model. | Ideally, validation loss will be as low as possible. Using L2 loss, values as low as `0.1` and lower are possible after around 1 billion samples seen. |
- | EMA Validation | This metric measures the validation loss associated with your EMA model. | This will likely lag behind your "online" model's validation loss, but should outperform it in the long term. |
- | Baseline Similarity | Baseline similarity refers to the similarity between your dataset's prompts and associated image embeddings. This will serve as a guide for your prior's performance in cosine similarity. | Generally, `0.3` is considered a good cosine similarity for caption similarity. |
- | Similarity With Original Image | This metric measures the cosine similarity between your prior's predicted image embedding and the embedding of the actual image the caption was associated with. It is useful for determining whether your prior is generating embeddings with the right contents. | Values around `0.75`+ are obtainable. This metric should improve rapidly in the early stages of training and plateau with diminishing increases over time. If it takes hundreds of millions of samples to reach above `0.5`/`0.6` similarity, you are likely suffering from some kind of training error or inefficiency (e.g. not using EMA). |
- | Difference From Baseline Similarity | Sometimes it's useful to visualize a metric in another light. This metric shows how your prior's predicted image embeddings match up with the baseline similarity measured in your dataset. | This value should float around `0.0`, with some room for variation. After a billion samples seen, values are within +/-`0.01` of `0.0`. If this climbs too high (roughly >`0.02`), it may be a sign that your model is overfitting somehow. |
- | Similarity With Text | This metric is your bread-and-butter cosine similarity between the predicted image embedding and the original caption given to the prior. Monitoring this metric will be one of your main focuses; it is probably the second most important metric behind your loss. | As mentioned, this value should be close to baseline similarity. We have observed an early rapid increase, with diminishing returns as the prior learns to generate valid image embeddings. If this value increases too far beyond the baseline similarity, it could be an indication that your model is overfitting. |
- | Similarity With Unrelated Caption | This metric attempts to expose an overfit prior by feeding it arbitrary prompts (from your dataset) and then measuring the similarity of the predicted embedding with some other image. | Early on, we found that a poorly trained/modeled prior could effectively fool CLIP into believing that the similarity between two images was high (when in fact the caption and image were completely unrelated). With this in mind, a low value is ideal; anything below `0.1` is probably safe. |
-
- ## Launching the script
-
- Now that you've done all the prep, it's time for the easy part! 🚀
-
- To actually launch the script, you will either use `accelerate launch train_diffusion_prior.py --config_path <path to your config>` to launch with distributed training and Hugging Face Accelerate, or `python train_diffusion_prior.py` if you would like to train on your GPU/CPU without Accelerate.
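
Side by side, the two launch modes look like this (the config path below is just a placeholder):

```bash
# Distributed training with Hugging Face Accelerate, using the saved `accelerate config` settings.
accelerate launch train_diffusion_prior.py --config_path path/to/prior_config.json

# Single-process training on one GPU or CPU, without Accelerate.
python train_diffusion_prior.py
```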
-
- ## Checkpointing
-
- Checkpoints will be saved to the directory specified in your configuration file.
-
- Additionally, a final checkpoint is saved before running the test split. This file will be saved to the same directory and titled `latest.pth`. This is to avoid problems where your `save_every` configuration does not overlap with the number of steps required to do a complete pass through the data.
-
- ## Things To Keep In Mind
-
- The prior has not been trained for tasks other than the traditional CLIP embedding translation… at least not yet.
-
- As we finalize the replication of unCLIP, there will almost assuredly be experiments attempting to apply the prior network to other tasks.
-
- With that in mind, you are more or less a pioneer in embedding-translation if you are reading this and attempting something you don't see documentation for!
+ # Looking for more info?
+
+ This readme continues in the official DALLE2-pytorch repo! You can find more details on training, metrics, and more [here](https://github.com/lucidrains/DALLE2-pytorch/blob/main/prior.md)