Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Paper: arXiv 2310.03502
Note: For the training of the gating network (Step C), we perform 2 epochs of training on the COCO dataset and 20k iterations on the Cityscapes dataset, employing the Adam optimizer [18]. The adaptation factor β in the utility function, as discussed in Sec. 3.3, is set to 0.0005 for COCO and 0.003 for Cityscapes... Distributed training is performed on 8 A6000 GPUs. On the COCO dataset, the training times are:
- Step A) 280 GPU hours
- Step B) 17 GPU hours
- Step C) 7.2 GPU hours
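The hyperparameters quoted in the note can be collected into a configuration sketch. Everything below is a hypothetical illustration: the key names (`GATING_TRAIN_CONFIG`, `COCO_GPU_HOURS`) and the dictionary structure are invented for clarity and are not the paper's actual code; only the numeric values come from the note itself.

```python
# Hypothetical configuration mirroring the training details quoted above.
# All key names are illustrative; the paper's code may organize this differently.
GATING_TRAIN_CONFIG = {
    "optimizer": "adam",  # Adam optimizer [18]
    "datasets": {
        "coco": {
            "schedule": {"epochs": 2},       # 2 epochs on COCO
            "beta": 0.0005,                  # adaptation factor in the utility function (Sec. 3.3)
        },
        "cityscapes": {
            "schedule": {"iterations": 20_000},  # 20k iterations on Cityscapes
            "beta": 0.003,
        },
    },
    "hardware": {"gpus": 8, "gpu_type": "A6000"},  # distributed training setup
}

# Reported COCO training times per step, in GPU hours.
COCO_GPU_HOURS = {"step_a": 280, "step_b": 17, "step_c": 7.2}
```

Note that β here is the adaptation factor from the paper's utility function, distinct from Adam's internal β₁/β₂ moment-decay parameters.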