metadata

license: apache-2.0
tags:
  - Kandinsky
  - text-image
  - text2image
  - diffusion
  - latent diffusion
  - mCLIP-XLMR
  - mT5

Kandinsky 2.0

Kandinsky 2.0 is the first multilingual text2image model.

UNet size: 1.2B parameters

It is a latent diffusion model with two multilingual text encoders:

mCLIP-XLMR (560M parameters)
mT5-encoder-small (146M parameters)
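
As a rough back-of-the-envelope check, the listed sizes can be summed to estimate the weight footprint. This is only a sketch: it assumes the rounded parameter counts above, counts only the UNet and the two text encoders (any other component, such as an image autoencoder, is not included), and ignores activations and framework overhead. The names `PARAMS` and `weight_bytes` are illustrative and not part of the Kandinsky code.

```python
# Parameter counts as listed above (rounded figures from this card).
PARAMS = {
    "unet": 1_200_000_000,
    "mclip_xlmr": 560_000_000,
    "mt5_encoder_small": 146_000_000,
}


def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bytes_per_param


total = sum(PARAMS.values())
fp32_gb = weight_bytes(total, 4) / 1e9  # 4 bytes per fp32 parameter
fp16_gb = weight_bytes(total, 2) / 1e9  # 2 bytes per fp16 parameter

print(f"total parameters: {total / 1e9:.3f}B")
print(f"fp32 weights: ~{fp32_gb:.1f} GB, fp16 weights: ~{fp16_gb:.1f} GB")
```

Under these assumptions the three components add up to about 1.9B parameters, or roughly 7.6 GB of weights in fp32 and 3.8 GB in fp16.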

Together, these encoders and the multilingual training datasets enable text2image generation from prompts written in many languages.

How to use

pip install "git+https://github.com/ai-forever/Kandinsky-2.0.git"

from kandinsky2 import get_kandinsky2

# Load the text2img pipeline on the GPU.
model = get_kandinsky2('cuda', task_type='text2img')

# Generate four 512x512 images with the DDIM sampler.
images = model.generate_text2img(
    'кошка в космосе',  # Russian for "a cat in space"
    batch_size=4, h=512, w=512, num_steps=75,
    denoised_type='dynamic_threshold', dynamic_threshold_v=99.5,
    sampler='ddim_sampler', ddim_eta=0.01, guidance_scale=10,
)
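
To keep the results, the generated images can be written to disk. The helper below is a minimal sketch, not part of the kandinsky2 package: `save_images` is a hypothetical name, and it assumes that `generate_text2img` returns a list of objects with a PIL-style `save(path)` method, which this card does not confirm.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", prefix="kandinsky"):
    """Save each image as a numbered PNG and return the written paths.

    Assumes every item exposes a PIL-style `save(path)` method.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, img in enumerate(images):
        path = out / f"{prefix}_{i:02d}.png"
        img.save(path)
        paths.append(path)
    return paths
```

If the assumption holds, calling `save_images(images)` after generation would write `kandinsky_00.png`, `kandinsky_01.png`, and so on into `outputs/`.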

Authors

Arseniy Shakhmatov: GitHub, Blog
Anton Razzhigaev: GitHub, Blog
Aleksandr Nikolich: GitHub, Blog
Vladimir Arkhipkin: GitHub
Igor Pavlov: GitHub
Andrey Kuznetsov: GitHub
Denis Dimitrov: GitHub