kandinsky-community/kandinsky-3 · Kandinsky 2.2 has better image generation quality than this one.

Dec 7, 2023

edited Dec 7, 2023

The Flan UL2 text encoder sometimes makes really horrible predictions for image generation. For example, for the prompt "Ретро паровоз едет сквозь время" ("A retro steam locomotive travels through time") in Kandinsky 2.2, I got this:

while for the exact same prompt in Kandinsky 3.0, I got this:

The alignment and quality are both worse than in 2.2. I am not sure exactly what is causing this, but the U-Net seems to have trouble interpreting the embeddings from the Flan model. Since it is based on a new BigGAN architecture with attention pooling, it has issues here and there. I also checked the source: what does the attention mask actually mean? It is set to 128. I don't know what it is, maybe the projection?

It is still good at complex prompts, but the generation quality is far behind that of the preceding models. Will Sber AI fix these issues? I will post more issues as I find them. By the way, in anime images the surroundings look like watercolor paintings, except for the character. I hope that once these are solved, it is basically game over for Midjourney. Hats off to team Kandinsky!
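On the attention mask question, here is a minimal sketch of one possible reading. It assumes the 128 is the maximum token length that the prompt is padded or truncated to, which is my guess and not something confirmed from the Kandinsky 3 source. In that reading, the attention mask is a list of 1s and 0s marking which positions hold real prompt tokens and which are padding, so the model can ignore the padding. The names `MAX_LEN`, `PAD_ID`, and `pad_with_mask` are hypothetical, and the token ids are made up for illustration.

```python
from typing import List, Tuple

MAX_LEN = 128  # assumed maximum sequence length (not confirmed from the Kandinsky 3 source)
PAD_ID = 0     # hypothetical padding token id


def pad_with_mask(
    token_ids: List[int], max_len: int = MAX_LEN, pad_id: int = PAD_ID
) -> Tuple[List[int], List[int]]:
    """Truncate or pad token ids to max_len and build the matching attention mask."""
    kept = token_ids[:max_len]           # drop anything beyond max_len
    n_pad = max_len - len(kept)          # number of padding slots to fill
    input_ids = kept + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


# Made-up token ids for a short prompt (not real tokenizer output).
ids, mask = pad_with_mask([101, 57, 942, 13, 7])
print(len(ids), sum(mask))  # prints "128 5"
```

If this reading is right, a short prompt like the one above would use only a handful of the 128 positions, and everything else would be masked out.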

MkJojo

Dec 7, 2023

Wait, is the 3.0 model fine-tuned on the LAION 2M aesthetic dataset like the 2.2 version?

MkJojo changed discussion title from Kandinsky 2.2 has better image generation quality than this one. Text encoder has some issues. to Kandinsky 2.2 has better image generation quality than this one. Dec 7, 2023

MkJojo changed discussion status to closed Dec 10, 2023

MkJojo changed discussion status to open Dec 10, 2023