Kandinsky 2.2 has better image generation quality than this one.

#5
by MkJojo - opened

The Flan UL2 text encoder sometimes produces really poor conditioning for image generation. For example, for the prompt "Ретро паровоз едет сквозь время" ("A retro steam locomotive travels through time"), Kandinsky 2.2 gave me this:
IMG_20231207_152249_534.jpg
while for the exact same prompt, Kandinsky 3.0 gave me this:
IMG_20231207_152751_981.jpg
The alignment, the quality, everything is worse compared to 2.2. I am not sure exactly what's causing this. My guess is that the U-Net has trouble clearly understanding the embeddings from the Flan model: since it is based on a new BigGAN-style architecture with attention pooling, it has issues with some things here and there.

I also checked the source. What is the actual meaning of the attention mask? It is set to 128. I don't know what it is, but maybe it's the projection?

It is still good at complex prompts, but the generation quality is far behind that of the preceding models. Will Sber AI fix these issues? I will post more issues as I find them. Btw, in anime images the surroundings look like a watercolor painting, except for the character. But I hope that once these are solved, it is basically game over for Midjourney. Hats off to team Kandinsky!

Wait, is the 3.0 model fine-tuned on the LAION 2M aesthetic dataset like the 2.2 version?

MkJojo changed discussion title from Kandinsky 2.2 has better image generation quality than this one. Text encoder has some issues. to Kandinsky 2.2 has better image generation quality than this one.
MkJojo changed discussion status to closed
MkJojo changed discussion status to open
