You could convert the text embedding to an image embedding using the Karlo prior for better response to text

#4 opened by beyondarmonia

"The model was not trained using text and can not interpret complex text prompts"

Converting the text to an image embedding using Karlo (an open-source DALL-E) first might be a good solution to the above problem.

Yes, that would be great to do! I once tried the image variation model with a LAION-trained DALL-E 2 prior and it definitely helped, but my impression is that the Karlo prior is much better.
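For anyone who wants to try the idea: as far as I can tell the Karlo prior isn't exposed as a standalone pipeline in diffusers (it lives inside `UnCLIPPipeline`), so a quick way to test it without touching pipeline internals is to let Karlo generate an image from the prompt and then feed that image to the variation model, which re-encodes it into a CLIP image embedding. This is only a sketch: it assumes the variation model in question is `lambdalabs/sd-image-variations-diffusers`, and the preprocessing (224x224 resize, CLIP normalization, no antialiasing) follows that model card. Injecting the prior's image embedding directly would need custom plumbing around the pipeline.

```python
import torch
from torchvision import transforms
from diffusers import UnCLIPPipeline, StableDiffusionImageVariationPipeline

device = "cuda"

# Karlo (open-source unCLIP / DALL-E 2): text -> prior -> CLIP image embedding -> decoder
karlo = UnCLIPPipeline.from_pretrained(
    "kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16
).to(device)

# Image-variation model: conditioned on CLIP image embeddings, not on text
# (assumed here to be lambdalabs/sd-image-variations-diffusers)
variations = StableDiffusionImageVariationPipeline.from_pretrained(
    "lambdalabs/sd-image-variations-diffusers"
).to(device)

prompt = "a watercolor painting of a lighthouse at dusk"

# Let Karlo turn the text prompt into an image
# (its prior handles the text -> image-embedding step internally)
karlo_image = karlo(prompt).images[0]

# Preprocess as in the image-variations model card:
# 224x224 bicubic resize, CLIP normalization, no antialiasing
preprocess = transforms.Compose([
    transforms.Resize(
        (224, 224),
        interpolation=transforms.InterpolationMode.BICUBIC,
        antialias=False,
    ),
    transforms.ToTensor(),
    transforms.Normalize(
        [0.48145466, 0.4578275, 0.40821073],
        [0.26862954, 0.26130258, 0.27577711],
    ),
])
cond = preprocess(karlo_image).unsqueeze(0).to(device)

# The variation pipeline re-encodes this into a CLIP image embedding and samples variations
out = variations(cond, guidance_scale=3.0, num_inference_steps=50)
out.images[0].save("karlo_guided_variation.png")
```

Going through a full Karlo image and re-encoding it is lossier than injecting the prior's embedding directly, but it needs no changes to either pipeline, so it's an easy way to see whether the Karlo prior actually helps with text responsiveness.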

this is unfair
[attached image: this is unfair.png]
