DALL-E tips/tricks discussion

#42
by WinterXYZ - opened

I figure this could be a good place for people to share tips for getting the best results out of DALL-E. I've been experimenting a bit to work out how images are generated and how to push for better results.

DALL-E does a pretty good job with fairly complicated prompts, and sometimes the longer and more focused the prompt, the better. You can describe most parts of the image you want, and it can be pretty accurate.

1: I find if you use "X in the style of ARTIST/ARTWORK" or "X by ARTIST/ARTWORK" then you can make it generate images in that style. "cat in the style of starry night" can get good results.

It might be good to put together a list of the artists or artworks that give the best results.
I've found "X by Yoji Shinkawa" gives some amazing results (usually on people rather than objects). If anyone knows any more, please share!
"X by Picasso" handles some prompts well, but it often just generates normal Picasso paintings (to be fair, Picasso is very abstract, so it might be generating perfect results every time).

2: DALL-E also seems to be very good at placing objects inside jars with "X inside of a jar".

3: In a few cases, adding the descriptor "detailed" can give results where more detail is focused on an object. It doesn't work all the time. I've experimented with a lot of prompt pairs where the second version changes only one thing (such as "cat sitting in an alleyway" versus "detailed cat sitting in a detailed alleyway"), and there's an odd placebo effect: the detailed prompt looks like it has more detail, but side by side the two versions share the same level of detail. It would be interesting to work out which descriptors work best, and what causes some descriptors to get ignored.

"X holding Y" and "X in front of Y" also do a pretty decent job.

4: The AI has a reasonable understanding of some complex concepts: refraction/transparency (get it to put a glass sphere in front of an object and it often gives accurate results), reflections (I find it can be pretty good at correctly reflecting objects off a body of water), caustics and shadows. I'm not sure whether any other complex concepts are handled correctly, but from what I've found it can sometimes create these effects accurately.

5: DALL-E struggles with generating words. It seems to know the concept of words, so "man wearing t-shirt" will generate a plain shirt, but "man wearing t-shirt with words" will put some symbols on it. My guess is that it understands that letters/words are their own kind of thing, but there's probably a lack of training on individual letters, so it can't reproduce actual text correctly.

6: DALL-E is great at most food prompts: "a cat made out of cake" generates what you'd expect, and "electrical cake" generates a cake covered in wires/circuitry. But one food I've found DALL-E struggles to abstract is "ramen". Most prompts that wildly affect food don't really do much if that food is ramen (though there are exceptions).

It makes me wonder what other objects DALL-E really struggles to do much with.

If anyone knows any other good tricks or secret techniques for getting really good results, please share!

"In Minecraft" is pretty epic.

"In the style of" plus the name of an artist. I've gotten interesting results with Toshi Yoshida, Hiroshi Yoshida, Ivan Bilibin, Ivan Shishkin, Kuniyoshi, Ghibli, and H. R. Giger. Try to match the style to the content. Styles can also be mixed.

I've noticed that DALL-E is really good at making images of people/characters in the style of toy/figurine lines that presumably have a lot of training data. My favorite is "X as a Funko Pop", since those tend to come out pretty well, but it also works on other lines like "X as a Nendoroid" or "X as a Figma". I believe it can also work with manufacturer names; "X as a NECA figure" tends to give fairly on-point results. This also works to a degree with doll and plush lines like Barbie or Fumo, but I haven't experimented extensively.

Great tips, thanks! I'm only using the mini version, but adding "detailed 4K photorealistic" definitely seems to get better results.

jmmm: Rather than "in the style of", I've been using "by". Adding additional style influences separated with "and by" has been working well. How do you phrase multiple styles when using "in the style of"?

Note: when specifying an artist or style, the exact medium or format can sometimes matter a lot. For example, "a painting of X by Y" can get you different results than specifying a drawing, illustration, book cover, etc. You can get even more specific, e.g., "colored pencil drawing", and you can combine mediums, e.g., "ink and watercolor illustration".

I generated a range of images, all with the same prompt, and changed the artist style for each one. I'm honestly really impressed by the results each one has.
https://cdn.discordapp.com/attachments/857896057177767947/984070035691671552/unknown.png

(Sorry for the lack of embedding; Hugging Face doesn't seem to support embeds.)
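For anyone who wants to repeat this kind of comparison, here's a minimal Python sketch that keeps the subject fixed and swaps only the artist, so any difference between the images should come from the style. The subject and the helper name `artist_sweep` are made up for illustration; the artist names are ones mentioned in this thread. No API is assumed: the prompts still get pasted into DALL-E mini by hand.

```python
def artist_sweep(subject: str, artists: list[str], phrase: str = "by") -> list[str]:
    """Build one prompt per artist, keeping the subject identical across prompts."""
    return [f"{subject} {phrase} {artist}" for artist in artists]


# Hypothetical subject; artists taken from this thread.
prompts = artist_sweep("a lighthouse on a cliff", ["Yoji Shinkawa", "Hiroshi Yoshida", "Ivan Bilibin"])
for p in prompts:
    print(p)
```

Passing `phrase="in the style of"` gives the other phrasing discussed above.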

@winterXYZ that's amazing!

PS: image embedding in comments is coming soon; will post here when it ships.

@mrb "by" and "in X style" work too. I've been mixing styles simply with "in style of X and Y". It doesn't seem to be too fussy about sentence structure, as long as you remove ambiguity.

Kuniyoshi works for character designs with traditional Japanese vibes, like "xenomorph in the style of kuniyoshi", since he has a large corpus of samurai prints.

Toshi Yoshida is great for landscapes, and for giving the picture a stylised but realistic comic book look.

Diego Rivera will give you fascinating images of pre-Columbian civilizations and Native Americans.

Generally speaking, DALL-E doesn't seem to have artistic taste as such, and doesn't seem to prefer "beautiful" pictures over mundane or mediocre ones. I guess most pictures out there are nothing special. But adding an artist to the prompt seems to improve the style and composition dramatically, with occasional flashes of brilliance.

jmmm: My experience has been that when you also use the names of celebrities as if they were artists (to nudge a portrait in a specific direction), using "and" as a separator too many times starts causing additional faces to appear. "and by" avoids this problem, but seems to work better when the list of styles starts with "by" rather than anything else. If you're only specifying 2-3 styles or people, there indeed doesn't seem to be any difference.
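A small Python sketch of the two phrasings being compared here, assuming you build prompts as plain strings before pasting them in. `mix_styles` is a hypothetical helper, not part of any DALL-E tool: with the default joiner it produces the "by A and by B" form, and with `joiner="and"` the plain "by A and B" form.

```python
def mix_styles(subject: str, styles: list[str], joiner: str = "and by") -> str:
    """Join several artist/style names into a single prompt.

    joiner="and by" -> "subject by A and by B and by C"
    joiner="and"    -> "subject by A and B and C"
    """
    if not styles:
        # No styles given: leave the subject untouched.
        return subject
    return f"{subject} by " + f" {joiner} ".join(styles)


print(mix_styles("samurai", ["Kuniyoshi", "Toshi Yoshida", "Ivan Bilibin"]))
print(mix_styles("samurai", ["Kuniyoshi", "Toshi Yoshida", "Ivan Bilibin"], joiner="and"))
```

Going by the observation above, the first form is the one to reach for when the list is long or includes celebrity names; with only 2-3 names the two should behave about the same.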

Is there anything that people have found that DALL-E really struggles to create?

It doesn't seem to understand negations. (Try "statue without head")
It's struggling with modifying faces. Probably not surprising given that it can't get geometry quite right.
Cartoon characters usually come out severely deformed. Probably because it can't map characters drawn with outline + flat shading style to photos properly.

It seems to seriously struggle with faces as well, especially in realistic art styles. Try searching for Donald Trump, for example.
Also, quick thing: try including "trail cam" in some of your prompts, it's hilarious.

Many negations have another form that is understood quite well (e.g., headless, faceless, etc.).

Is there any way to make the pictures clearer? When I screenshot them and open them on my PC, they look grainy and not sharp.
dallemini_2022-6-15_11-48-28.png

Is there anything that people have found that DALL-E really struggles to create?

Poses for fictional characters. It's difficult to make a character "punch", "kick", or do anything you want, really. You can put "action" or "figurine" right before the action you want, or "pose" after it, but it doesn't give great results. I have tried "dynamic poses" and "action poses", but again it just makes low-quality Play-Doh.

It recognizes some poses like sitting, walking, hugging, kissing, holding, crossed arms, hands behind head / on hips. But anything to do with legs, rotation or torso is difficult.

Oh, and perspectives! I find it very interesting if you use, for example, "fisheye view of" or "low angle view of", but it only seems to work with realistic depictions. It can't seem to make 2D characters deform with extreme perspective.

It seems that when I add the word VOCALOID anywhere in the prompt, Hatsune Miku instantly takes over like a disease, no matter what the prompt was. For example, "yuezheng longya from vocaloid" results in a lot of horrifying fusions of him and Miku (mostly Miku), and so does Cyber Diva from VOCALOID, and basically anyone else. It seems the only Vocaloids it recognizes for now are the Cryptonloids (Miku, Rin, Len, Luka, Kaito, and Meiko) and Luo Tianyi.

@VulcanH
@WinterXYZ
Homer Simpson is entirely immutable. And his eyes are usually pure white.
Do not mess with Homer.
craiyon_153333_Mutant_shadow_Homer__Homer_is_a_monster__Homer_monster__Homer_flying__Mutant_Homer__F.png
craiyon_153320_Mutant_shadow_Homer__Homer_is_a_monster__Homer_monster__Homer_flying__Mutant_Homer__F.png
