dalle-mini/dalle-mini · Some Feedback

May 28, 2022

Hi, I have been playing around with dalle mini and it's amazing!

Some things I have found that might be of interest:

When prompting with 'in the shape of' or 'shaped like', results are quite hit and miss for me. For example, I tried 'A Supernova in the shape of a Lion' and rephrased it a few times, changing the subjects (from Supernova to Galaxy, or Lion to Rabbit), but it didn't produce what I expected.

Trying the phrase 'The Mona Lisa in the style of Japanese Edo era art' or 'The Mona Lisa in the style of Hokusai' produced images that were very similar to the original Mona Lisa, with few elements of Japanese Edo era art.

And inputting 'A Crwth', which is a stringed instrument, comes up with pictures of fields (maybe reinterpreting it as Croth, a commune in France). But 'A Crwth instrument' works fine, so it's likely an NLP thing.

Exciting technology! I might have a go at implementing my own model one day (:

emporas

May 28, 2022

Indeed, the command "in the shape of" very rarely works. The command "in the style of" does work, although not as well as in the better commercial models like Dalle2. The best way to learn how to prompt the engine is by looking at examples on Instagram. I published some myself.

However, the best way to command/prompt the engine is to use the declarative style of programming. Take a look at CSS, Datalog, Prolog, etc. It is my impression that commands in natural language are not very accurate, even with the most capable models like Dalle2, and that declarative prompting is the way to go. I do expect natural language prompting to fall out of fashion in the future.

khonsu

May 31, 2022

There's definitely some kind of 'prompting language' which is a lot more effective than natural language.
Prompts seem to work much better if you break up long natural language sentences into short 'blocks' of 2 or 3 words each to express key ideas. It can understand long sentences to some degree, but it's just less effective.
There also seem to be certain keywords that drastically alter picture quality. It seems that you can redirect its 'attention' to specify which elements you want it to focus on the most.
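
As a rough illustration of the 'blocks' idea, here is a minimal Python sketch that joins short keyword blocks into one prompt. The function name `build_prompt`, the comma separator, and the 3-word limit are my own assumptions for illustration; nothing here is part of dalle-mini, and I have not measured whether this format actually gives better images.

```python
def build_prompt(blocks, max_words=3):
    """Join short keyword blocks into a single comma-separated prompt.

    Hypothetical helper: the comma separator and the word limit are
    assumptions, not something dalle-mini requires.
    """
    cleaned = []
    for block in blocks:
        words = block.split()
        if not words:
            continue  # skip empty blocks
        if len(words) > max_words:
            raise ValueError(f"block too long ({len(words)} words): {block!r}")
        cleaned.append(" ".join(words))  # normalise inner whitespace
    return ", ".join(cleaned)


prompt = build_prompt(["supernova", "lion shape", "deep space", "vivid colors"])
print(prompt)
```

The resulting string can then be pasted into the prompt box in place of a long sentence such as 'A Supernova in the shape of a Lion'.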

emporas

May 31, 2022

Khonsu, do you have experience with better models? Dalle2 seems a lot better acquainted with natural language than mini dalle, which is a little primitive. However, my experience from studying a lot of programming languages (like... all of them except the matrix or array languages) is that the machine cannot be commanded in natural human language. I am also Greek, and I know ancient Greek too; from what I know, the idea that machines could receive commands in natural language is simply absurd.

The language we give commands in is simply a markup language, or a description language, or a context-free language, because we lack the ability to insert loops. CSS and HTML are exactly that. As long as we cannot declare a scope for the variables, the machine loses track of which variables belong to which structure, so we are limited to putting them close together and hoping the machine manages to find the association by sheer luck. In Greek, ancient Greek, or Sanskrit that problem can be alleviated but not solved.
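
To make the scoping point concrete, here is a small Python sketch. The structure (a list of noun/attribute pairs) and the name `render_scene` are hypothetical and only for illustration. Inside the program each attribute is explicitly scoped to its subject, but the prompt is a flat string, so the only thing we can do is place the attributes right next to the noun they belong to.

```python
def render_scene(subjects):
    """Flatten (noun, attributes) pairs into one prompt string.

    The pairing is explicit in the data structure, but the output can
    only express it through adjacency: attributes directly before
    their noun.
    """
    parts = []
    for noun, attributes in subjects:
        parts.append(" ".join([*attributes, noun]))
    return ", ".join(parts)


scene = [("lion", ["golden"]), ("galaxy", ["spiral", "blue"])]
print(render_scene(scene))  # golden lion, spiral blue galaxy
```

Whether the engine then associates "golden" with the lion and not with the galaxy is exactly the part we cannot declare.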

khonsu

Jun 1, 2022

I haven't used DALL-E2 yet; I've tried some Colabs and some other free apps.
Yes, natural language isn't the native representation, so we have to use natural language to try to simulate the underlying representations to get the best outputs.

khonsu

Jun 1, 2022

I must say, I think there's something Dall is doing that other free text-image apps aren't doing. Although the output in the mini is mediocre, there's still a little bit of intelligence in terms of the artistic ideas that isn't there in other apps I tried. So it does seem to have a bit of understanding of even abstract concepts, even if it's only very rudimentary.

emporas

Jun 1, 2022

Yeah, Dall has some really good artistic ideas, even though the output of miniDalle is not the best. However, even 5 years ago, a machine producing even tolerable artistic ideas was science fiction. The language model of dalle is very well trained compared to other models on the internet (the BART model, if I am not mistaken, is what miniDalle uses).
BTW, there is a discussion of inputs for Dalle on HN and on Twitter if you are interested.

https://news.ycombinator.com/item?id=31573282#31574303