[Comic] Draw Human who has hair like lord Shiva standing on the flying mountain in distant, draw an ocean below the flying mountain

#126
by realdreamchronicles - opened

Prompt:

Draw Human who has hair like lord Shiva standing on the flying mountain in distant, draw an ocean below the flying mountain

Japanese: japanese.jpg
Nihonga: nihonga.jpg
3D Render: 3d-render.jpg
Armorican: armorican.jpg
Haddock: haddock.jpg
Humanoid: humanoid.jpg
Flying Saucer: flying-saucer.jpg
American 1950: american-1950.jpg
American Modern: american-modern.jpg
Franco Belgian: franco-belgian.jpg

It looks like the language model didn't add a lot of diversity between panels (it might be a bug).

But wow... the comparison between the various styles makes it pretty interesting!

Hello, I think it is because of what the "virtual brain" "knows". It works from references, and if you look at what it has "learned", you will find TONS of mountains in the background, made by a lot of people in photos, paintings, and comics, and they all look the same: they are clichés. It doesn't need to invent anything; it knows exactly what the prompt asks for.

realdreamchronicles changed discussion status to closed
realdreamchronicles changed discussion status to open

@jbilcke-hf Yes, I observed that there wasn't much diversity between the panels. While examining the console log, I stumbled upon something that might be one of the contributing factors. The original LLM response seemed to be on point, as detailed below:

  1. Panel 1: A close-up of a human's face, showcasing a fierce and determined expression, with a hint of wildness in their eyes. The setting is a small rock or cliff on a floating mountain, with the sky and a few clouds forming the background. The focus is mainly on the upper body and face, with the hair resembling Lord Shiva's flowing hairstyle.

  2. Panel 2: A wide-angle shot that captures the floating mountain above a deep blue ocean with waves crashing against the shore. The mountain is depicted as a massive rocky formation floating with lush greenery on top. The human figure is visible in the distance, standing on the mountain with arms spread wide.

  3. Panel 3: A close-up of the human's hand holding a small, glowing crystal, raised towards the sky, seemingly channeling its power. The panel emphasizes the glowing crystal, surrounded by a soft luminescence, with a background gradient of dark and light blues representing the sky and ocean.

  4. Panel 4: (Unfortunately, the instructions for this panel were truncated.)

However, an error message appeared stating, "failed to read LLM response: SyntaxError: Unterminated string in JSON at position 1202", which led to the prompt being truncated to a limited number of characters.

Regrettably, the same truncated prompt was used for all the panels: "Draw a human with hair resembling Lord Shiva's, standing on a flying mountain in di..."

This truncation might be one of the reasons why all the panels turned out so similar, lacking variation. The intended prompts for the panels were diverse, encompassing golden, patchwork, and Gustav Klimt styles, but they were not fully conveyed due to the error.
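To illustrate how a cut-off response can turn into identical panels, here is a minimal Python sketch. It is not the app's actual code (the app is TypeScript and its JSON schema isn't shown here); the helper `parse_panel_prompts` and the `"instructions"` key are hypothetical. It assumes the reply should be a JSON array of panel objects and, if parsing fails, falls back to reusing the user's prompt for every panel. Note that Python reports the error as "Unterminated string starting at ..." rather than the JavaScript wording quoted above.

```python
import json

def parse_panel_prompts(raw_response, user_prompt, n_panels=4):
    """Hypothetical helper: read per-panel prompts from an LLM's JSON reply.

    Assumes a JSON array of objects with a hypothetical "instructions" key.
    If the reply was cut off (e.g. by a token limit), json.loads raises
    JSONDecodeError and we fall back to the user's prompt for every panel,
    which yields near-identical panels.
    """
    try:
        panels = json.loads(raw_response)
        return [panel["instructions"] for panel in panels]
    except json.JSONDecodeError:
        return [user_prompt] * n_panels

complete = '[{"instructions": "close-up of the face"}, {"instructions": "wide shot of the floating mountain"}]'
# Same reply, cut off in the middle of the second string value.
truncated = '[{"instructions": "close-up of the face"}, {"instructions": "wide shot of the flo'

print(parse_panel_prompts(complete, "user prompt"))
# ['close-up of the face', 'wide shot of the floating mountain']
print(parse_panel_prompts(truncated, "user prompt"))
# ['user prompt', 'user prompt', 'user prompt', 'user prompt']
```

A more forgiving parser could try to salvage the panels that were fully written before the cut, but even then the last panel would be lost.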

I'm not entirely certain that this is the root cause, but it could potentially be one of the reasons for the lack of diversity between the panels.

Additionally, I encountered further issues when I attempted to add some presets. The system displayed an error message stating "prediction of the story failed. Failed to generate the story Error: empty result", and I was unable to generate any images. This even caused my browser to crash, necessitating a restart.

At one point, I wondered if this could be due to some kind of rate limit, but I'm not entirely sure.

Screenshot 2023-09-09 144727.png
Screenshot 2023-09-10 024059.png

Hi @realdreamchronicles, indeed the LLM generation is not very robust; when that happens, you can try again.

Regarding the truncation, it was caused by this code:
https://huggingface.co/spaces/jbilcke-hf/ai-comic-factory/blob/main/src/app/queries/predict.ts#L58

Previously I used a setting of only 330 tokens; I have increased it to 450 (roughly 100~110 tokens per panel),
but this might increase the generation delay.
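As a rough back-of-the-envelope check, here is the arithmetic in Python. It assumes 4 panels (as in the LLM response above) and a rule-of-thumb of 3 to 4 characters per token, which varies by tokenizer. An even split of 450 tokens gives 112.5 per panel, a bit above the quoted 100~110, possibly because some tokens go to the JSON structure. At 3 to 4 characters per token, the old 330-token limit corresponds to roughly 990 to 1320 characters, a range that contains the reported failure at position 1202.

```python
def tokens_per_panel(max_tokens, n_panels=4):
    """Even split of the response token budget across panels (ignores JSON overhead)."""
    return max_tokens / n_panels

def approx_chars(max_tokens, chars_per_token):
    """Rough character length of a response that uses the whole token budget."""
    return max_tokens * chars_per_token

for budget in (330, 450):
    print(budget, tokens_per_panel(budget), approx_chars(budget, 3), approx_chars(budget, 4))
# 330 82.5 990 1320
# 450 112.5 1350 1800
```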

Alright, thanks @jbilcke-hf, I will try again now. Additionally, do we need to limit the input prompts to a specific number of characters?

In general, I've found a token to be equivalent to about 3 or 4 characters, although I suppose that will depend on the LLM being used.
While it might not be a particularly accurate example, think of the word 'elevator'... if the first token is 'ele', what might the next token be predicted to be?
It could be 'elephant' or 'elementary', etc.
Now, given the second token, we start to get some context: 'ele' plus 'lev'.
Hmmm... what word has both 'ele' and 'lev' contained in it?
Eleven? Maybe?
Let's keep adding context with our next token: 'tor'.
The likelihood of the prediction being 'elevator' is becoming high.
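The narrowing in the 'elevator' example can be mimicked with a toy filter over a tiny, hand-picked word list (`CANDIDATES` is made up for illustration). This is not how an LLM predicts tokens, since a real model assigns learned probabilities over its whole vocabulary, but it mirrors the substring reasoning above: each new fragment rules out more candidates.

```python
# Toy illustration only: real LLMs score every token in their vocabulary
# with learned probabilities; this just filters a tiny hand-picked word list.
CANDIDATES = ["elephant", "elementary", "eleven", "elevator", "element"]

def consistent_words(fragments, candidates=CANDIDATES):
    """Return the candidate words that contain every observed fragment."""
    return [w for w in candidates if all(f in w for f in fragments)]

print(consistent_words(["ele"]))
# ['elephant', 'elementary', 'eleven', 'elevator', 'element']
print(consistent_words(["ele", "lev"]))
# ['eleven', 'elevator']
print(consistent_words(["ele", "lev", "tor"]))
# ['elevator']
```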

It may be of interest that there are some applications out there (or in here at Hugging Face) that take an image and, based on the words/prompts given, assess the probability of that image being classified under each of those words. I've used that with some degree of success to recreate some previous images/characters whose prompts I had not bothered to save. So I reverse-engineered the prompt by finding which words were most likely to generate that same kind of character/image.
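For CLIP-style zero-shot classifiers, those probabilities come from comparing an image embedding with a text embedding for each candidate word: cosine similarity, multiplied by a temperature (logit scale), then a softmax across the candidates. The sketch below shows that scoring step in plain Python. The 3-D embeddings and labels are made up, and the scale of 100 is an assumption (roughly what the released CLIP checkpoints use); real embeddings come from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_scores(image_emb, label_embs, scale=100.0):
    """CLIP-style zero-shot scoring: softmax over scaled cosine similarities."""
    logits = {label: scale * cosine(image_emb, emb) for label, emb in label_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up 3-D embeddings purely for illustration; real CLIP embeddings have
# hundreds of dimensions and come from the image and text encoders.
image = [0.9, 0.1, 0.3]
labels = {
    "cyberpunk girl": [0.8, 0.2, 0.4],
    "motorcycle": [0.5, 0.7, 0.1],
    "bunny rabbit": [-0.2, 0.9, -0.4],
}
scores = zero_shot_scores(image, labels)
for label, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {p:.3f}")
```

Because the softmax normalizes across the candidates you supply, a word's score depends on which other words are in the list, so scores are best compared within a single run.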

Here's an example I showed someone else to try to describe that process:

image.png

I had run a few classification terms prior to this and wanted to see how highly the model would rate the image as being a 'bunny rabbit'.
I'm happy to report it assessed it as zero... it ain't no bunny rabbit. ;)

I believe the model I was testing at the time is this one:

https://huggingface.co/philschmid/clip-zero-shot-image-classification

With this kind of tool we can (at least in theory) use our generated imagery to determine how much of an influence each word/token had in generating that image.
Those that rate high... use again. Those that rate low... save for some other day. :)
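The "use again / save for later" step can be sketched as a simple rank-and-threshold over classifier results. The input shape (a list of `{"label": ..., "score": ...}` dicts) matches what the transformers `zero-shot-image-classification` pipeline returns; the helper name `split_terms`, the 0.1 threshold, and the scores below are hypothetical, not real model output.

```python
def split_terms(results, threshold=0.1):
    """Split zero-shot results into terms to reuse and terms to set aside.

    `results` is a list of {"label": ..., "score": ...} dicts.
    The 0.1 threshold is an arbitrary assumption.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    keep = [r["label"] for r in ranked if r["score"] >= threshold]
    later = [r["label"] for r in ranked if r["score"] < threshold]
    return keep, later

# Illustrative scores only, not real model output.
results = [
    {"label": "bunny rabbit", "score": 0.0},
    {"label": "cyberpunk", "score": 0.55},
    {"label": "motorcycle", "score": 0.30},
    {"label": "watercolor", "score": 0.15},
]
keep, later = split_terms(results)
print("use again:", keep)
# use again: ['cyberpunk', 'motorcycle', 'watercolor']
print("save for some other day:", later)
# save for some other day: ['bunny rabbit']
```

Since the scores are relative to the candidate set, rerunning with different terms can shift every score, so a fixed threshold is only a starting point.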

The specific character and style I was trying to recall how to create was this one:

animagineXLcybergirls.0006.png

Here, after a few attempts, I was able to deduce the 'basic recipe' for that style (including the intentionally elongated cybercycle and extra limbs, trying to leverage the errors of image generators there!) and added some 3D rendering touches:

mostly recreated but with 3D rendering styleA.png

There's more to say on this but that's another story.
