Story Image Variety With One Page Instead of Two

#597
by bbox1136 - opened

There is an interesting dynamic that is happening with having one page only. The variety in the generation is vastly diminished even if you change the panel layout -- sometimes it will generate the same panel image on different layouts.
If you iterate without changing the prompt, it creates very similar or identical images when you click Go again.
When it was two pages, the variety was increased not just because of double the panels but the prompts on the "edit" portion were more fleshed out.
I think this is because the "story" is being condensed to one page; thus, the image variety is constrained to fit the entire story generated, from beginning to end of prompt, into one page.
Is there a way to change it back to two pages when cloning the project?

Thanks


Please bring back two-page comics

There is an interesting dynamic that is happening with having one page only. The variety in the generation is vastly diminished even if you change the panel layout -- sometimes it will generate the same panel image on different layouts.
If you iterate without changing the prompt, it creates very similar or identical images when you click Go again.

That's right: there is a cache on the language model that writes the story. To be precise, it is the Hugging Face Inference API that has this cache system by default. I didn't add it myself, but I left it enabled because it conveniently reduces usage of the platform when people only want to change the style slightly.

The cache can be invalidated by touching the prompt, for instance with extra punctuation, or by using a radically different style (e.g. photo versus manga).
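
To illustrate why an unchanged prompt keeps returning the same story, here is a minimal sketch of a response cache keyed on the exact prompt text. It is only an illustration under assumptions: `withCache`, `cachedGenerate` and `fakeModel` are hypothetical names, and the real cache lives inside the Hugging Face Inference API, not in code like this.

```typescript
// Hypothetical sketch: a response cache keyed on the exact prompt text.
// The real cache is inside the Hugging Face Inference API; this only
// imitates the behavior described above.

type Generate = (prompt: string) => string;

// Stand-in for the language model: returns a different story on every call.
let callCount = 0;
const fakeModel: Generate = (prompt) => {
  callCount += 1;
  return `story #${callCount} for "${prompt}"`;
};

// Wrap a generator so that identical prompts reuse the stored answer.
function withCache(generate: Generate): Generate {
  const cache = new Map<string, string>();
  return (prompt) => {
    const hit = cache.get(prompt);
    if (hit !== undefined) return hit; // same prompt: same story
    const fresh = generate(prompt);
    cache.set(prompt, fresh);
    return fresh;
  };
}

const cachedGenerate = withCache(fakeModel);

const first = cachedGenerate("a cat astronaut, manga style");
const second = cachedGenerate("a cat astronaut, manga style"); // cache hit
const third = cachedGenerate("a cat astronaut, manga style."); // extra "." misses the cache

console.log(first === second); // true
console.log(first === third); // false
```

The extra period is enough to produce a new cache key, which matches the advice about touching the prompt with punctuation.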

When it was two pages, the variety was increased not just because of double the panels but the prompts on the "edit" portion were more fleshed out.
I think this is because the "story" is being condensed to one page; thus, the image variety is constrained to fit the entire story generated, from beginning to end of prompt, into one page.

This is indeed how it works: the story is built in a progressive, episodic way, 2 panels at a time, so the LLM has to be made aware of the total expected number of panels to work properly. This is still not an exact science and it can sometimes fail, since the AI Comic Factory uses a tiny LLM by default, zephyr-7b-beta, which is not as smart as, say, GPT-4.
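
The episodic approach can be sketched roughly as follows. This is not the AI Comic Factory's actual code: `planChunks`, the prompt wording, and the figure of 4 panels per page are assumptions for illustration. Only the ideas of "2 panels at a time" and "tell the model the total panel count" come from the explanation above.

```typescript
// Hypothetical sketch of episodic story generation: the story is requested
// in chunks of 2 panels, and every request states the total panel count so
// the model can pace the beginning, middle, and end.

interface ChunkRequest {
  firstPanel: number; // 1-based index of the first panel in this chunk
  lastPanel: number; // 1-based index of the last panel in this chunk
  prompt: string; // instruction that would be sent to the language model
}

function planChunks(story: string, totalPanels: number, chunkSize = 2): ChunkRequest[] {
  const requests: ChunkRequest[] = [];
  for (let start = 1; start <= totalPanels; start += chunkSize) {
    const end = Math.min(start + chunkSize - 1, totalPanels);
    const pacing = end === totalPanels ? " This is the ending." : " Do not end the story yet.";
    requests.push({
      firstPanel: start,
      lastPanel: end,
      prompt: `Story: ${story}\nWrite panels ${start} to ${end} of ${totalPanels}.${pacing}`,
    });
  }
  return requests;
}

// Assuming 4 panels per page: one page needs 2 requests, two pages need 4.
const onePage = planChunks("a cat astronaut", 4);
const twoPages = planChunks("a cat astronaut", 8);
console.log(onePage.length, twoPages.length); // 2 4
```

With the same story squeezed into fewer chunks, each chunk has to cover more of the plot, which is consistent with the reduced variety reported for a single page.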

Is there a way to change it back to two pages when cloning the project?

UPDATE: it's now much easier, see: https://huggingface.co/spaces/jbilcke-hf/ai-comic-factory/discussions/597#65e5f6aaf09dfaab9b20ff5a

You can even try to add more pages, although I'm not sure what will happen in terms of UI design. I haven't tried it myself, so the layout may break a bit or look weird. It also won't get the full previous story as it generates more panels, because the comic factory is designed for a small LLM (with a tiny context window). I would also suggest trying OpenAI for perhaps more interesting stories, but I think my code still assumes the context window is small, and it won't adapt automatically to models supporting 8k, 16k, 32k, 100k, etc. tokens.
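
As a rough illustration of what "won't get the full previous story" means with a tiny context window, here is a sketch that keeps only the most recent panel descriptions fitting a budget. `recentHistory` is a hypothetical helper, not the project's code, and characters are used as a crude stand-in for tokens.

```typescript
// Hypothetical sketch: keep only the most recent panel descriptions that fit
// a small budget. Characters are a crude stand-in for tokens here.

function recentHistory(panels: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  // Walk from newest to oldest so the latest panels are kept first.
  for (const panel of [...panels].reverse()) {
    if (used + panel.length > maxChars) break;
    kept.unshift(panel); // restore chronological order
    used += panel.length;
  }
  return kept;
}

const history = [
  "Panel 1: the cat boards the rocket",
  "Panel 2: liftoff over the city",
  "Panel 3: the cat floats in zero gravity",
  "Panel 4: a strange signal appears",
];

// With a small budget, the earliest panels drop out of the prompt.
const trimmed = recentHistory(history, 80);
console.log(trimmed);
```

A model with a larger context window could simply be given a larger budget, but as noted above the existing code does not adapt to that automatically.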

For the long term (I only disabled the 2nd page as a temporary measure), I think it should be possible to restore the 2nd page, while keeping the platform within its hardware capacity to serve all the requests, by making it a deliberate and manual action (a generate button).

I think a manual action is a fair way to solve the problem (e.g. to avoid prolonging a story if the user doesn't like it in the first place), similar to video generation on RunwayML when you want to continue a video and make it longer.

The only issue is the one you pointed out: the language model needs to be made aware of the expected final length, to avoid weird stories (either incomplete, or wrapping up too soon). So maybe I could add an extra option menu to indicate the expected number of pages, which could be arbitrarily large (with some changes in the code to provide the full history to the LLM).

Update:

You can now control the number of pages using this environment variable:

NEXT_PUBLIC_MAX_NB_PAGES="1"

or

NEXT_PUBLIC_MAX_NB_PAGES="2"
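
For anyone cloning the Space, here is a minimal sketch of how such a value could be read and validated. `parseMaxPages` and the fallback of 1 page are assumptions, not the project's code. Only the variable name and the values "1" and "2" come from the update above. The environment is passed in as a plain object so the snippet runs anywhere; in a Next.js app it would typically come from `process.env`.

```typescript
// Hypothetical helper: read NEXT_PUBLIC_MAX_NB_PAGES and fall back to 1 page
// (an assumed default) when the value is missing or not a positive integer.

type Env = Record<string, string | undefined>;

function parseMaxPages(env: Env, fallback = 1): number {
  const raw = env["NEXT_PUBLIC_MAX_NB_PAGES"];
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed >= 1 ? parsed : fallback;
}

console.log(parseMaxPages({ NEXT_PUBLIC_MAX_NB_PAGES: "2" })); // 2
console.log(parseMaxPages({})); // 1
console.log(parseMaxPages({ NEXT_PUBLIC_MAX_NB_PAGES: "abc" })); // 1
```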


I might be a bit off on how to properly use the variables: when I input this variable and the build finishes, I get:
"

  • info Loaded env from /app/.env
    Listening on port 3000 url: http://r-bbox1136-ai-comic-factory-ac587q8k-4cced-1qw0u:3000
  • info Loaded env from /app/.env
    Failed to update prerender cache for 724a8ce3b09d6703384b7aeb81636e19cba15e1215b4d0394461f28c19c98a06 [Error: EACCES: permission denied, mkdir '/app/.next/cache/fetch-cache'] {
    errno: -13,
    code: 'EACCES',
    syscall: 'mkdir',
    path: '/app/.next/cache/fetch-cache'
    }
    Failed to update prerender cache for 4f8a5014a00b6c2afa9362ef2e99cedb5c1c7d299c5eade52662ae1da6a5803c [Error: EACCES: permission denied, mkdir '/app/.next/cache/fetch-cache'] {
    errno: -13,
    code: 'EACCES',
    syscall: 'mkdir',
    path: '/app/.next/cache/fetch-cache'
    }

"
But after doing that, it does still at least generate the one page.

I agree with you on OpenAI. I have been training a model on ChatGPT to condense panels written for a graphic novel down to an "idea" that comes out over 2 pages.
ChatGPT produces about 3-4 sentences each for the story and the character prompt part, which was working well. I think it mostly struggles with getting each story portion down to 2 panels, more than it would with 4 unique parts, one per panel, but that would likely introduce bizarre story discontinuity imo.

I agree with you on the button option. A lot of the time, I will generate purely to ensure that ChatGPT is giving useful prompt inputs (sometimes it is too "literary" and uses too many generalities/synonyms), and only after that do I need to go for 2 pages. It sounds like a great idea to cut down on the load.

What I find interesting about your application is that it's actually a lot better for making comics with character continuity than what can be found even on paid sites, even if the dialogue can be a bit bland.
That's easy enough to alter and supplement in PS. The character poses and variety of content it gets right, even compared to other apps using Stable Diffusion, are pretty interesting.

I think forcing it to make a story along with the images as an output does leaps and bounds more for continuity in settings, objects, and characters than most apps that can handle a more detailed input story but only give image outputs. That is, a lot of the time it will give totally ideal imaging for story purposes (minus oddities like a 6-fingered hand here and there that can be PSed away), even if the dialogue simply needs to be written over in PS. The process of writing the story and dialogue for itself leads to better image outputs than most apps.

Please keep up the great work. This app is awesome.

hey where should i put the code to generate another page?

sir bbox, can I ask you for help?
