TODO LIST

#376
by jbilcke-hf HF staff - opened

Hello,

Here are some possible future features and fixes.

Note that this is not an official roadmap, and there is no guarantee that all of these things will be implemented!

It is mostly an idea box that I will refer to when I have time to work on the project, so I can pick things up.

  • more layouts
  • more panels per layout
  • separate layouts for mono-page and bi-page
  • support for custom LoRA models (e.g. for character consistency)
  • saving a comic project
  • sharing a comic project
  • more pages (e.g. a button to "follow up" on a story)
  • actual text bubbles (probably not going to be implemented soon: rather than hacking a solution with classic text injection like competing comic editors do, a new AI model would be vastly better, since it would draw the bubble with the correct size, placement, etc.)
  • non-square layouts (probably not going to be implemented soon; a model trained on comics would do a better job without hacks)

Greatest project on Hugging Face!!!
Some suggestions:

  • Focus on one type of comic at a time, for example Marvel, DC or manga, and perfect the training method on it. Once the training workflow is perfected, you can reuse it to quickly generate more comic styles.
  • Let the AI do more of the work for you. Page framing, number of pages, number of vignettes, position and aspect ratio of the vignettes on each page, speech bubble positioning, etc. should all be left to the AI to decide. Let the AI learn that from the best authors. For each training sample, take two to four existing comic pages and write a very detailed description of the story they tell, without caring about the vignette frames. You can occasionally use cinematic expressions ("we see an establishing shot, New York City, midday, clear sky", "we see his figure shot from a lower angle, showing his intimidating size", "we see a close-up of his cold eyes, then a cut to a close-up of his hand reaching down for his pistol", "we can only see his dark figure against the blinding car lights", "the scene begins with a wide-angle shot of the Paris concert hall from above", etc.; note the use of 'we see' as a trigger phrase preceding any cinematic description), but only as part of the narrative, without specific references to vignette frames or their order. You can quote specific words said by the characters if they appear in the comic, so that the SDXL model learns to reserve the right space and position for the speech bubbles (or captions) in the next training step. The generated text would be garbage anyway and should be edited afterwards, but teaching the AI to get the speech bubble spaces right is paramount. Then write a very detailed description prompt for every single vignette, reporting POV, shot type, lens, colors, style, and the exact words in the bubbles, and stating the exact aspect ratio of the output image, based on the aspect ratio of the original vignette frame. The set of vignette descriptions would be the ground truth for the LLM to train on. Each story prompt should correspond to 8-24 vignettes at most. The length in pages should also be inferred by the LLM, not set by the user. Scenes with different tension and pacing require a different number of vignettes, and the LLM should learn that from your samples.
  • For the SDXL/LoRA training, the scans of the 2-4 comic pages will be the ground truth, while the description of each vignette generated by the LLM will be the prompt. Let the AI learn by itself where to put speech bubbles or captions.
  • You need to train on those elements with one thousand samples at minimum. I estimate that you will need to summarize one thousand stories of 2-4 comic pages each to get this right. If you find that hard, you can start with Peanuts or Garfield strips, which are simpler, so 100-150 samples should be enough. You can also ask the community to help write the vignette descriptions and the comic summaries for you. Just set up a web form to input the description of each vignette, with a button to send it to you for review. People will be eager to help. I can write a dozen of those every day in my spare time. Some common predefined setup phrases should be available from a menu (for example: shot type, camera angle, lens, lighting, color palette, drawing style and line work, etc.) so that they are consistent between users and easy for the AI to learn. Splitting the page into vignettes and detecting the aspect ratio of each vignette should be automated and not left to the users. Of course, since not all vignettes are rectangles, detecting their shape will sometimes be hard, and a trained AI should do that in the future. For now, try to choose only comic pages with rectangular vignettes.
  • The global style idea is good in theory, but it should be an additional string built in a modular way from very abstract predefined options. Otherwise it would be hard to stay consistent between the LLM and SDXL, and there would be a risk of contradicting the generated vignette descriptions, since many different styles can be seen within any comic. It should focus on the names of authors with a strong style, or on popular comics to imitate (e.g. Sin City, Evangelion, Incal, Spiderman, etc.), rather than on specific elements. Even a general term such as 'dark' can collide with the description of a bright scene in a gothic-style comic.
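
To make the suggestions above more concrete, here is a minimal Python sketch of what one vignette description record could look like. Everything in it is an assumption made for illustration: the names `SHOT_TYPES`, `VignetteDescription`, `aspect_ratio` and `has_valid_vignette_count` are hypothetical, the menu phrases are placeholders, and the prompt format is just one possible layout. It only shows three of the ideas: predefined menu phrases, an aspect ratio derived automatically from a rectangular frame, and the 8-24 vignettes per story limit.

```python
from dataclasses import dataclass, field
from math import gcd
from typing import List

# Hypothetical menu entries. The thread names the categories (shot type,
# camera angle, lens, ...) but not the actual phrases.
SHOT_TYPES = ("establishing shot", "close-up", "low-angle shot", "wide-angle shot")


def aspect_ratio(width_px: int, height_px: int) -> str:
    """Reduce a rectangular frame size to its simplest W:H ratio."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("frame dimensions must be positive")
    divisor = gcd(width_px, height_px)
    return f"{width_px // divisor}:{height_px // divisor}"


@dataclass
class VignetteDescription:
    shot_type: str      # must be one of the predefined menu phrases
    description: str    # free-text description of the vignette
    width_px: int       # frame width measured on the scanned page
    height_px: int      # frame height measured on the scanned page
    bubble_text: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Rejecting free-form shot types keeps contributors consistent.
        if self.shot_type not in SHOT_TYPES:
            raise ValueError(f"unknown shot type: {self.shot_type!r}")

    def to_prompt(self) -> str:
        """Join the fields into a single comma-separated prompt string."""
        parts = [
            self.shot_type,
            self.description,
            f"aspect ratio {aspect_ratio(self.width_px, self.height_px)}",
        ]
        if self.bubble_text:
            quoted = " / ".join(f'"{text}"' for text in self.bubble_text)
            parts.append(f"speech bubbles: {quoted}")
        return ", ".join(parts)


def has_valid_vignette_count(vignettes: List[VignetteDescription]) -> bool:
    """Check the suggested limit of 8-24 vignettes per story prompt."""
    return 8 <= len(vignettes) <= 24
```

For example, `VignetteDescription("close-up", "his cold eyes", 1200, 800, ["Draw."]).to_prompt()` gives `close-up, his cold eyes, aspect ratio 3:2, speech bubbles: "Draw."`. A web form could fill such a record from its menus, so that every contributor produces the same phrasing.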

Looking forward to seeing this amazing project take over the world!

This may seem a trivial request, since we can do this in post-processing, but for your consideration...

It would be useful to have an option to flip an image from left to right.
Rationale: The flow of the comic page often relies on the orientation/direction of characters and where they are looking or moving toward. The generated panel might be near perfect with the exception of this directional cue.

Workarounds: We can save out individual frames, flip them in the software of our choice, and then overlay them onto the generated page.
Alternatively, we can perform the flip on the output page itself.

Anticipated difficulties: While a left/right mirroring option would be sufficient for most panels, some users might want vertical flipping as well.
As such, a long-term solution might be a toolbar at the bottom of each panel offering the current options (Redraw, Edit) as well as any future additions, such as flipping in both horizontal and vertical directions.

(Note: I can't see many cases where vertical flipping would be desired, but it may be more likely when the panel is more design-oriented.)
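
As a rough illustration of the two flip operations, here is a minimal Python sketch. It assumes a panel is a plain grid of pixel values (a list of rows), which is only a stand-in: the app's real image type is not described in this thread, and the names `flip_horizontal`, `flip_vertical`, `PANEL_ACTIONS` and `apply_action` are hypothetical.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # rows of pixel values; a stand-in for a real image type


def flip_horizontal(image: Image) -> Image:
    """Mirror left/right: reverse the pixel order inside every row."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image: Image) -> Image:
    """Mirror top/bottom: reverse the order of the rows."""
    return [list(row) for row in reversed(image)]


# Hypothetical toolbar entries; Redraw and Edit would sit next to these.
PANEL_ACTIONS: Dict[str, Callable[[Image], Image]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
}


def apply_action(image: Image, action: str) -> Image:
    """Return a new image with the named toolbar action applied."""
    if action not in PANEL_ACTIONS:
        raise ValueError(f"unknown panel action: {action!r}")
    return PANEL_ACTIONS[action](image)
```

Both functions return a new grid and leave the input untouched, so applying the same flip twice gives back the original panel. That would make a toolbar button behave as a simple toggle.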

Here's something of an example where panels 1 and 3 are flipped horizontally to direct the flow of the reader to the next panel and (at least in theory) more appropriately through the story:

image.png

Top: the original output. Bottom: panels 1 and 3 flipped manually.
Thinking about it more, panels 2 and 4 could be flipped as well, but that is not as pressing a need as for the other two panels.

P.S. Loving this latest update with Panel Edit capability!

Couldn't resist flipping panels 2 and 4.
Adjusted brightness and contrast a little as well.

image.png
