isidentical
posted an update Jan 16
What is the current SOTA for fast personalized image generation? Most of the techniques that produce great results (hard to measure objectively, but with a subject similarity index close to 80-90%) either take too much time (full DreamBooth fine-tuning of the base model) or lose auxiliary properties (high-rank LoRAs).
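For readers wondering how a "subject similarity" number like the one above might be computed: a minimal sketch, assuming it is the cosine similarity between face/subject embeddings of the reference photos and of a generated image. The embedding extractor itself is not shown, and the function names `cosine_similarity` and `mean_subject_similarity` are hypothetical, not something the post specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_subject_similarity(reference_embeddings, generated_embedding):
    """Average similarity of one generated image to all reference images."""
    if not reference_embeddings:
        raise ValueError("need at least one reference embedding")
    scores = [
        cosine_similarity(ref, generated_embedding)
        for ref in reference_embeddings
    ]
    return sum(scores) / len(scores)
```

With real embeddings, a score around 0.8-0.9 would correspond to the "80-90%" range, though the exact metric behind that figure is an assumption here.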

We have also been testing face embeddings, but even with multiple samples the quality is nowhere near what we expect. Even with the techniques that work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether people have looked into automatically filtering/segmenting the input samples.

I think for faces it's this one: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID. It's few-shot and it works very well.
image (41).png

image (51).png

image (34).png

·

gonna do a collection of merve avatars 🔥

The current SOTA 💎 technique for photorealistic image generation is mixing merged
LoRA, LyCORIS, and Embedding models
with a current photorealistic merged model
such as Serenity v2.0 by @malcolmrey, a merge of 45 models: https://civitai.com/models/110426?modelVersionId=248599
Serenity, an underrated merge of 18 photorealistic models, is also on the Hub right now:
https://huggingface.co/malcolmrey/serenity (3 likes at time of posting)
Here is an article with more details on Serenity and v2.0:
https://civitai.com/articles/3198/new-version-of-my-base-model-serenity-v2-for-sd-15
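To give a feel for what "a merge of N models" can mean mechanically, here is a minimal sketch of a weighted-average checkpoint merge. This is an assumption for illustration only: the linked article describes how Serenity was actually built, and it may differ. Checkpoints are modelled as plain dicts of flat float lists instead of real tensors, and `merge_checkpoints` is a hypothetical helper name.

```python
def merge_checkpoints(checkpoints, weights):
    """Weighted average of checkpoints that share the same keys and shapes.

    Each checkpoint maps a parameter name to a flat list of floats
    (a stand-in for real tensors). Weights are normalized to sum to 1.
    """
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    normalized = [w / total for w in weights]

    keys = set(checkpoints[0])
    if any(set(ckpt) != keys for ckpt in checkpoints):
        raise ValueError("all checkpoints must have the same parameter names")

    merged = {}
    for key in checkpoints[0]:
        size = len(checkpoints[0][key])
        merged[key] = [
            sum(w * ckpt[key][i] for w, ckpt in zip(normalized, checkpoints))
            for i in range(size)
        ]
    return merged
```

For example, merging two toy checkpoints with equal weights gives the element-wise mean of their parameters.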

When you're training personalized models, keep in mind that mixing certain weights can add or remove wrinkles/pores, depending on your selection/needs.
To make someone appear younger or older, add some tokens about the age, but also increase the weight of the LoRA rather than the LyCORIS.

Decreasing the weights of the Embeddings or increasing those of the LoRA/LyCORIS has also been found to be helpful for less realistic models.
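A rough sketch of how these per-model weights could be expressed in a prompt, assuming the AUTOMATIC1111 web UI conventions: `<lora:name:weight>` for LoRAs, `(token:weight)` attention syntax for Textual Inversion embeddings, and `<lyco:name:weight>` from the LyCORIS extension (newer web UI versions may load LyCORIS files through `<lora:...>` instead). The model names and weights below are made up for illustration, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(base, loras=None, lycoris=None, embeddings=None):
    """Assemble a prompt string with per-model weights.

    Each optional argument maps a model/embedding name to its weight.
    Tag syntax follows AUTOMATIC1111 web UI conventions (an assumption).
    """
    parts = [base]
    for name, weight in (embeddings or {}).items():
        parts.append(f"({name}:{weight})")        # embedding via attention weight
    for name, weight in (loras or {}).items():
        parts.append(f"<lora:{name}:{weight}>")   # built-in LoRA tag
    for name, weight in (lycoris or {}).items():
        parts.append(f"<lyco:{name}:{weight}>")   # LyCORIS extension tag
    return ", ".join(parts)


# Hypothetical names: age tokens plus a LoRA weighted higher than the LyCORIS.
prompt = build_prompt(
    "photo of a 60 year old person",
    loras={"subject_lora": 0.6},
    lycoris={"subject_lyco": 0.3},
    embeddings={"subject_ti": 0.8},
)
```

Raising `subject_lora` relative to `subject_lyco`, or lowering `subject_ti`, is then a one-number change per experiment.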


In terms of fast personalized photorealistic image generation, training and merging/mixing LoRAs and Embeddings with an existing photorealistic merged model may be one of the best ways.
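For intuition on what mixing LoRAs into a base model does to a single weight matrix, here is a minimal sketch of the standard LoRA update, W' = W + Σ scaleᵢ · (upᵢ · downᵢ). This is plain fixed-scale merging, not ZipLoRA (which learns the merge coefficients). Matrices are nested lists to keep the example dependency-free, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as nested lists."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def apply_loras(base_weight, loras):
    """Return base_weight plus the scaled low-rank update of every LoRA.

    loras is a list of (up, down, scale) tuples, where up is d_out x r
    and down is r x d_in. The base matrix is not modified in place.
    """
    merged = [row[:] for row in base_weight]
    for up, down, scale in loras:
        delta = matmul(up, down)  # d_out x d_in low-rank update
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += scale * value
    return merged
```

The `scale` value plays the same role as the LoRA weight you would set at inference time: a larger scale pushes the merged weights further toward that LoRA's subject or style.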

Let's analyze some key pros/cons of excluding LyCORIS from the mix.

  • LyCORIS may improve/alter quality in certain ways
  • LyCORIS tends to take up more space
  • Adding LyCORIS to the mix may take more time/compute

Here is an easy way to train a LoRA: https://huggingface.co/spaces/multimodalart/lora-ease
&
Here is a LoRA merging repo:
https://github.com/mkshing/ziplora-pytorch
Paper ZipLoRA: https://arxiv.org/abs/2311.13600
Paper HF: https://huggingface.co/papers/2311.13600

Here is a Textual Inversion embeddings creation/merging repo: https://github.com/klimaleksus/stable-diffusion-webui-embedding-merge
( resource to learn more about embeddings: https://stable-diffusion-art.com/embedding/ )

Decent/good quality, extremely fast, few-shot, easy-to-use method: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID
as also mentioned by @merve

The method I mention should not be very time-consuming and
should theoretically produce photorealistic, high-quality personalized images. The missing piece is automating all of this together, which should be trivial. One could implement a version of this incorporating an LLM using HuggingGPT, a project contributed by @tricktreat and many others, or some other method. https://huggingface.co/spaces/microsoft/HuggingGPT


This concept of model mixing/merging ♺ is one of the key trends in improving models and increasing quality in various domains.

This is an exhilarating opportunity to try mixing techniques from different domains, including increasing the number of experts in a MoE LLM and adding more modalities for fine-tuning, while analyzing similar gains in performance across domains.

Here are some really helpful resources 📖
that I also referred to in this post;
shoutout again to @malcolmrey
https://civitai.com/articles/1591/sdxl-lora-training
https://civitai.com/articles/3114/textual-inversion-embedding-training-guide
https://civitai.com/articles/7/dreambooth-lycoris-lora-guide
https://civitai.com/articles/1721/improving-results-by-using-multiple-models-of-the-same-concept-turning-it-to-11
https://civitai.com/articles/3527/bringing-it-up-to-twelve-going-deep-into-quality

·

These examples are for the SOTA techniques I initially mentioned,
not the faster method that excludes LyCORIS.
I specifically chose these examples because you can see the artifacts from mixing older-age tokens with younger-age tokens in the forehead wrinkles, and the potential need for better embeddings.
261969-175308389-30-DPM++ 2M Karras-1408-serenity_v2.jpeg

261978-4203974099-30-DPM++ 2M Karras-1408-serenity_v2.jpeg

261979-221025586-30-DPM++ 2M Karras-1408-serenity_v2.jpeg

261992-4293141323-30-DPM++ 2M Karras-1408-serenity_v2.jpeg

I quite like using https://github.com/s0md3v/sd-webui-roop, which you can combine with SDXL and other SDXL-compatible tools, especially when dealing with multiple faces in an image. Here are some examples for a specific output style:

American cartoon_asian male and asian female.png
American cartoon_indian male and indian female.png
American cartoon_lightskin male and lightskin female.png
American cartoon_white female and white female.png

Holy papers! You guys, I am excited to share this new project that recently dropped:

'InstantID' https://huggingface.co/papers/2401.07519 Arxiv: https://arxiv.org/abs/2401.07519

Here is the project page: https://instantid.github.io/
Here is the code repo (code soon to be released) : https://github.com/InstantID/InstantID

SOTA Fast Personalized Image Generation

This method preserves both face and style magnificently. A more than worthy competitor (dare I say better) to Ip-Adapter-FaceID and roop.

I am excited to see the planned code release and a potential HF Space for this project 🤗

applications.png

·

This might just be exactly what you're looking for, for the 'HF SaGA' Avatar collections 🧑‍🚀 🧞‍♂️
@merve @julien-c
HF saga.png

@isidentical

@samusenps already posted my flow and I encourage you to try it but I would like to comment on one thing:
"We have been also testing face embeddings, but even with multiple samples the quality is not anywhere close to what we expect."

If you want to achieve the best results then mixing models is the way to go IMHO, but I've had quite good results with only the embeddings. You can check my models and filter by embeddings to see some examples: https://civitai.com/user/malcolmrey

If you like what you see, there is an article on how exactly (all parameters) I train those embeddings :)