@isidentical on Hugging Face: "What is the current SOTA in terms of fast personalized image generation? Most…"

isidentical

posted an update Jan 16

What is the current SOTA in terms of fast personalized image generation? Most of the techniques that produce great results (which is hard to measure objectively, but think of a subject similarity index close to 80-90%) either take too much time (full DreamBooth fine-tuning of the base model) or lose auxiliary properties (high-rank LoRAs).

We have also been testing face embeddings, but even with multiple samples the quality is nowhere near what we expect. Even with the techniques that work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether people have looked into automatically filtering/segmenting the input samples.
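One simple form such an automatic filter could take is a sharpness check that drops blurry samples before training. The sketch below is only an illustration under assumptions: images are plain 2D lists of grayscale values, `laplacian_variance` and `filter_sharp` are hypothetical helper names, and the threshold is arbitrary. It is not something tested in this thread.

```python
from typing import Dict, List

Image = List[List[float]]  # hypothetical stand-in: grayscale pixels in [0, 1]


def laplacian_variance(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels.

    Low values suggest a flat or blurry image; high values suggest sharp edges.
    """
    height, width = len(img), len(img[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def filter_sharp(images: Dict[str, Image], threshold: float) -> List[str]:
    """Return the names of images whose sharpness score meets the threshold."""
    return [name for name, img in images.items()
            if laplacian_variance(img) >= threshold]


# Toy inputs: a featureless image and a high-contrast checkerboard.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
kept = filter_sharp({"flat": flat, "checker": checker}, threshold=1.0)
```

In practice the threshold would need tuning on real photos, and a face detector or segmentation step would be a separate stage on top of this.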

merve

Jan 16

I think for faces it's this one: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID. It's few-shot and it works very well.

julien-c

Jan 17

gonna do a collection of merve avatars 🔥

samusenps

Jan 17

edited Jan 17

The current SOTA 💎 technique for photorealistic image generation is mixing/merging
LoRA, LyCORIS, and Embedding models
with a current photorealistic merged model
such as Serenity v2.0 by @malcolmrey, a merge of 45 models: https://civitai.com/models/110426?modelVersionId=248599
Serenity, an underrated merge of 18 photorealistic models, is also on the Hub right now:
https://huggingface.co/malcolmrey/serenity (3 likes at time of posting)
Here is an article with more details on Serenity and v2.0:
https://civitai.com/articles/3198/new-version-of-my-base-model-serenity-v2-for-sd-15
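For readers new to model merging, here is a minimal sketch of the simplest variant, a weighted average of checkpoints. It is an assumption for illustration only: it is not how Serenity was built (the article above describes that), the `StateDict` type is a hypothetical stand-in using plain lists instead of real tensors, and `merge_checkpoints` is a made-up helper name.

```python
from typing import Dict, List, Sequence

StateDict = Dict[str, List[float]]  # hypothetical stand-in for real tensors


def merge_checkpoints(checkpoints: Sequence[StateDict],
                      weights: Sequence[float]) -> StateDict:
    """Weighted average of parameters that share the same keys and shapes."""
    if not checkpoints or len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]  # normalise so the weights sum to 1

    merged: StateDict = {}
    for key in checkpoints[0]:
        size = len(checkpoints[0][key])
        merged[key] = [
            sum(w * ckpt[key][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(size)
        ]
    return merged


# Toy example with two "models" that each hold a single parameter vector.
model_a = {"layer.weight": [1.0, 2.0]}
model_b = {"layer.weight": [3.0, 6.0]}
merged = merge_checkpoints([model_a, model_b], [0.5, 0.5])
```

Shifting the weights toward one checkpoint pulls the merged model toward that checkpoint's look.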

When you're training personalized models, keep in mind that by mixing certain weights you could add or lose wrinkles/pores, depending on your selection/needs.
To make someone appear younger or older, add some tokens about the age, but also increase the weight of the LoRA instead of the LyCORIS.

Decreasing the weights of Embeddings or increasing those of LoRA/LyCORIS has also been found to be helpful for less realistic models.
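Raising or lowering the weight of a LoRA roughly amounts to scaling its low-rank update before adding it to the base weights, i.e. W' = W + scale * (up @ down). The sketch below shows that arithmetic with plain nested lists. The names `apply_lora`, `up`, and `down` are assumptions for illustration, not the API of any particular UI or library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(base: Matrix, up: Matrix, down: Matrix, scale: float) -> Matrix:
    """Return base + scale * (up @ down), the scaled low-rank update."""
    delta = matmul(up, down)
    return [
        [w + scale * d for w, d in zip(row_w, row_d)]
        for row_w, row_d in zip(base, delta)
    ]


# Toy rank-1 LoRA on a 2x2 base weight.
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]      # shape 2x1
down = [[0.5, 0.5]]      # shape 1x2

half = apply_lora(base, up, down, scale=0.5)  # LoRA at half strength
full = apply_lora(base, up, down, scale=1.0)  # LoRA at full strength
```

With several LoRAs or a LyCORIS in the mix, each one contributes its own scaled update, which is why the relative weights shift the result.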

In terms of fast personalized photorealistic image generation training, merging/mixing LoRAs and Embeddings with an existing photorealistic merged model may be one of the best ways.

Let's analyze some key pros/cons of excluding LyCORIS from the mix:

LyCORIS may improve/alter quality in certain ways.
LyCORIS models tend to take up more space.
Adding LyCORIS to the mix may take more time/compute.

Here is an easy way to train a LoRA: https://huggingface.co/spaces/multimodalart/lora-ease
&
Here is a LoRA merging repo:
https://github.com/mkshing/ziplora-pytorch
ZipLoRA paper: https://arxiv.org/abs/2311.13600
Paper on HF: https://huggingface.co/papers/2311.13600

Here is a Textual Inversion embeddings creation/merging repo: https://github.com/klimaleksus/stable-diffusion-webui-embedding-merge
(Resource to learn more about embeddings: https://stable-diffusion-art.com/embedding/)

A decent/good quality, extremely fast, few-shot, and easy-to-use method: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID
as also mentioned by @merve

The method I mention should not be very time-consuming and
should theoretically produce photorealistic, high-quality personalized images. The missing factor is automating all of this together, which should be trivial. One could implement a version of this incorporating an LLM using HuggingGPT, a project contributed by @tricktreat and many others, or some other method: https://huggingface.co/spaces/microsoft/HuggingGPT

This concept of model mixing/merging ♺ is one of the key trends in improving models and increasing quality in various domains.

This is an exhilarating opportunity to try mixing techniques from different domains, including increasing the number of experts in a MoE LLM and adding more modalities for fine-tuning, while analyzing the similar gains in performance across domains.

Here are some really helpful resources 📖
that I also referred to in this post,
shoutout again to @malcolmrey
https://civitai.com/articles/1591/sdxl-lora-training
https://civitai.com/articles/3114/textual-inversion-embedding-training-guide
https://civitai.com/articles/7/dreambooth-lycoris-lora-guide
https://civitai.com/articles/1721/improving-results-by-using-multiple-models-of-the-same-concept-turning-it-to-11
https://civitai.com/articles/3527/bringing-it-up-to-twelve-going-deep-into-quality

samusenps

Jan 17

These examples are for the SOTA techniques I initially mentioned,
not the faster method that excludes LyCORIS.
I specifically chose these examples because you can see the artifacts from mixing older-age tokens with younger-age tokens in the forehead wrinkles, and the potential need for better embeddings.

mattmdjaga

Jan 17

I quite like using https://github.com/s0md3v/sd-webui-roop, which you can combine with SDXL and other tools compatible with SDXL, especially when dealing with multiple faces in an image. Here are some examples for a specific output style:

samusenps

Jan 17

Holy papers! You guys, I am excited to share this new project that recently dropped:

'InstantID' https://huggingface.co/papers/2401.07519 Arxiv: https://arxiv.org/abs/2401.07519

Here is the project page: https://instantid.github.io/
Here is the code repo (code soon to be released) : https://github.com/InstantID/InstantID

SOTA Fast Personalized Image Generation

This method preserves both face and style magnificently. A more than worthy competitor (dare I say better) to Ip-Adapter-FaceID and roop.

I am excited to see the planned code release, and potential HF Space for this project 🤗

samusenps

Jan 17

This might just be exactly what you're looking for, for the 'HF SaGA' Avatar collections 🧑‍🚀 🧞‍♂️
@merve @julien-c

malcolmrey

Jan 21

@isidentical

@samusenps already posted my flow and I encourage you to try it, but I would like to comment on one thing:
"We have been also testing face embeddings, but even with multiple samples the quality is not anywhere close to what we expect."

If you want to achieve the best results then mixing models is the way to go IMHO, but I've had quite good results with only the embeddings. You can check my models and filter by embeddings to see some examples: https://civitai.com/user/malcolmrey

If you like what you see, there is an article on how exactly (all parameters) I train those embeddings :)
