Proposal for collaboration

#1
by ericlearner - opened

I am Eric Maeng, co-founder at Team Learners.

We sell AI-generated studio photos to hundreds of users a day. Customers love the photo-realistic quality and the fact that we're the only service of its kind to offer such a wide variety of concepts.

We're experimenting with making photos that more closely resemble our customers and avoiding deformed hands/eyes. Lately, we've been thinking a lot about model training and the data required to capture and implement features such as face shape and eye shape.

I'd love to discuss our challenges with you and talk about synergies and ways we can work together, including providing technical advice/data.

Let me know if you are interested via email.
My contact is eric@learners.company or +821020866510.

Regards,
Eric

Hey Eric and SmilingWolf,

To Eric:

Hands and eyes are focal objects, and nowadays those details are easy to solve with webui plugins or a holistically mixed model.
Another route is the XL refinement-model experiment; it actually looks like it gives a big boost.
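For context, the webui-plugin route mostly comes down to masked re-denoising of the detected region. A minimal sketch with diffusers, assuming a first-pass `generated_image` and a detected `hand_mask` already exist (both names are placeholders, not any specific plugin's recipe):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Hedged sketch of the plugin-style fix: detect a bad hand/face region,
# then re-denoise only the masked area while keeping the rest intact.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

fixed = pipe(
    prompt="detailed hand, five fingers",
    image=generated_image,   # PIL image from the first generation pass
    mask_image=hand_mask,    # white where the detector flagged the hand
    strength=0.6,            # keep most of the original, redraw the rest
).images[0]
```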

But there are still lots of issues with text-only prompting in current architectures:

  • Mode collapse seems baked into the model; uncommon object structures are still low quality even after tuning.
    HPSv2 tells us that current LoRA-mix practices only buy higher fidelity; it's really not much gain compared to dallemini/vqganclip.
    We still need to study how GANs solved the OOD and long-tail issues.
    The actual XL thinking just applies the basic patch-sampling trick (augmentations) found in vqganclip: https://arxiv.org/pdf/2306.16805.pdf
  • The overall visuals (composition) feel locked in; nowadays the community has become LoRA-addicted.
    Unlike LLMs, we can't get a capacity leap from the UNet or from UNet LoRA mixes (maybe we can gain from text-encoder LoRAs).
    Resolution and batch size are further factors that make holistic training inaccessible, and most papers use full-model fine-tuning, which is relatively costly for holistic practice.
  • Inconsistent art-style distribution: if you don't over-express multiple LoRAs, you'll get very unstable sampling.
    Latent allocation turns into Dr. Frankenstein thinking; maybe it's better to apply forgetting to the model before tuning (I've seen some practices do this)?
  • Huge negative prompts are needed, plus weird prompting templates (PEZ); see the sketch after this list.
    I saw that linearly factored text embeddings might solve this: https://arxiv.org/pdf/2302.14383.pdf
  • Prompting robustness and automated reweighting/bias tuning (like MJ's shorten):
    https://arxiv.org/pdf/2306.16805.pdf, https://arxiv.org/pdf/2305.16934.pdf
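On the "huge negative" point above, a minimal sketch of the usual workaround: load a community negative embedding as a textual-inversion token and spend the negative prompt on it. The repo id and token below are placeholders, not a specific recommendation.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder repo id and token; substitute a real negative embedding.
pipe.load_textual_inversion("some-user/negative-embedding", token="<neg>")

image = pipe(
    prompt="studio portrait photo, soft lighting",
    negative_prompt="<neg>, lowres, deformed hands, extra fingers",
).images[0]
```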

What the AI community really needs

Holistic practice means most of what everyone uses is an untraceable solution; we still don't know what kind of dataset was introduced or how the problems were solved.

Take, for example, the multiple-character blending issue, which got solved by introducing an "anime couple" LoRA.
Just look at how hard it is for the SD team to get any mitigation progress in XL pretraining.
The community trend is private trials, practices, and findings (version control is needed: https://github.com/r-three/git-theta).

Their data lacks the stylization domain, and since we will still be using this tagger for XL tuning, we will hit really fatal tuning issues.
We saw that XL doesn't handle stylization well; they can't get mitigation for anime images and they use an outdated BLIP2.
They are restricted to CC crawl (datacomp) junk, so we can't gain any more stylization understanding or content diversity to compete with waifulabs/MJ.

The current architectures seem a bit swamped in terms of controllability for artists. If one is just looking at generating generic photorealistic images (which is of course an impressive feat) they are fine, but for highly stylized work, the larger the model, the less control artists have.

Take, for example, the tasks where fine-tuning can reach GPT-4 performance (Orca), while others require real understanding (the textbooks line of work).

Pretraining needs to be controlled for rationale diversity; small models reflect this fact (as TinyStories and the textbooks papers highlight).

To SmilingWolf:

It's been a while since this "foundation model" came out, even though it's a classifier.
I've seen it become the foundation of the webui ecosystem, at 14k downloads per month: data cleaning, fine-tuning suites, prompting ideas, etc.

Here are some future research directions:

LLM log-probabilities work better than a classifier:
see Figure 5 of https://arxiv.org/pdf/2305.01278.pdf
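A minimal sketch of that idea, with a plain causal LM standing in for the VLM (the model id and prompt template are illustrative assumptions): score a candidate tag by the log-probability of a "yes" continuation rather than by a classifier head.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def tag_logprob(caption: str, tag: str) -> float:
    """Score a tag by the LM's log-probability of answering 'yes'."""
    prompt = f"Caption: {caption}\nDoes the tag '{tag}' apply? Answer:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]          # next-token distribution
    logprobs = torch.log_softmax(logits, dim=-1)
    yes_id = tok(" yes").input_ids[0]           # " yes" is one GPT-2 token
    return logprobs[yes_id].item()
```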

Personalizing vision-language models is needed:
deterministic: https://arxiv.org/pdf/2204.01694.pdf
probabilistic: https://arxiv.org/pdf/2307.00398.pdf

Personalizing vision-LLMs is needed too, plus prompt inversion (which actually sits upstream of the captioner/tagger):
https://arxiv.org/pdf/2307.00716.pdf
JourneyDB clusters styles into a hierarchical structure, summarizing 150,000 style words / 70,521 fine-grained styles into 334 style categories (per-category accuracy reaches 41% on their validation (OOD) set).
This way we can skip collecting from the Internet and still improve clip-interrogator.
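For illustration, a hedged sketch of what "skip collecting" could look like: zero-shot style tagging against a fixed taxonomy with vanilla CLIP (the three category names below stand in for JourneyDB's 334):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative stand-ins for a full style taxonomy.
style_categories = ["watercolor", "cel shading", "film photography"]

def classify_style(image):
    """Pick the style category whose text prompt best matches the image."""
    inputs = proc(text=[f"a picture in {s} style" for s in style_categories],
                  images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return style_categories[int(probs.argmax())]
```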

Combining this with the refinement-model idea,
it really looks like we need a style-controlled trained base plus a refiner cascade-trained at increasing resolutions (to save compute).
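That split already exists in miniature in the SDXL base + refiner pipeline; a minimal diffusers sketch of the "base does low-frequency denoising, refiner finishes detail" idea:

```python
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "oil painting of a lighthouse at dawn"
# Base covers the first 80% of the noise schedule, refiner the last 20%.
latents = base(prompt, denoising_end=0.8, output_type="latent").images
image = refiner(prompt, image=latents, denoising_start=0.8).images[0]
```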


Turing test failed

Let's keep the discussion focused on personalization of generative models.
Why don't we hop on a call?


Nope, this discussion stays under the tagger.
Actually, I'm not interested in interpreting "independent developers'" high-fidelity recipes, but in natural-language-friendly text2im like MJ,
since there have been countless holistic practices... and weird naming (why not just post an issue under the webui or the trainer?).
What everyone mostly uses is hand-crafted mitigation: few-shot attempts (SAM made this easy; see the sketch below), SVD LoRAs, and diff-LoRA merges.
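For reference, the SAM step usually looks something like this sketch: auto-mask a handful of reference images and crop the segments into a few-shot set (the checkpoint path and file names are placeholders).

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Placeholder checkpoint path; use whichever SAM weights you downloaded.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("reference.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with "segmentation"

crops = []
for m in masks:
    x, y, w, h = map(int, m["bbox"])    # boxes come back in XYWH format
    crops.append(image[y:y + h, x:x + w])
```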

I'm a little anxious. Nowadays the community is also a bit "dead", which makes it hard to assemble knowledge and impossible to produce next-generation models.

In the vqganclip days we were "drawing with complex gradients" (the channel offset-noise idea was already used in vqganclip), and now we're "drawing by picking LoRA crayons". What's next?

Based on a mode-collapsed model, did they really make any progress?

Mode collapse means (I think) almost half of the latent space is dead.
If the community hadn't found diff-LoRA merging (a local mode fix) and negative embeddings (rejecting the holes) in practice, tuning would be very challenging.
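For readers unfamiliar with "diff-LoRA merging", here is a minimal sketch of the underlying arithmetic, with placeholder state dicts; this is the general recipe, not any particular tool's implementation:

```python
import torch

def diff_merge(target_sd, base_sd, tuned_sd, alpha=1.0):
    """Add alpha * (tuned - base) onto the target's matching weights.

    State dicts are ordinary {name: tensor} mappings, e.g. from torch.load.
    The delta isolates what the fine-tune changed, so it can be re-applied
    to a different base as a local mode fix.
    """
    merged = {}
    for name, weight in target_sd.items():
        if name in base_sd and name in tuned_sd:
            merged[name] = weight + alpha * (tuned_sd[name] - base_sd[name])
        else:
            merged[name] = weight.clone()  # no delta available; keep as-is
    return merged
```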

About an hour after getting XL, in one random test I was able to train 25 styles into a single XL LoRA. (Gosh, the whole of Pixiv could be baked into XL...)

What I care about is "expert LoRAs" that are no longer merged, keeping every LoRA on only one XL base (the official one). SD is doing the exact opposite of MJ.

Object mode collapse happens in every training practice; it's the architecture's fault.

Mode collapse won't happen at the base step, so it's not Stability's fault; that step commonly uses large-scale, long-tail mixtures.

But it will happen severely at the full-model baking step, where it can be mitigated with domain data and OOD data.

Recall: we created the mix-and-LoRA hell ourselves, using targeted few-shot training and regularization.

Same as PALAVRA, but not only for focal objects.

The next-gen CLIP will be https://arxiv.org/pdf/2304.05523.pdf; we can apply unCLIP (hybrid embeddings) to next-gen text2im.
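One hedged reading of "hybrid embedding": mix normalized CLIP text and image features into a single conditioning vector that an unCLIP-style decoder could consume. The mixing weight below is an assumption, not something taken from the paper.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def hybrid_embedding(text, image, w=0.5):
    """Blend text and image CLIP features; w=0.5 is an arbitrary choice."""
    t = model.get_text_features(
        **proc(text=[text], return_tensors="pt", padding=True))
    i = model.get_image_features(**proc(images=image, return_tensors="pt"))
    t = t / t.norm(dim=-1, keepdim=True)   # unit-normalize both modalities
    i = i / i.norm(dim=-1, keepdim=True)
    h = w * t + (1 - w) * i
    return h / h.norm(dim=-1, keepdim=True)
```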
