Kandinsky Community
AI & ML interests: None defined yet.
Recent Activity
Post
Introducing a high-quality open-preference dataset to further this line of research for image generation.
Despite being such an integral component of modern image generation, open preference datasets are a rarity!
So, we decided to work on one with the community!
Check it out here:
https://huggingface.co/blog/image-preferences
Post
The Control family of Flux from @black-forest-labs should be discussed more!
It enables structural controls like ControlNets while being significantly less expensive to run!
So, we're working on a Control LoRA training script 🤗
It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130
sayakpaul authored a paper 15 days ago
Post
Let 2024 be the year of video model fine-tunes!
Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1
Post
It's been a while since we shipped native quantization support in Diffusers 🧨
We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.
This post is just a reminder of what's possible:
1. Loading a model with a quantization config
2. Saving a model with a quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints
Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
patrickvonplaten authored a paper 2 months ago
Post
Did some little experimentation to resize pre-trained LoRAs on Flux. I explored two themes:
* Decrease the rank of a LoRA
* Increase the rank of a LoRA
The first is helpful for reducing memory requirements if the LoRA has a high rank, while the second is merely an experiment. Another implication of this study is the unification of LoRA ranks when you would like to torch.compile() them.
Check it out here:
sayakpaul/flux-lora-resizing
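The rank-decrease theme can be sketched with a generic SVD-based truncation of one LoRA layer pair; this is a common approach under the usual delta_W = B @ A formulation, not necessarily the exact method used in sayakpaul/flux-lora-resizing:

```python
# Rank reduction for one LoRA pair, assuming A: (r, in) and B: (out, r).
import torch

def resize_lora_pair(A: torch.Tensor, B: torch.Tensor, new_rank: int):
    delta_w = B @ A                      # (out, in) full low-rank update
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    U, S, Vh = U[:, :new_rank], S[:new_rank], Vh[:new_rank, :]
    # Split the singular values evenly between the two factors
    B_new = U * S.sqrt()                 # (out, new_rank)
    A_new = S.sqrt().unsqueeze(1) * Vh   # (new_rank, in)
    return A_new, B_new

A = torch.randn(16, 64)   # a rank-16 LoRA on a 64 -> 32 layer
B = torch.randn(32, 16)
A_small, B_small = resize_lora_pair(A, B, new_rank=4)
print(A_small.shape, B_small.shape)  # torch.Size([4, 64]) torch.Size([32, 4])
```

Since B_new @ A_new equals the truncated SVD of the original update, this is the best rank-4 approximation in the Frobenius-norm sense.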
sayakpaul authored a paper 4 months ago
Post
Here is a hackable and minimal implementation showing how to perform distributed text-to-image generation with Diffusers and Accelerate.
Full snippet is here: https://gist.github.com/sayakpaul/cfaebd221820d7b43fae638b4dfa01ba
With @JW17
Post
Flux.1-Dev-like images but in fewer steps.
Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged
Enjoy the Monday 🤗
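The "very simple" merging mentioned above can be sketched as a plain linear interpolation of two state dicts; this is a generic illustration, not necessarily the exact recipe behind sayakpaul/FLUX.1-merged:

```python
# Linear merge of two compatible model state dicts.
import torch

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Return alpha * sd_a + (1 - alpha) * sd_b for matching keys."""
    assert sd_a.keys() == sd_b.keys(), "state dicts must share parameter names"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Toy demonstration with two tiny "models"
a = {"w": torch.ones(2, 2), "b": torch.zeros(2)}
b = {"w": torch.zeros(2, 2), "b": torch.ones(2)}
merged = merge_state_dicts(a, b, alpha=0.25)
print(merged["w"][0, 0].item())  # 0.25
```

For real checkpoints you would load each transformer's state dict, merge, and load the result back into the pipeline before inference.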
Post
With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.
We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.
We demonstrate excellent memory savings with a small sacrifice in inference latency, which is expected to improve in the coming days.
Diffusers 🤗 Quanto ❤️
This was a juicy collaboration between @dacorvo and myself.
Check out the post to learn all about it:
https://huggingface.co/blog/quanto-diffusers
multimodalart posted an update 5 months ago
Post
New feature 🔥
Image models and LoRAs now have little previews 🤗
If you don't know where to start to find them, I invite you to browse cool LoRAs in the profile of some amazing fine-tuners: @artificialguybr , @alvdansen , @DoctorDiffusion , @e-n-v-y , @KappaNeuro @ostris
Post
Were you aware that we have a dedicated guide on different prompting mechanisms to improve image generation quality? 🧨
Takes you through simple prompt engineering, prompt weighting, prompt enhancement using GPT-2, and more.
Check out the guide here:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts
Post
What is your favorite part of our Diffusers integration of Stable Diffusion 3?
My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.
Learn more about them here:
https://huggingface.co/blog/sd3
sayakpaul authored a paper 6 months ago
Post
🧨 Diffusers 0.28.0 is out 🔥
It features the first non-generative pipeline of the library: Marigold 🔥
Marigold shines at depth estimation and surface normal estimation. It was contributed by @toshas, one of the authors of Marigold.
This release also features a massive refactor (led by @DN6) of the from_single_file() method, highlighting our efforts to make the library more amenable to community features 🤗
Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
multimodalart posted an update 7 months ago
Post
The first open Stable Diffusion 3-like architecture model is JUST out 📣 - but it is not SD3!
It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model, trained with multi-lingual CLIP + multi-lingual T5 text encoders for English and Chinese understanding.
Try it out yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit slow, as the model is chunky and the research code isn't super optimized for inference speed yet)
In the paper, they claim to be SOTA open source based on human preference evaluation!
Post
Custom pipelines and components in Diffusers
Wanted to use customized pipelines and other components (schedulers, UNets, text encoders, etc.) in Diffusers? Found it inflexible?
Since the first dawn on earth, we have supported loading custom pipelines via a custom_pipeline argument.
These pipelines are inference-only, i.e., the assumption is that we're leveraging an existing checkpoint (e.g., runwayml/stable-diffusion-v1-5) and ONLY modifying the pipeline implementation.
We have many cool pipelines implemented that way. They all share the same benefits available to a DiffusionPipeline, no compromise there 🤗
Check them here:
https://github.com/huggingface/diffusers/tree/main/examples/community
Then we might have a requirement for everything customized, i.e., custom components along with a custom pipeline. Sure, that's all possible.
All you have to do is keep the implementations of those custom components on the Hub repository you're loading your pipeline checkpoint from.
SDXL Japanese was implemented like this 🔥
stabilityai/japanese-stable-diffusion-xl
Full guide is available here ⬇️
https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview
And, of course, these share all the benefits that come with DiffusionPipeline.
Post
We're introducing experimental support for device_map in Diffusers 🤗
If you have multiple GPUs you want to distribute the pipeline models across, you can do so. Additionally, this becomes more useful when you have multiple low-VRAM GPUs.
Documentation:
https://huggingface.co/docs/diffusers/main/en/training/distributed_inference#device-placement
🚨 Currently, only the "balanced" device-mapping strategy is supported.