Diffusers Demo at ICCV 2023
Non-profit
AI & ML interests: Diffusion models
Recent Activity
Post
Introducing a high-quality open preference dataset to further this line of research for image generation.
Despite being such an inseparable component of modern image generation, open preference datasets are a rarity! So, we decided to work on one with the community.
Check it out here:
https://huggingface.co/blog/image-preferences
Post
The Control family of Flux from @black-forest-labs should be discussed more!
It enables structural controls like ControlNets while being significantly less expensive to run.
So, we're working on a Control LoRA training script 🤗
It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130
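The training script itself lives in the PR above; for inference, a minimal sketch with the released Canny checkpoint might look like the following. It assumes the FluxControlPipeline API from recent diffusers releases and uses a locally prepared Canny edge map as a placeholder input:

```python
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

# Assumes access to the gated black-forest-labs/FLUX.1-Canny-dev checkpoint.
pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Placeholder: a pre-computed Canny edge map at the target resolution.
control_image = load_image("canny_edge_map.png")

image = pipe(
    prompt="a futuristic robot walking through a neon-lit city",
    control_image=control_image,
    num_inference_steps=50,
    guidance_scale=30.0,
).images[0]
image.save("flux_control.png")
```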
sayakpaul authored a paper 16 days ago
Post
Let 2024 be the year of video model fine-tunes!
Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1
Post
It's been a while since we shipped native quantization support in diffusers 🧨
We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.
This post is just a reminder of what's possible:
1. Loading a model with a quantization config
2. Saving a model with a quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints
Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
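As a minimal sketch of points 1-3 with the bitsandbytes backend (the checkpoint and 4-bit settings below are illustrative choices, not the only ones supported):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# 1. Load a model with a quantization config (4-bit NF4 via bitsandbytes).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# 2. Save the model together with its quantization config.
transformer.save_pretrained("flux-transformer-nf4")

# 3. Load the pre-quantized model directly; no config needed this time.
transformer = FluxTransformer2DModel.from_pretrained(
    "flux-transformer-nf4", torch_dtype=torch.bfloat16
)
```

From there, the quantized transformer can be plugged into a pipeline and combined with enable_model_cpu_offload() (point 4) to cut memory further.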
Post
Did a little experimentation to resize pre-trained LoRAs on Flux. I explored two themes:
* Decrease the rank of a LoRA
* Increase the rank of a LoRA
The first one is helpful in reducing memory requirements if the LoRA is of a high rank, while the second one is merely an experiment. Another implication of this study is in the unification of LoRA ranks when you would like to torch.compile() them.
Check it out here:
sayakpaul/flux-lora-resizing
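The repo above has the actual recipe; purely as an illustration of the idea, here is a generic sketch (not the repo's code) that re-factors a single LoRA layer pair at a different rank via a truncated SVD of its weight delta:

```python
import torch

def resize_lora_pair(lora_A: torch.Tensor, lora_B: torch.Tensor, new_rank: int):
    """Re-factor a LoRA update delta_W = lora_B @ lora_A at a different rank.

    lora_A: (rank, in_features), lora_B: (out_features, rank).
    Decreasing the rank truncates the SVD; increasing it zero-pads the factors.
    """
    delta_w = lora_B @ lora_A
    U, S, Vh = torch.linalg.svd(delta_w, full_matrices=False)
    k = min(new_rank, S.shape[0])
    new_B = U[:, :k] * S[:k]   # (out_features, k)
    new_A = Vh[:k, :]          # (k, in_features)
    if new_rank > k:           # pad with zeros when upsizing the rank
        new_B = torch.cat([new_B, new_B.new_zeros(new_B.shape[0], new_rank - k)], dim=1)
        new_A = torch.cat([new_A, new_A.new_zeros(new_rank - k, new_A.shape[1])], dim=0)
    return new_A, new_B
```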
sayakpaul authored a paper 4 months ago
Post
Here is a hackable and minimal implementation showing how to perform distributed text-to-image generation with Diffusers and Accelerate.
Full snippet is here: https://gist.github.com/sayakpaul/cfaebd221820d7b43fae638b4dfa01ba
With @JW17
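The gist has the full script; at its core it follows the documented Accelerate pattern of splitting prompts across processes, roughly like this (the checkpoint below is illustrative):

```python
import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

# PartialState picks up the process/GPU layout created by `accelerate launch`.
distributed_state = PartialState()
pipeline.to(distributed_state.device)

prompts = ["a photo of an astronaut riding a horse", "a watercolor painting of a fox"]

# Each process gets its own slice of the prompt list and generates independently.
with distributed_state.split_between_processes(prompts) as subset:
    for i, prompt in enumerate(subset):
        image = pipeline(prompt).images[0]
        image.save(f"image_rank{distributed_state.process_index}_{i}.png")
```

Run it with something like `accelerate launch --num_processes=2 distributed_t2i.py`, matching the number of GPUs you have.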
Post
Flux.1-Dev-like images but in fewer steps.
Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged
Enjoy the Monday 🤗
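The actual merging code and weights live in sayakpaul/FLUX.1-merged; purely as a sketch of the general idea, a naive parameter average of two compatible Flux transformers could look like this (the 50/50 ratio and the handling of Dev-only modules are assumptions here, not the published recipe):

```python
import torch
from diffusers import FluxTransformer2DModel

# Both transformers are large; this sketch assumes plenty of CPU RAM.
dev = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
schnell = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="transformer", torch_dtype=torch.bfloat16
)

schnell_sd = schnell.state_dict()
# Naive 50/50 average; parameters that only exist in Dev are kept as-is.
merged_sd = {
    k: 0.5 * v + 0.5 * schnell_sd[k] if k in schnell_sd else v
    for k, v in dev.state_dict().items()
}
dev.load_state_dict(merged_sd)
dev.save_pretrained("flux-merged-transformer")
```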
Post
With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.
We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.
We demonstrate excellent memory savings with a bit of a sacrifice on inference latency, which is expected to improve in the coming days.
Diffusers 🤗 Quanto ❤️
This was a juicy collaboration between @dacorvo and myself.
Check out the post to learn all about it:
https://huggingface.co/blog/quanto-diffusers
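The blog post has the full benchmarks; the core recipe with optimum-quanto boils down to something like this (the PixArt checkpoint and FP8 weights here are illustrative choices):

```python
import torch
from diffusers import PixArtSigmaPipeline
from optimum.quanto import freeze, qfloat8, quantize

pipeline = PixArtSigmaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS", torch_dtype=torch.float16
).to("cuda")

# Quantize the diffusion transformer weights to FP8 and freeze them in place.
quantize(pipeline.transformer, weights=qfloat8)
freeze(pipeline.transformer)

image = pipeline("a cute corgi wearing sunglasses").images[0]
image.save("corgi.png")
```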
Post
Were you aware that we have a dedicated guide on different prompting mechanisms to improve image generation quality? 🧨
It takes you through simple prompt engineering, prompt weighting, prompt enhancement using GPT-2, and more.
Check out the guide here:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/weighted_prompts
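For example, the prompt weighting section of the guide relies on the Compel library, which turns weighted prompt syntax into embeddings you pass to the pipeline. A minimal sketch:

```python
import torch
from compel import Compel
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

# "++" upweights "red"; the resulting embeddings replace the plain-text prompt.
prompt_embeds = compel("a red++ cat playing with a ball")
image = pipe(prompt_embeds=prompt_embeds).images[0]
image.save("weighted_prompt.png")
```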
Post
What is your favorite part of our Diffusers integration of Stable Diffusion 3?
My personal favorite is the ability to run it on a variety of different GPUs with minimal code changes.
Learn more about them here:
https://huggingface.co/blog/sd3
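For instance, a memory-friendly setup on a mid-range GPU can be as simple as the following sketch (assuming access to the gated SD3 weights):

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
)
# Offload submodules to CPU between forward passes so SD3 fits on smaller GPUs.
pipe.enable_model_cpu_offload()

image = pipe(
    "a photo of a cat holding a sign that says hello world",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3.png")
```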
sayakpaul authored a paper 6 months ago
Post
🧨 Diffusers 0.28.0 is out 🔥
It features the first non-generative pipeline of the library -- Marigold 🔥
Marigold shines at performing Depth Estimation and Surface Normal Estimation. It was contributed by @toshas, one of the authors of Marigold.
This release also features a massive refactor (led by @DN6) of the from_single_file() method, highlighting our efforts to make the library more amenable to community features 🤗
Check out the release notes here:
https://github.com/huggingface/diffusers/releases/tag/v0.28.0
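A minimal depth estimation sketch with the new pipeline (the checkpoint and sample image follow the Marigold docs; adjust to taste):

```python
import torch
from diffusers import MarigoldDepthPipeline
from diffusers.utils import load_image

pipe = MarigoldDepthPipeline.from_pretrained(
    "prs-eth/marigold-depth-lcm-v1-0", variant="fp16", torch_dtype=torch.float16
).to("cuda")

image = load_image("https://marigoldmonodepth.github.io/images/einstein.jpg")
depth = pipe(image)

# Visualize the predicted depth map and save it.
vis = pipe.image_processor.visualize_depth(depth.prediction)
vis[0].save("einstein_depth.png")
```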
Post
Custom pipelines and components in Diffusers
Wanted to use customized pipelines and other components (schedulers, UNets, text encoders, etc.) in Diffusers? Found it inflexible?
Since the first dawn on earth, we have supported loading custom pipelines via a custom_pipeline argument.
These pipelines are inference-only, i.e., the assumption is that we're leveraging an existing checkpoint (e.g., runwayml/stable-diffusion-v1-5) and ONLY modifying the pipeline implementation.
We have many cool pipelines implemented that way. They all share the same benefits available to a DiffusionPipeline, no compromise there 🤗
Check them out here:
https://github.com/huggingface/diffusers/tree/main/examples/community
Then we might have a requirement of everything customized, i.e., custom components along with a custom pipeline. Sure, that's all possible.
All you have to do is keep the implementations of those custom components in the Hub repository you're loading your pipeline checkpoint from.
SDXL Japanese was implemented like this 🔥
stabilityai/japanese-stable-diffusion-xl
The full guide is available here:
https://huggingface.co/docs/diffusers/main/en/using-diffusers/custom_pipeline_overview
And, of course, these share all the benefits that come with DiffusionPipeline.
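Here's a hedged sketch of both flavors: a community pipeline swapped in via custom_pipeline, and a fully custom pipeline whose components live in the Hub repo itself (loading remote code has to be opted into explicitly):

```python
import torch
from diffusers import DiffusionPipeline

# Flavor 1: reuse an existing checkpoint, swap in a community pipeline
# implementation. "lpw_stable_diffusion" is one of the pipelines under
# examples/community.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# Flavor 2: everything custom -- the pipeline and component code ship with the
# Hub repo, so executing remote code must be enabled explicitly.
pipe_ja = DiffusionPipeline.from_pretrained(
    "stabilityai/japanese-stable-diffusion-xl",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).to("cuda")
```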
Post
OpenELM in Core ML
Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.
I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community in the Hub for anyone who wants to integrate them inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194
The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.
With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M model on my M1 Max, and 6.5 tok/s with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95
I'm also looking at optimizing inference using an experimental KV cache in swift-transformers. It's a bit tricky because the layers have a varying number of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)
Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but neither the Llama 2 chat template nor the default Alignment Handbook one used for training is recognized. Any ideas on this are welcome!
Post
We're introducing experimental support for device_map in Diffusers 🤗
If you have multiple GPUs you want to distribute the pipeline models across, you can do so. This becomes especially useful when you have multiple low-VRAM GPUs.
Documentation:
https://huggingface.co/docs/diffusers/main/en/training/distributed_inference#device-placement
🚨 Currently, only the "balanced" device mapping strategy is supported.
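Minimal usage looks like this (the checkpoint is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# "balanced" spreads the pipeline's models (UNet, text encoders, VAE, ...)
# across all visible GPUs.
pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    device_map="balanced",
)
image = pipeline("a photo of an astronaut riding a horse").images[0]

# Inspect where each component ended up.
print(pipeline.hf_device_map)
```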
sayakpaul authored a paper 9 months ago
Post
Worked on a short blog post discussing how we semi-automated the release process of the diffusers library. The post delves deeper into the workflows responsible for:
* Publishing the package on the Test PyPI and main PyPI servers.
* Notifying an internal Slack channel after a release is published on the repository.
Check it out here:
https://sayak.dev/posts/streamlined-releases.html