SPRIGHT

community

https://spright-t2i.github.io

SPRIGHT-T2I

Activity Feed

AI & ML interests

Diffusion models

SPRIGHT-T2I's activity

sayakpaul

posted an update 7 days ago

Post

1755

We have been cooking a couple of fine-tuning runs on CogVideoX with finetrainers, smol datasets, and LoRA to generate cool video effects like crushing, dissolving, etc.

We are also releasing a LoRA extraction utility from a fully fine-tuned checkpoint. I know that kind of stuff has existed since eternity, but the quality on video models was nothing short of spectacular. Below are some links:

* Models and datasets: https://huggingface.co/finetrainers
* finetrainers: https://github.com/a-r-r-o-w/finetrainers
* LoRA extraction: https://github.com/huggingface/diffusers/blob/main/scripts/extract_lora_from_model.py
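
For readers curious what LoRA extraction from a fully fine-tuned checkpoint can look like, here is a minimal sketch using NumPy. It assumes the common approach of taking a truncated SVD of the weight difference between the fine-tuned and base models. The function name `extract_lora` and the toy shapes are hypothetical, and this is not the code from the linked script.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) with a rank-`rank` product lora_up @ lora_down."""
    delta = w_tuned - w_base
    # Truncated SVD gives the best rank-`rank` approximation in Frobenius norm.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    lora_up = u[:, :rank] * sqrt_s             # shape: (out_features, rank)
    lora_down = sqrt_s[:, None] * vt[:rank]    # shape: (rank, in_features)
    return lora_up, lora_down


# Toy check: a "fine-tune" whose true update has rank 4 is recovered almost exactly.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 32))
true_update = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
w_tuned = w_base + true_update

up, down = extract_lora(w_base, w_tuned, rank=4)
rel_err = np.linalg.norm(up @ down - true_update) / np.linalg.norm(true_update)
```

In a real checkpoint the update is rarely exactly low rank, so the chosen rank trades file size against how faithfully the LoRA reproduces the full fine-tune.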

1 reply

sayakpaul

posted an update 10 days ago

Post

1898

We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs our users can turn, fine-tuning, and more 🔥

5-6GBs for HunyuanVideo, sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen

sayakpaul

posted an update about 1 month ago

Post

4327

Commits speak louder than words 🤪

* 4 new video models
* Multiple image models, including SANA & Flux Control
* New quantizers -> GGUF & TorchAO
* New training scripts

Enjoy this holiday-special Diffusers release 🤗
Notes: https://github.com/huggingface/diffusers/releases/tag/v0.32.0

estellea

authored 3 papers about 2 months ago

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1, 2024 • 31

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

Paper • 2404.03118 • Published Apr 3, 2024 • 24

FastRM: An efficient and automatic explainability framework for multimodal generative models

Paper • 2412.01487 • Published Dec 2, 2024 • 1

sayakpaul

posted an update about 2 months ago

Post

2151

In the past seven days, the Diffusers team has shipped:

1. Two new video models
2. One new image model
3. Two new quantization backends
4. Three new fine-tuning scripts
5. Multiple fixes and library QoL improvements

Coffee on me if someone can guess 1 - 4 correctly.

1 reply

sayakpaul

posted an update about 2 months ago

Post

2116

Introducing a high-quality open-preference dataset to further this line of research for image generation.

Despite being such an inseparable component for modern image generation, open preference datasets are a rarity!

So, we decided to work on one with the community!

Check it out here:
https://huggingface.co/blog/image-preferences

7 replies

sayakpaul

posted an update about 2 months ago

Post

2151

The Control family of Flux from @black-forest-labs should be discussed more!

It enables structural controls like ControlNets while being significantly less expensive to run!

So, we're working on a Control LoRA training script 🤗

It's still WIP, so go easy:
https://github.com/huggingface/diffusers/pull/10130

sayakpaul

authored a paper 2 months ago

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published Dec 5, 2024 • 28

sayakpaul

posted an update 2 months ago

Post

1510

Let 2024 be the year of video model fine-tunes!

Check it out here:
https://github.com/a-r-r-o-w/cogvideox-factory/tree/main/training/mochi-1

sayakpaul

posted an update 3 months ago

Post

2669

It's been a while since we shipped native quantization support in diffusers 🧨

We currently support bitsandbytes as the official backend, but using others like torchao is already very simple.

This post is just a reminder of what's possible:

1. Loading a model with a quantization config
2. Saving a model with quantization config
3. Loading a pre-quantized model
4. enable_model_cpu_offload()
5. Training and loading LoRAs into quantized checkpoints

Docs:
https://huggingface.co/docs/diffusers/main/en/quantization/bitsandbytes
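
As a rough illustration of items 1 to 4, the sketch below loads a Flux transformer in 4-bit NF4 through the bitsandbytes backend and enables CPU offload. The model ID, the NF4 settings, and the helper names are assumptions picked for illustration. Running it needs `diffusers`, `bitsandbytes`, a CUDA GPU, and access to the checkpoint, so the heavy imports are kept inside the functions.

```python
MODEL_ID = "black-forest-labs/FLUX.1-dev"  # assumption: any supported checkpoint works


def load_nf4_transformer(model_id=MODEL_ID):
    """Load only the transformer, quantized to 4-bit NF4 at load time."""
    import torch
    from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    return FluxTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )


def build_pipeline(transformer, model_id=MODEL_ID):
    """Plug the quantized transformer into a pipeline and enable CPU offload."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()
    return pipe


# Usage (not executed here):
#   transformer = load_nf4_transformer()
#   transformer.save_pretrained("flux-nf4-transformer")  # saves the quantization config too
#   pipe = build_pipeline(transformer)
```

A checkpoint saved this way can later be reloaded with `from_pretrained` without passing a quantization config again, which corresponds to item 3 in the list.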

1 reply

sayakpaul

updated a dataset 4 months ago

SPRIGHT-T2I/spright

Updated Oct 9, 2024 • 9.1k • 30

sayakpaul

posted an update 4 months ago

Post

2772

I did a little experimentation with resizing pre-trained LoRAs on Flux. I explored two themes:

* Decrease the rank of a LoRA
* Increase the rank of a LoRA

The first one helps reduce memory requirements when the LoRA has a high rank, while the second one is merely an experiment. Another implication of this study is unifying LoRA ranks when you would like to torch.compile() them.

Check it out here:
sayakpaul/flux-lora-resizing
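
To make the two directions concrete, here is a small NumPy sketch. It assumes rank reduction via a truncated SVD of the LoRA product and rank increase via zero-padding. Both function names are hypothetical, and the linked repository may use different techniques.

```python
import numpy as np


def reduce_rank(lora_up, lora_down, new_rank):
    """Best rank-`new_rank` approximation of lora_up @ lora_down via truncated SVD."""
    u, s, vt = np.linalg.svd(lora_up @ lora_down, full_matrices=False)
    sqrt_s = np.sqrt(s[:new_rank])
    return u[:, :new_rank] * sqrt_s, sqrt_s[:, None] * vt[:new_rank]


def increase_rank(lora_up, lora_down, new_rank):
    """Zero-pad both factors to `new_rank`; the product stays exactly the same."""
    extra = new_rank - lora_up.shape[1]
    pad_up = np.zeros((lora_up.shape[0], extra), dtype=lora_up.dtype)
    pad_down = np.zeros((extra, lora_down.shape[1]), dtype=lora_down.dtype)
    return (
        np.concatenate([lora_up, pad_up], axis=1),
        np.concatenate([lora_down, pad_down], axis=0),
    )


rng = np.random.default_rng(0)
up = rng.normal(size=(48, 16))    # (out_features, rank)
down = rng.normal(size=(16, 24))  # (rank, in_features)

small_up, small_down = reduce_rank(up, down, new_rank=4)
big_up, big_down = increase_rank(up, down, new_rank=32)
```

Zero-padding is one way to bring several LoRAs to a common rank: every adapter then has the same tensor shapes, which avoids shape-driven recompilation under torch.compile() when swapping them.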

1 reply

sayakpaul

authored a paper 5 months ago

LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

Paper • 2408.13467 • Published Aug 24, 2024 • 25

sayakpaul

posted an update 6 months ago

Post

2960

Here is a hackable and minimal implementation showing how to perform distributed text-to-image generation with Diffusers and Accelerate.

Full snippet is here: https://gist.github.com/sayakpaul/cfaebd221820d7b43fae638b4dfa01ba

With @JW17
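
For a sense of the general pattern, here is a minimal data-parallel sketch built on Accelerate's `PartialState`. It is not the code from the gist. The function name `generate_distributed`, the output file naming, and the float16 dtype are assumptions, and the model ID is left as a parameter. Each process holds a full copy of the pipeline and handles its own slice of the prompts.

```python
def generate_distributed(model_id, prompts, out_dir="."):
    """Split `prompts` across processes and generate one image per prompt."""
    import os

    import torch
    from accelerate import PartialState
    from diffusers import DiffusionPipeline

    state = PartialState()  # picks up rank and device from the launcher
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to(state.device)

    # Each process receives only its share of the prompt list.
    with state.split_between_processes(prompts) as local_prompts:
        for i, prompt in enumerate(local_prompts):
            image = pipe(prompt).images[0]
            name = f"rank{state.process_index}_{i}.png"
            image.save(os.path.join(out_dir, name))
```

A script that calls this function would typically be started with `accelerate launch`, so that one process is spawned per GPU.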

agneet

authored 2 papers 6 months ago

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Paper • 2408.02231 • Published Aug 5, 2024 • 2

On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation

Paper • 2404.08540 • Published Apr 12, 2024 • 11

sayakpaul

posted an update 6 months ago

Post

4505

Flux.1-Dev-like images, but in fewer steps.

Merging code (very simple), inference code, merged params: sayakpaul/FLUX.1-merged

Enjoy the Monday 🤗

4 replies

sayakpaul

posted an update 6 months ago

Post

3807

With larger and larger diffusion transformers coming up, it's becoming increasingly important to have some good quantization tools for them.

We present our findings from a series of experiments on quantizing different diffusion pipelines based on diffusion transformers.

We demonstrate excellent memory savings at a small cost in inference latency, which is expected to improve in the coming days.

Diffusers 🤝 Quanto ❤️

This was a juicy collaboration between @dacorvo and myself.

Check out the post to learn all about it
https://huggingface.co/blog/quanto-diffusers

3 replies

Team members 5
