Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling
This repository provides the implementation of MrFlow, a training-free staged sampling method for accelerating pretrained flow-matching text-to-image diffusion models.
MrFlow first samples a low-resolution image, upsamples the decoded result in pixel space with Real-ESRGAN, re-encodes the upsampled image, injects scheduler-consistent low-strength noise, and performs a short high-resolution refinement. The pipeline shifts most denoising cost from expensive high-resolution sampling to cheaper low-resolution sampling while preserving local detail quality.
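The staged flow described above can be sketched as a plain function that receives each stage as a callable. This is only a structural outline under assumptions: every name here (`mrflow_sketch`, `sample_low`, `upsample_x2`, `encode`, `add_noise`, `refine`, `decode`) is hypothetical and chosen for illustration, and the repository's scripts may organize the stages differently.

```python
from typing import Any, Callable


def mrflow_sketch(
    prompt: str,
    sample_low: Callable[[str], Any],    # low-resolution sampling, returns a decoded image
    upsample_x2: Callable[[Any], Any],   # pixel-space x2 super-resolution (Real-ESRGAN in MrFlow)
    encode: Callable[[Any], Any],        # VAE encode: image -> latent
    add_noise: Callable[[Any], Any],     # scheduler-consistent low-strength noise injection
    refine: Callable[[str, Any], Any],   # short high-resolution denoising of the noised latent
    decode: Callable[[Any], Any],        # VAE decode: latent -> image
) -> dict:
    """Run the stages in order and return the three intermediate images."""
    low = sample_low(prompt)
    upscaled = upsample_x2(low)
    noised_latent = add_noise(encode(upscaled))
    refined = decode(refine(prompt, noised_latent))
    return {
        "stage1_low": low,
        "stage2_upscaled": upscaled,
        "stage3_refined": refined,
    }
```

Passing the stages in as callables keeps the outline independent of any particular backbone; in practice each callable would wrap the corresponding Diffusers pipeline component or the Real-ESRGAN upsampler.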
✨ Highlights
- Training-free deployment. No finetuning, learned upsampler, or model-specific retraining is required.
- No custom kernels. The implementation uses standard PyTorch, Diffusers pipelines, and scheduler controls.
- Strong aggressive-speed regime. MrFlow reaches more than 10x end-to-end speedup on Qwen-Image while preserving visual quality.
- Works with distilled models. The same pipeline can be combined with pretrained timestep-distilled models such as Pi-Flow and FLUX-schnell.
- Compact staged design. The implementation transfers across Qwen-Image, FLUX.1-dev, FLUX.2 Klein, and Z-Image families.
📢 News
- [2026/07] 📰 MrFlow is featured on Hugging Face Daily Papers.
- [2026/07] ⚡ We release the MrFlow ComfyUI plugin.
- [2026/07] 🔥 The MrFlow paper is available on arXiv, and the source code is released.
🛠️ Installation
Create a Diffusers-compatible environment for the target backbone. The demos use:
- PyTorch
- Diffusers
- Transformers
- Real-ESRGAN
MrFlow uses Real-ESRGAN for x2 pixel-space super-resolution. Install Real-ESRGAN from the official project and download the x2 weights:
https://github.com/xinntao/Real-ESRGAN
The scripts contain placeholder checkpoint paths. Replace them with local paths to the pretrained text-to-image model and Real-ESRGAN x2 weights before running.
Quick Start
The repository root keeps only two minimal reference scripts plus the shared scheduler helper:
| Script | Model | Setting | Output |
|---|---|---|---|
| `qwen_image_mrflow.py` | Qwen-Image | MrFlow `12plus1` | `outputs/qwen_image_mrflow_12plus1/` |
| `flux1_mrflow.py` | FLUX.1-dev | MrFlow `12plus1` | `outputs/flux1_mrflow_12plus1/` |
Edit the checkpoint paths at the top of each script:
MODEL = "/path/to/Qwen-Image"
REALESRGAN_X2 = "/path/to/RealESRGAN_x2.pth"
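Before running, it can help to confirm that the placeholders were actually replaced. The helper below is a small sketch, not part of the repository; `missing_checkpoints` is a hypothetical name, and the two paths are the placeholder values shown above.

```python
from pathlib import Path


def missing_checkpoints(paths: dict[str, str]) -> list[str]:
    """Return the names whose path does not exist on disk."""
    return [name for name, p in paths.items() if not Path(p).exists()]


if __name__ == "__main__":
    # Placeholder values copied from the scripts; replace them with real local paths.
    missing = missing_checkpoints({
        "MODEL": "/path/to/Qwen-Image",
        "REALESRGAN_X2": "/path/to/RealESRGAN_x2.pth",
    })
    for name in missing:
        print(f"{name} still points to a missing path")
```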
Run:
python qwen_image_mrflow.py
python flux1_mrflow.py
Each script saves:
- `stage1_low.png`: low-resolution generated image.
- `stage2_upscaled.png`: Real-ESRGAN x2 upsampled image.
- `stage3_refined.png`: final high-resolution refined image.
⚙️ Core Settings
| Setting | Low-resolution steps | Refinement steps | Direct sigma | Typical use |
|---|---|---|---|---|
| `12plus1` | 12 | 1 | 0.12 | Aggressive acceleration. |
| `20plus1` | 20 | 1 | 0.12 | Higher-quality operating point. |
The high-resolution refinement uses an explicit direct-sigma schedule. For example, 12plus1 denotes 12 low-resolution denoising steps followed by one high-resolution step from sigma=0.12 to 0.
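To make the direct-sigma refinement concrete, the sketch below uses the common rectified-flow convention `x_sigma = (1 - sigma) * x0 + sigma * noise` and takes a single Euler step from sigma=0.12 to 0. It is an assumption that MrFlow's scheduler helper follows exactly this convention. The velocity is the analytic one, standing in for a real model call, and `add_noise` and `euler_step` are illustrative names rather than functions from the repository.

```python
import random

SIGMA = 0.12  # direct sigma listed for the 12plus1 and 20plus1 settings


def add_noise(x0, noise, sigma):
    """Flow-matching interpolation between a clean latent and Gaussian noise."""
    return [(1.0 - sigma) * a + sigma * n for a, n in zip(x0, noise)]


def euler_step(x, velocity, sigma, sigma_next):
    """One Euler step of dx/dsigma = velocity, moving from sigma to sigma_next."""
    return [a + (sigma_next - sigma) * v for a, v in zip(x, velocity)]


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]  # stand-in for the re-encoded upsampled latent
noise = [rng.gauss(0.0, 1.0) for _ in x0]

x_noisy = add_noise(x0, noise, SIGMA)

# A real pipeline would call the denoiser here. Under this convention the ideal
# velocity is (noise - x0), used below as a stand-in for the model output.
velocity = [n - a for n, a in zip(noise, x0)]

x_refined = euler_step(x_noisy, velocity, SIGMA, 0.0)
```

With the exact velocity, the single step returns to `x0`. With a real model, the predicted velocity differs from this ideal, so the step changes the latent instead of simply undoing the injected noise.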
📦 Supported Demos
Parameterized variants and additional model-family demos are available in examples/.
| Script | Backbone | Notes |
|---|---|---|
| `examples/flux1_mrflow.py` | FLUX.1-dev | Training-free MrFlow. |
| `examples/flux1_piflow_mrflow.py` | FLUX.1-dev + Pi-Flow | Combines MrFlow with distilled weights. |
| `examples/qwen_image_mrflow.py` | Qwen-Image | Training-free MrFlow. |
| `examples/qwen_image_piflow_mrflow.py` | Qwen-Image + Pi-Flow | Combines MrFlow with distilled weights. |
| `examples/flux2_mrflow.py` | FLUX.2 Klein | Base and non-base variants. |
| `examples/zimage_turbo_mrflow.py` | Z-Image-Turbo | Reduced-step model plus MrFlow refinement. |
Run all configured examples with:
bash examples/run_examples.sh
See examples/README.md for command-line usage, FLUX.2 Klein presets, Z-Image-Turbo refinement defaults, and output filename conventions.
Pi-Flow examples are optional and require a separate local checkout of LakonLab. Set LAKONLAB_ROOT to that checkout before running the Pi-Flow scripts.
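A quick way to catch a missing or mistyped `LAKONLAB_ROOT` before launching a Pi-Flow script is a check like the one below. This is an illustrative sketch: `resolve_lakonlab_root` is a hypothetical helper, and how the Pi-Flow scripts themselves consume the variable is not shown here.

```python
import os
from pathlib import Path


def resolve_lakonlab_root(env=os.environ) -> Path:
    """Return LAKONLAB_ROOT as a Path, or raise a readable error."""
    value = env.get("LAKONLAB_ROOT")
    if not value:
        raise RuntimeError(
            "LAKONLAB_ROOT is not set; point it at a local LakonLab checkout."
        )
    root = Path(value).expanduser()
    if not root.is_dir():
        raise RuntimeError(f"LAKONLAB_ROOT={value!r} is not a directory.")
    return root
```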
🧩 ComfyUI Plugin
The repository also includes ComfyUI-MrFlow/, a ComfyUI custom-node extension for Qwen-oriented MrFlow workflows. It provides helper nodes, editable workflow and API JSON examples, a reusable subgraph, and a model-link helper for split Qwen-Image bundles.
To use it, place or symlink ComfyUI-MrFlow/ into ComfyUI/custom_nodes/, restart ComfyUI, and open ComfyUI-MrFlow/examples/qwen_mrflow_workflow.json or load ComfyUI-MrFlow/subgraphs/qwen_mrflow.json.
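The symlink step can be scripted. The sketch below assumes a standard ComfyUI layout with a `custom_nodes/` directory under the ComfyUI root; `link_plugin` is a hypothetical helper, and the paths passed to it are whatever matches the local setup.

```python
import os
from pathlib import Path


def link_plugin(plugin_dir: str, comfyui_root: str) -> Path:
    """Symlink plugin_dir into <comfyui_root>/custom_nodes/ and return the link path."""
    source = Path(plugin_dir).resolve()
    custom_nodes = Path(comfyui_root) / "custom_nodes"
    custom_nodes.mkdir(parents=True, exist_ok=True)
    link = custom_nodes / source.name
    # Leave an existing entry (link or real directory) untouched.
    if not (link.is_symlink() or link.exists()):
        os.symlink(source, link, target_is_directory=True)
    return link
```

For example, `link_plugin("ComfyUI-MrFlow", "/path/to/ComfyUI")` would create `/path/to/ComfyUI/custom_nodes/ComfyUI-MrFlow`; ComfyUI still needs a restart afterwards.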
🖼️ Results
Qwen-Image generation examples. With 12 low-resolution steps and one high-resolution refinement step, MrFlow produces diverse 1024-resolution samples on Qwen-Image while reaching above 10x end-to-end speedup.
Accuracy-efficiency trade-off. On FLUX.1-dev and Qwen-Image, MrFlow offers a flexible trade-off between generation quality and measured end-to-end speedup, and remains effective where other training-free strategies degrade sharply.
Runtime breakdown. For Qwen-Image 12plus1, measured end-to-end latency is 4.77s versus 49.32s for native 50-step inference. The main cost is shifted from high-resolution sampling to cheaper low-resolution sampling, while SR and VAE overhead remain small.
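The speedup implied by the two latencies above is a one-line calculation, shown here so the figure can be checked against the reported value:

```python
native_latency_s = 49.32  # native 50-step Qwen-Image inference
mrflow_latency_s = 4.77   # Qwen-Image with the 12plus1 setting

speedup = native_latency_s / mrflow_latency_s
print(f"end-to-end speedup: {speedup:.2f}x")  # prints: end-to-end speedup: 10.34x
```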
Representative Numbers
| Backbone | Setting | End-to-end speedup |
|---|---|---|
| FLUX.1-dev | 12 + 1 | 8.25x |
| Qwen-Image | 12 + 1 | 10.3x |
| FLUX.2 Klein Base 9B | 12 + 1 | 8.79x |
| Z-Image-Turbo | 8 + 1 | 21.0x |
| Qwen-Image + Pi-Flow | 4 + 1 | up to 25x |
Speedups are measured end to end, including text encoding, VAE encode/decode, super-resolution, noise preparation, and diffusion forward passes.
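One way to collect a per-stage breakdown of this kind is a small wall-clock timer wrapped around each stage. The sketch below is not the authors' benchmarking code; `StageTimer` is a hypothetical helper, and the `time.sleep` calls stand in for real work.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock seconds per named stage."""

    def __init__(self):
        self.seconds = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.seconds[name] = self.seconds.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.seconds.values())


timer = StageTimer()
with timer.stage("text_encoding"):
    time.sleep(0.01)  # stand-in for real work
with timer.stage("diffusion"):
    time.sleep(0.02)  # stand-in for real work

for name, secs in timer.seconds.items():
    print(f"{name}: {secs:.3f}s")
print(f"total: {timer.total():.3f}s")
```

On a GPU, CUDA kernels launch asynchronously, so each stage should end with `torch.cuda.synchronize()` before the clock is read; otherwise time is attributed to whichever later stage happens to block.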
🗺️ Roadmap
- Project README, framework figure, visual results, trade-off plot, and runtime breakdown.
- Implementation code.
- Public paper link.
- ComfyUI extension plugin.
- Demo video.
Citation
If you find MrFlow useful, please cite our paper:
@misc{zheng2026multiresolutionflowmatchingtrainingfree,
title={Multi-Resolution Flow Matching: Training-Free Diffusion Acceleration via Staged Sampling},
author={Xingyu Zheng and Xianglong Liu and Yifu Ding and Weilun Feng and Junqing Lin and Jinyang Guo and Haotong Qin},
year={2026},
eprint={2607.01642},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2607.01642},
}
Acknowledgements
This implementation builds on the Diffusers ecosystem and uses Real-ESRGAN for pixel-space super-resolution.