PA-BDM: Prefix-Adaptive Block Diffusion for Efficient Document Recognition
Efficient Document Recognition with Prefix-Adaptive Block Diffusion
Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang
News
- [2026.05] We release PA-BDM, a prefix-adaptive block diffusion framework for efficient document recognition.
Introduction
Document recognition aims to convert document images containing text, formulas, tables, and complex layouts into structured machine-readable formats. While autoregressive vision-language models have achieved strong recognition quality, their sequential decoding process can be inefficient for long structured outputs. Block diffusion models provide a promising alternative by enabling semi-parallel generation and KV-cache reuse, but existing block diffusion approaches often rely on a fixed block granularity, which limits decoding flexibility and may introduce instability for structure-sensitive recognition tasks.
PA-BDM addresses these limitations with a prefix-adaptive block diffusion framework. Instead of treating the block size as a fixed generation unit, PA-BDM uses it as a maximum candidate generation range and dynamically commits reliable prefixes during decoding. This design enables adaptive generation lengths, timely KV-cache reuse, and more stable recognition of structured document outputs.
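The prefix-commit idea can be illustrated with a minimal sketch. It assumes that reliability is judged by comparing per-token confidence against a threshold, which this README does not specify; the function name `reliable_prefix_length` and the parameters `threshold` and `min_commit` are hypothetical and are not taken from the PA-BDM code base.

```python
from typing import Sequence


def reliable_prefix_length(confidences: Sequence[float],
                           threshold: float = 0.9,
                           min_commit: int = 1) -> int:
    """Return how many leading tokens of a candidate block to commit.

    Assumption for this sketch: a token counts as reliable when its
    confidence is >= threshold. The commit stops at the first unreliable
    position, so the block size only acts as an upper bound.
    """
    if not confidences:
        return 0
    run = 0
    for c in confidences:
        if c < threshold:
            break  # first unreliable token ends the prefix
        run += 1
    # Always commit at least `min_commit` tokens so decoding makes progress,
    # but never more than the block actually contains.
    return min(max(min_commit, run), len(confidences))


# Candidate block of size 8 with a low-confidence token at index 3.
block = [0.99, 0.97, 0.95, 0.62, 0.98, 0.91, 0.40, 0.88]
print(reliable_prefix_length(block))  # 3
```

Here only the first three tokens are committed; the remaining positions would be re-predicted in a later step with the committed prefix as context.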
Highlights
- Prefix-Adaptive Decoding: Dynamically commits reliable prefixes within each candidate block, allowing the effective decoding length to adapt to local prediction confidence.
- Efficient KV-cache Reuse: Enables timely cache updates without waiting for an entire fixed block to be fully resolved.
- Structure-sensitive Document Recognition: Designed for document recognition tasks involving text, formulas, tables, and structured outputs.
- Improved Efficiency-Accuracy Trade-off: Achieves faster inference while maintaining strong recognition performance across document recognition benchmarks.
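To make the cache-timing point concrete, the toy loop below tracks how many tokens are committed (and therefore cacheable) after each forward pass. It is a sketch under stated assumptions, not the released decoder: `decode_with_prefix_commits`, `toy_scores`, and the confidence threshold are hypothetical, and no real model or KV cache is involved.

```python
from typing import Callable, List


def decode_with_prefix_commits(total_len: int,
                               block_size: int,
                               threshold: float,
                               score_block: Callable[[int, int], List[float]]) -> List[int]:
    """Return the committed (cacheable) length after each forward pass."""
    committed = 0
    history: List[int] = []
    while committed < total_len:
        # The block size is only the maximum candidate range for this pass.
        span = min(block_size, total_len - committed)
        confidences = score_block(committed, span)
        # Count the leading run of confident positions (assumed commit rule).
        run = 0
        for c in confidences:
            if c < threshold:
                break
            run += 1
        # Commit at least one token so the loop always terminates.
        committed += max(1, run)
        history.append(committed)
    return history


def toy_scores(start: int, span: int) -> List[float]:
    """Hypothetical scorer: every fifth absolute position is low-confidence.

    A real model would refine such positions; this toy simply commits them
    one at a time via the minimum-commit rule above.
    """
    return [0.5 if (start + i) % 5 == 4 else 0.95 for i in range(span)]


history = decode_with_prefix_commits(total_len=12, block_size=8,
                                     threshold=0.9, score_block=toy_scores)
print(history)  # [4, 5, 9, 10, 12]
```

The committed length grows after every pass, so the context available for caching advances without waiting for a full block of 8 tokens to be resolved.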
Usage
Please refer to the repository for installation and inference instructions:
- GitHub: https://github.com/SII-sc22mc/PA-BDM
- Model: https://huggingface.co/MingxuChai/PA-BDM
- Paper: https://arxiv.org/pdf/2605.16861
Acknowledgements
This project builds upon prior work and open-source resources including Qwen2.5-VL, DiffusionVL, BD3LMs, and related diffusion language modeling frameworks. We thank the authors for their valuable contributions to the community.
Citation
If you find our work useful, please cite our paper:
@misc{chai2026prefixadaptiveblockdiffusionefficient,
  title={Prefix-Adaptive Block Diffusion for Efficient Document Recognition},
  author={Mingxu Chai and Ziyu Shen and Chenyu Liu and Kaidi Zhang and Jiazheng Zhang and Dingwei Zhu and Zhiheng Xi and Ruoyu Chen and Jun Long and Jihua Kang and Tao Gui and Qi Zhang},
  year={2026},
  eprint={2605.16861},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2605.16861},
}