
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

We introduce Glyph-ByT5-v2, a customized text encoder for accurate multilingual visual text rendering with improved aesthetics. As an extension of Glyph-SDXL, our multilingual version supports visual text rendering in 10 languages: English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese and Russian. Combined with SDXL, the resulting Glyph-SDXL-v2 renders multilingual visual text accurately in design images.

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Ji Li, Yuhui Yuan
Microsoft Research Asia; Tsinghua University; Peking University; University of Liverpool
Preprint

Model Sources

Model Description

Please check our paper and project page for more details. Detailed usage instructions and inference code can be found here.

Visualization

[Figures: example 1, example 2, example 3, example 4]

Quick Usage

python inference_v2.py configs/glyph_sdxl_v2_albedo.py checkpoints examples/xiaoman.json --out_folder work_dirs/xiaoman --device cuda --sampler dpm
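To run the same command over several annotation files or output folders, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the released code: `build_inference_command` is a hypothetical helper, its parameter names (`config`, `checkpoint_dir`, `annotation`) are only our reading of the positional arguments, and it reuses nothing beyond the script name, paths, and flags that appear in the command above.

```python
import shlex

# Sampler names taken from this model card; anything else is rejected early.
SUPPORTED_SAMPLERS = ("dpm", "euler")


def build_inference_command(
    annotation,
    out_folder,
    config="configs/glyph_sdxl_v2_albedo.py",
    checkpoint_dir="checkpoints",
    device="cuda",
    sampler="dpm",
):
    """Return the argument list for one inference_v2.py run (hypothetical helper)."""
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"sampler must be one of {SUPPORTED_SAMPLERS}, got {sampler!r}")
    return [
        "python", "inference_v2.py",
        config, checkpoint_dir, str(annotation),
        "--out_folder", str(out_folder),
        "--device", device,
        "--sampler", sampler,
    ]


if __name__ == "__main__":
    cmd = build_inference_command("examples/xiaoman.json", "work_dirs/xiaoman")
    # Print the command for inspection. To launch it from the repository root,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when annotation paths contain spaces or non-ASCII characters.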

More Configurations

Some additional configuration options that are often useful:

| Argument/Config | Place | Default | Description |
| --- | --- | --- | --- |
| cfg | argument | 5.0 | Classifier-free guidance scale |
| sampler | argument | dpm | Sampler; supports dpm (DPM++ 2M Karras) and euler (EulerDiscreteScheduler) |
| pretrained_model_name_or_path | config | stablediffusionapi/albedobase-xl-20 | Base model |
| seed | annotation | None | Seed for inference |
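Because the seed is read from the annotation rather than from the command line, reproducing a result means editing the JSON file passed to the script. The sketch below shows one way to do that. It assumes the annotation is a single JSON object and that the seed lives under a top-level `"seed"` key; both the key name and the layout are assumptions, so check the example annotations in the repository first. `with_seed` and `write_seeded_annotation` are hypothetical helpers.

```python
import copy
import json
from pathlib import Path


def with_seed(annotation, seed):
    """Return a copy of the annotation with a top-level "seed" entry (assumed key)."""
    if not isinstance(annotation, dict):
        raise TypeError("expected the annotation to be a JSON object")
    seeded = copy.deepcopy(annotation)  # leave the caller's data untouched
    seeded["seed"] = seed
    return seeded


def write_seeded_annotation(src, dst, seed):
    """Read an annotation file, set its seed, and write the result to dst."""
    annotation = json.loads(Path(src).read_text(encoding="utf-8"))
    seeded = with_seed(annotation, seed)
    dst_path = Path(dst)
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese, Japanese, Korean, and Russian text readable.
    dst_path.write_text(
        json.dumps(seeded, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return seeded
```

Writing to a new file instead of overwriting the original keeps the shipped examples intact while still allowing a fixed seed per run.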

Citation

If you find our work useful in your research, please consider citing:

@misc{liu2024glyphbyt5v2,
      title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
      year={2024},
      eprint={2406.10208},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

and

@misc{liu2024glyphbyt5,
      title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
      year={2024},
      eprint={2403.09622},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}