Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

We introduce Glyph-ByT5-v2, a customized text encoder for accurate multilingual visual text rendering with improved aesthetics. As an extension of Glyph-SDXL, our multilingual version supports visual text rendering in 10 languages: English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian. Combined with SDXL, the resulting Glyph-SDXL-v2 renders accurate multilingual visual text in design images.

Zeyu Liu, Weicong Liang, Yiming Zhao, Bohan Chen, Ji Li, Yuhui Yuan
Microsoft Research Asia; Tsinghua University; Peking University; University of Liverpool
Preprint

Model Sources

Repository: https://github.com/AIGText/Glyph-ByT5
Paper: https://arxiv.org/abs/2406.10208
Project Page: https://glyph-byt5-v2.github.io/

Model Description

Please see our paper and project page for more details. Detailed usage instructions and inference code can be found in the repository linked above.

Quick Usage

python inference_v2.py configs/glyph_sdxl_v2_albedo.py checkpoints examples/xiaoman.json --out_folder work_dirs/xiaoman --device cuda --sampler dpm
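
When several annotation files need to be rendered in a batch, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository: `build_inference_command` is a hypothetical helper, and it uses only the positional arguments and flags that appear in the command above. It builds the argument list without running anything.

```python
import sys

# Sampler names as they appear in this model card.
SUPPORTED_SAMPLERS = ("dpm", "euler")


def build_inference_command(annotation, out_folder, sampler="dpm", device="cuda",
                            config="configs/glyph_sdxl_v2_albedo.py",
                            checkpoint_dir="checkpoints"):
    """Return the argv list for inference_v2.py (hypothetical helper)."""
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"unsupported sampler: {sampler!r}")
    return [
        sys.executable, "inference_v2.py",
        config, checkpoint_dir, str(annotation),
        "--out_folder", str(out_folder),
        "--device", device,
        "--sampler", sampler,
    ]


cmd = build_inference_command("examples/xiaoman.json", "work_dirs/xiaoman")
# Show the command without the interpreter path, for comparison with the line above.
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the checkpoints are in place.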

More Configurations

We list some more useful configurations for easy usage:

Argument/Config	Place	Default	Description
cfg	argument	5.0	Classifier-free guidance scale
sampler	argument	dpm	Sampler; supports dpm (DPM++ 2M Karras) and euler (EulerDiscreteScheduler)
pretrained_model_name_or_path	config	stablediffusionapi/albedobase-xl-20	Base model
seed	annotation	None	Random seed for inference
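
Because the seed is set in the annotation rather than on the command line, a reproducible run requires editing the JSON file. The following is a minimal sketch, assuming the annotation is a JSON object and that the seed is stored under a top-level `seed` key. The key name and layout are assumptions inferred from the table above, so check them against the example annotations in the repository. `with_seed` is a hypothetical helper.

```python
import json
from pathlib import Path


def with_seed(annotation_path, seed, out_path):
    """Copy an annotation file, setting an assumed top-level "seed" field."""
    annotation = json.loads(Path(annotation_path).read_text(encoding="utf-8"))
    if not isinstance(annotation, dict):
        raise TypeError("expected the annotation to be a JSON object")
    annotation["seed"] = int(seed)  # assumed key name, not confirmed by the source
    # ensure_ascii=False keeps non-English text (e.g. Chinese) readable on disk.
    Path(out_path).write_text(
        json.dumps(annotation, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return annotation
```

For example, `with_seed("examples/xiaoman.json", 0, "examples/xiaoman_seed0.json")` writes a seeded copy that can then be passed to the inference script in place of the original annotation, leaving the original file untouched.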

Citation

If you find our work useful in your research, please consider citing:

@misc{liu2024glyphbyt5v2,
      title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Yiming Zhao and Bohan Chen and Ji Li and Yuhui Yuan},
      year={2024},
      eprint={2406.10208},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

and

@misc{liu2024glyphbyt5,
      title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering}, 
      author={Zeyu Liu and Weicong Liang and Zhanhao Liang and Chong Luo and Ji Li and Gao Huang and Yuhui Yuan},
      year={2024},
      eprint={2403.09622},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}