GEVO is a multimodal large language model specialized for Ancient Chinese Character Evolution Analysis which has been accepted by ACL main 2026. The model is obtained by glyph-driven supervised fine-tuning of Qwen3-VL-2B-Instruct and is designed to enhance the understanding of ancient Chinese scripts, including oracle bone inscriptions, bronze inscriptions, seal scripts, clerical scripts, and regular scripts.

To facilitate further research, we have open-sourced the instruction-tuning dataset in Github used for training GEVO. By following the LlamaFactory tutorials, you can easily train a model by yourself using our data.

GEVO is trained only on traced reproductions of ancient Chinese characters. As a result, its performance on high-noise rubbings may be suboptimal.

Requirements

The model has been tested with the following environment:

accelerate==1.13.0
huggingface_hub==1.16.1
qwen-vl-utils==0.0.14
torch==2.5.1
torchaudio==2.5.1
torchcodec==0.13.0
torchvision==0.20.1
transformers==5.9.0

Citation

If you find this model useful, please cite:

@article{song2026gevo,
  title={Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning},
  author={Song, Rui and Shi, Lida and Qi, Ruihua and Li, Yingji and Xu, Hao},
  journal={arXiv preprint arXiv:2604.11299},
  year={2026}
}