

Multimodal (text + layout/format + image) pre-training for document AI

LayoutXLM is a multilingual variant of LayoutLMv2.

The documentation of this model in the Transformers library can be found here.
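
As a rough illustration of the layout modality, the sketch below prepares word bounding boxes the way LayoutLMv2-style models in Transformers generally expect them: as integer coordinates scaled to a 0-1000 grid relative to the page size. The helper name `normalize_bbox`, the sample words, the pixel boxes, and the page dimensions are all assumptions made for illustration, not part of this model card.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel-space box (x0, y0, x1, y1) to a 0-1000 integer grid."""
    if width <= 0 or height <= 0:
        raise ValueError("page width and height must be positive")
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


# Hypothetical OCR output for one page: words and their pixel-space boxes.
words = ["Name:", "Ana"]
boxes_px = [(50, 100, 150, 120), (160, 100, 210, 120)]
page_width, page_height = 1000, 2000  # assumed page size in pixels

# One normalized box per word, aligned with `words`.
boxes = [normalize_bbox(b, page_width, page_height) for b in boxes_px]
```

The resulting `words` and `boxes` lists are the kind of text-plus-layout input that would then be passed, together with the page image, to the model's tokenizer or processor; see the Transformers documentation for the exact call.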

Microsoft Document AI | GitHub


LayoutXLM is a multimodal pre-trained model for multilingual document understanding that aims to bridge language barriers in visually-rich document understanding. Experimental results show that it significantly outperforms existing SOTA cross-lingual pre-trained models on the XFUND dataset.

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei. arXiv preprint, 2021.
