neulab/UIX-Qwen2-Mind2Web

License: odc-by

Model for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding

🌐 Homepage | 🐍 GitHub | 📖 arXiv

Introduction

We introduce MultiUI, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on MultiUI not only excel at web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web, but also generalize surprisingly well to non-web UI tasks and even to non-UI domains such as document understanding, OCR, and chart interpretation.

Model Performance

Contact

Junpeng Liu: jpliu@link.cuhk.edu.hk
Xiang Yue: xyue2@andrew.cmu.edu

Citation

If you find this work helpful, please cite our paper: