neulab/UIX-Qwen2-Mind2Web

	---
	license: odc-by
	---
	#### Model for the paper: [Harnessing Webpage UIs for Text-Rich Visual Understanding]()

	🌐 [Homepage](https://neulab.github.io/MultiUI/) \| 🐍 [GitHub](https://github.com/neulab/multiui) \| 📖 [arXiv]()

	## Introduction
	We introduce MultiUI, a dataset containing 7.3 million samples from 1 million websites, covering diverse multimodal tasks and UI layouts. Models trained on MultiUI excel in web UI tasks, achieving up to a 48% improvement on VisualWebBench and a 19.1% boost in action accuracy on the web agent dataset Mind2Web. They also generalize surprisingly well to non-web UI tasks and even to non-UI domains, such as document understanding, OCR, and chart interpretation.

	<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/vk7yT4Y7ydBOHM6BojmlI.mp4"></video>

	## Model Performance

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/h1L7J4rLlq6EOtbiXZjZW.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/NOVQ8WjgJoRm0bzN9zxFx.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65403d8781a8731a1c09a584/O6GhR1UXOSi7o3yjXvK4e.png)

	## Contact
	* Junpeng Liu: jpliu@link.cuhk.edu.hk
	* Xiang Yue: xyue2@andrew.cmu.edu

	## Citation
	If you find this work helpful, please cite our paper: