---
license: apache-2.0
base_model:
- OpenGVLab/InternVL2-2B
---

## SpiritSight Agent: Advanced GUI Agent with One Look
[Paper](https://arxiv.org/abs/2503.03196) • 🤗 Models • Datasets (Coming soon…)
## Introduction

SpiritSight-Agent is a vision-based, end-to-end GUI agent that excels in GUI navigation tasks across various GUI platforms.

## Models

We recommend fine-tuning the base model on custom data.

| Model | Checkpoint | Size | License |
|:-------|:------------|:------|:--------|
| SpiritSight-Agent-2B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-2B) | 2B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |
| SpiritSight-Agent-8B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-8B) | 8B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |
| SpiritSight-Agent-26B-base | 🤗 [HF Link](https://huggingface.co/SenseLLM/SpiritSight-Agent-26B) | 26B | [InternVL](https://github.com/OpenGVLab/InternVL/blob/main/LICENSE) |

## Datasets

Coming soon.

## Inference

```shell
conda create -n spiritsight-agent python=3.9
conda activate spiritsight-agent
pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation
python infer_SSAgent-2B.py
```

## Citation

If you find this repo useful for your research, please cite our paper:

```
@misc{huang2025spiritsightagentadvancedgui,
      title={SpiritSight Agent: Advanced GUI Agent with One Look},
      author={Zhiyuan Huang and Ziming Cheng and Junting Pan and Zhaohui Hou and Mingjie Zhan},
      year={2025},
      eprint={2503.03196},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.03196},
}
```

## Acknowledgments

We thank the following projects, which inspired this work:

- [InternVL2](https://huggingface.co/OpenGVLab/InternVL2-8B)
- [SeeClick](https://github.com/njucckevin/SeeClick)
- [Mind2Web](https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web)
- [GUI-Odyssey](https://github.com/OpenGVLab/GUI-Odyssey)
- [AMEX](https://huggingface.co/datasets/Yuxiang007/AMEX)
- [AndroidControl](https://github.com/google-research/google-research/tree/master/android_control)
- [GUICourse](https://github.com/yiye3/GUICourse)
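
## Environment check (sketch)

The Inference section pins Python 3.9 and `flash-attn==2.3.6`, so a quick pre-flight check before running `infer_SSAgent-2B.py` may save a failed run. The following is a minimal sketch, not part of this repository: the function name `check_environment` is hypothetical, and the assumption that `requirements.txt` installs `torch` and `transformers` is ours, since that file is not shown here. It uses only the Python standard library.

```python
import importlib.metadata
import importlib.util
import sys

# Versions pinned by the Inference section of this README.
EXPECTED_PYTHON = (3, 9)
EXPECTED_FLASH_ATTN = "2.3.6"

# Import names to probe. "flash_attn" is the import name of the flash-attn
# package; "torch" and "transformers" are ASSUMED contents of requirements.txt.
MODULES = ("torch", "transformers", "flash_attn")


def check_environment(modules=MODULES):
    """Return a dict describing how the current environment compares
    with the versions pinned in the Inference section."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "python_matches": sys.version_info[:2] == EXPECTED_PYTHON,
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    try:
        installed = importlib.metadata.version("flash-attn")
    except importlib.metadata.PackageNotFoundError:
        installed = None
    report["flash_attn_version"] = installed
    report["flash_attn_matches"] = installed == EXPECTED_FLASH_ATTN
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Running the script inside the `spiritsight-agent` conda environment prints one line per check. A `False` or `None` value points to a step of the Inference instructions that may not have completed. A mismatch is only a hint, since other versions may also work.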