UI-MOPD
We build cross-platform GUI agents, trained with a unified framework, that can operate both desktop and mobile interfaces.
UI-MOPD introduces a two-stage training pipeline.
Our student model (8B) learns from multiple 32B teacher models to achieve strong cross-platform GUI interaction capabilities.
| Model | Size | Description |
|---|---|---|
| Qwen3-VL-32B-Thinking-Desktop-Teacher | 33B | Desktop platform teacher |
| Qwen3-VL-32B-Thinking-Mobile-Teacher | 33B | Mobile platform teacher |
| Qwen3-VL-8B-Thinking-Desktop-SFT | 9B | Desktop SFT checkpoint |
| Qwen3-VL-8B-Thinking-Mobile-SFT | 9B | Mobile SFT checkpoint |
| Qwen3-VL-8B-Thinking-UI-MOPD-Student | 9B | Final cross-platform student |

| Dataset | Description |
|---|---|
| Uni-GUI-OpenCUA | Post-processed desktop trajectories from OpenCUA (~832 episodes, ~14K steps) |
| Uni-GUI-Desktop-1 | Large-scale desktop GUI trajectories (~2.7K episodes, ~36K steps) |
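
To try one of the checkpoints from the model table, a minimal loading sketch might look like the following. It assumes every model is published under the `UI-MOPD` namespace on the Hugging Face Hub and that the installed `transformers` release is recent enough to support Qwen3-VL. The `CHECKPOINTS` mapping and the `repo_id` and `load_checkpoint` helpers are hypothetical names introduced for illustration and are not part of any released UI-MOPD code.

```python
# Hypothetical convenience helpers; not part of the UI-MOPD release.
ORG = "UI-MOPD"  # assumption: all checkpoints live under this Hub namespace

# Short role names (hypothetical) mapped to model names from the model table.
CHECKPOINTS = {
    "desktop-teacher": "Qwen3-VL-32B-Thinking-Desktop-Teacher",
    "mobile-teacher": "Qwen3-VL-32B-Thinking-Mobile-Teacher",
    "desktop-sft": "Qwen3-VL-8B-Thinking-Desktop-SFT",
    "mobile-sft": "Qwen3-VL-8B-Thinking-Mobile-SFT",
    "student": "Qwen3-VL-8B-Thinking-UI-MOPD-Student",
}


def repo_id(role: str) -> str:
    """Return the assumed Hub repository id for a role such as 'student'."""
    if role not in CHECKPOINTS:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(CHECKPOINTS)}")
    return f"{ORG}/{CHECKPOINTS[role]}"


def load_checkpoint(role: str = "student"):
    """Download and load a checkpoint plus its processor (needs network access)."""
    # Imported lazily so that repo_id() works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    rid = repo_id(role)
    processor = AutoProcessor.from_pretrained(rid)
    model = AutoModelForImageTextToText.from_pretrained(rid)
    return model, processor
```

Calling `load_checkpoint("student")` would fetch the final cross-platform student. The 32B teachers need considerably more memory, so the 8B checkpoints are the more practical starting point on a single GPU.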