Qwen3-VL-8B-Thinking-UI-MOPD-Student

This is the Student model from the UI-MOPD project — a cross-platform GUI agent trained via multi-teacher on-policy distillation for continual GUI agent learning.

Model Description

Qwen3-VL-8B-Thinking-UI-MOPD-Student is trained from Qwen3-VL-8B-Thinking using the UI-MOPD framework. It learns from two platform-specific 32B teachers (Desktop Teacher and Mobile Teacher) through reinforcement learning with platform-conditioned distillation, achieving balanced cross-platform performance on both desktop and mobile environments.

Key Highlights

  • Base Model: Qwen3-VL-8B-Thinking
  • Training Method: Multi-teacher on-policy distillation with DAPO + platform-conditioned KL regularization
  • Teachers: Qwen3-VL-32B-Thinking-Desktop-Teacher + Qwen3-VL-32B-Thinking-Mobile-Teacher
  • OSWorld Performance: 38.2% task success rate
  • MobileWorld Performance: 12.0% task success rate

Training Details

This model is obtained in Stage 2 of the UI-MOPD training pipeline:

  1. Stage 1: Supervised fine-tuning of Qwen3-VL-32B-Thinking on platform-specific data to produce a Desktop Teacher (46.3% on OSWorld) and a Mobile Teacher (16.2% on MobileWorld).
  2. Stage 2 (This Model): The 8B student is trained with reinforcement learning (DAPO) combined with multi-teacher on-policy distillation. A platform-conditioned router selects the appropriate teacher based on the current environment, and adaptive KL masking prevents over-regularization.

Key Training Components

  • Platform-Conditioned Routing: Routes each rollout to the corresponding platform-specific teacher
  • K3 Estimator: Efficient single-sample KL divergence estimator
  • Adaptive KL Masking: Removes teacher penalty when task reward is already sufficient

Performance

Method OSWorld (Desktop) MobileWorld (Mobile)
Qwen3-VL-8B-Thinking (base) 33.9% 7.7%
Mixed-SFT 35.0% 6.4%
Model Merge (TIES) 36.8% 0%
UI-MOPD (this model) 38.2% 12.0%

UI-MOPD achieves state-of-the-art balanced cross-platform performance, with +12.7% relative improvement on OSWorld and +55.8% on MobileWorld compared to the base model.

GUI Grounding & Understanding

Model AndroidControl ScreenSpot-Pro ScreenSpotV2 OSWorld-G
Qwen3-VL-8B-Thinking (base) 78.73% 43.71% 91.27% 52.13%
Model Merge (TIES) 74.01% 37.13% 88.60% 47.16%
UI-MOPD (this model) 80.05% 43.14% 90.88% 52.84%

UI-MOPD preserves GUI grounding and visual understanding capabilities while improving interactive task performance.

Intended Use

  • Cross-platform GUI agent for executing tasks on both desktop (e.g., web browsing, file management) and mobile (e.g., app navigation, settings control) environments
  • Research on continual learning and multi-platform adaptation for GUI agents

How to Use

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model = Qwen3VLForConditionalGeneration.from_pretrained(
    "UI-MOPD/Qwen3-VL-8B-Thinking-UI-MOPD-Student",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("UI-MOPD/Qwen3-VL-8B-Thinking-UI-MOPD-Student")

Citation

@article{lian2025uimopd,
  title={UI-MOPD: Multi-platform On-Policy Distillation for Continual GUI Agent Learning},
  author={Lian, Niu and Chen, Alan and Yu, Zhehao and Duan, Chengzhen and Liu, Fazhan and Liu, Hui and Fu, Pei and Luan, Jian and Wang, Yaowei and Xia, Shu-Tao and Wang, Jinpeng},
  year={2025}
}

Related Resources

Downloads last month
-
Safetensors
Model size
9B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for UI-MOPD/Qwen3-VL-8B-Thinking-UI-MOPD-Student

Finetuned
(69)
this model