SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction

SAWM is the safety-augmented world model used in SeerGuard, a consequence-aware safety framework for mobile GUI agents. It is designed to assess whether a user instruction or a candidate GUI action may lead to unsafe consequences before the action is executed.

Unlike post-hoc safety checkers that evaluate an interaction after execution, SAWM performs proactive safety auditing. Given a mobile GUI screenshot, a user instruction, and a candidate action proposed by a GUI agent, SAWM predicts the semantic next state, evaluates the safety risk of the action, and provides a concise rationale. This allows SeerGuard to block unsafe actions before they cause irreversible effects such as unauthorized payments, privacy leakage, harmful message sending, data deletion, or unsafe device configuration changes.

This repository contains the SAWM model weights.

Model Description

Chat

SAWM is built upon Qwen3-VL-8B-Instruct and fine-tuned as a multimodal safety world model for mobile GUI environments.

It supports three core capabilities:

  1. Instruction-Level Screening

    SAWM determines whether a user instruction is explicitly malicious, unauthorized, or violates safety policies before the GUI agent starts interacting with the device.

  2. Semantic Next-State Prediction

    Given the current mobile screen and a candidate action, SAWM predicts the likely functional consequence of the action in natural language, instead of generating the next screen at the pixel level.

  3. Action-Level Risk Assessment

    Based on the predicted semantic consequence, SAWM classifies the candidate action as safe or unsafe and provides a rationale for the decision.

SAWM is intended to serve as the guard model inside SeerGuard, which combines coarse-grained instruction-level filtering with fine-grained action-level risk assessment.

Key Features

Consequence-Aware Safety Assessment

SAWM evaluates the safety of a candidate action by anticipating its likely outcome before execution. This is especially important for mobile GUI agents, where a single tap may trigger irreversible operations.

Semantic World Modeling

Instead of synthesizing future GUI screenshots, SAWM predicts the semantic next state of the interface. This reduces computational overhead while preserving the functional information needed for safety reasoning.

Mobile GUI Safety Alignment

SAWM is trained with safety-augmented mobile interaction data, allowing it to identify risks that only become apparent after grounding a seemingly benign instruction in the current GUI state.

Dual-Stage Guardrail Support

SAWM can be used for both:

  • pre-execution instruction screening;
  • runtime action-level safety auditing.

This makes it suitable for deployment as a guard model for different mobile GUI agents.

Framework Overview

In SeerGuard, SAWM is used in two stages.

First, the user instruction is screened before execution:

Input:
- User instruction

Output:
- Safety label: safe / unsafe
- Safety rationale

If the instruction is unsafe, the task is refused immediately.

Second, if the instruction is safe, the GUI agent proposes a candidate action. SAWM then evaluates the action before it is executed:

```text
Input:
- Current mobile GUI screenshot
- User instruction
- Candidate action

Output:
- Predicted semantic next state
- Action safety label: safe / unsafe
- Safety rationale

If the action is predicted to be unsafe, SeerGuard blocks the action and terminates the task. Otherwise, the action is allowed to proceed.

Intended Use

SAWM is intended for research and development of safety guardrails for mobile GUI agents.

Typical use cases include:

  • proactive safety monitoring for mobile GUI agents;
  • instruction-level malicious intent detection;
  • action-level risk assessment before GUI execution;
  • semantic next-state prediction for mobile interface transitions;
  • evaluation of safety-utility trade-offs in autonomous mobile agents.

SAWM is not a standalone mobile agent. It is a guard model that should be combined with a GUI agent or an agent execution framework.

Model Architecture

SAWM uses Qwen3-VL-8B-Instruct as the backbone model and is trained under a unified autoregressive formulation.

The model takes multimodal GUI context as input and generates structured natural-language outputs. For action-level assessment, the model predicts:

{
  "predicted_next_state": "...",
  "safety_label": "safe" | "unsafe",
  "rationale": "..."
}

This design allows the model to jointly learn:

  • visual GUI understanding;
  • action-consequence prediction;
  • safety risk classification;
  • natural-language safety rationale generation.

Training Data

SAWM is trained with a multi-task corpus that combines world-modeling data and safety augmentation data.

The training mixture includes:

  1. MobileWorld Next-State QA Data

    Used to provide basic mobile GUI world-modeling capability, including state-transition forecasting and action-consequence prediction.

  2. General Textual Safety Data

    Used to provide broad safety alignment and malicious instruction detection ability.

  3. Multimodal Mobile Risk Data

    Constructed from mobile GUI interaction trajectories with safety labels, semantic next-state descriptions, and rationales.

  4. Synthetic Textual Mobile Risk Data

    Generated to bridge the gap between general text-only safety data and visually grounded mobile GUI risk scenarios.

The final training corpus contains approximately 148K instances. The model is fine-tuned for 1 epoch with a learning rate of 1e-6.

Evaluation

SAWM is evaluated as the guard model in SeerGuard and as an independent safety world model.

SeerGuard Framework Evaluation

On MobileSafetyBench, SeerGuard improves the safety-utility trade-off across multiple GUI-agent backbones, including Qwen3-VL, GPT-5.1, and Gemini-3.1.

Chat

Instruction-Level Screening

SAWM is evaluated on Agent-SafetyBench and Prompt Injection benchmarks.

Chat

Action-Level Risk Assessment

On MobileRisk, SAWM achieves strong action-level risk detection performance. SAWM achieves the highest F1 score and Step Score among the compared methods, showing its ability to identify both unsafe trajectories and the onset step of risk.

Chat

Citation

If you find this model useful, please cite the SeerGuard paper:

@inproceedings{seerguard2026,
  title={SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction},
  author={Anonymous Authors},
  year={2026}
}

SAWM is built on Qwen3-VL-8B-Instruct. Please also cite the corresponding Qwen-VL technical reports when using this model.

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388}
}

Downloads last month
-
Safetensors
Model size
770k params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for xue-26/SAWM