SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction
SAWM is the safety-augmented world model used in SeerGuard, a consequence-aware safety framework for mobile GUI agents. It is designed to assess whether a user instruction or a candidate GUI action may lead to unsafe consequences before the action is executed.
Unlike post-hoc safety checkers that evaluate an interaction after execution, SAWM performs proactive safety auditing. Given a mobile GUI screenshot, a user instruction, and a candidate action proposed by a GUI agent, SAWM predicts the semantic next state, evaluates the safety risk of the action, and provides a concise rationale. This allows SeerGuard to block unsafe actions before they cause irreversible effects such as unauthorized payments, privacy leakage, harmful message sending, data deletion, or unsafe device configuration changes.
This repository contains the SAWM model weights.
Model Description
SAWM is built upon Qwen3-VL-8B-Instruct and fine-tuned as a multimodal safety world model for mobile GUI environments.
It supports three core capabilities:
Instruction-Level Screening
SAWM determines whether a user instruction is explicitly malicious or unauthorized, or otherwise violates safety policies, before the GUI agent starts interacting with the device.
Semantic Next-State Prediction
Given the current mobile screen and a candidate action, SAWM predicts the likely functional consequence of the action in natural language, instead of generating the next screen at the pixel level.
Action-Level Risk Assessment
Based on the predicted semantic consequence, SAWM classifies the candidate action as safe or unsafe and provides a rationale for the decision.
SAWM is intended to serve as the guard model inside SeerGuard, which combines coarse-grained instruction-level filtering with fine-grained action-level risk assessment.
Key Features
Consequence-Aware Safety Assessment
SAWM evaluates the safety of a candidate action by anticipating its likely outcome before execution. This is especially important for mobile GUI agents, where a single tap may trigger irreversible operations.
Semantic World Modeling
Instead of synthesizing future GUI screenshots, SAWM predicts the semantic next state of the interface. This reduces computational overhead while preserving the functional information needed for safety reasoning.
Mobile GUI Safety Alignment
SAWM is trained with safety-augmented mobile interaction data, allowing it to identify risks that only become apparent after grounding a seemingly benign instruction in the current GUI state.
Dual-Stage Guardrail Support
SAWM can be used for both:
- pre-execution instruction screening;
- runtime action-level safety auditing.
This makes it suitable for deployment as a guard model for different mobile GUI agents.
Framework Overview
In SeerGuard, SAWM is used in two stages.
First, the user instruction is screened before execution:
```text
Input:
- User instruction
Output:
- Safety label: safe / unsafe
- Safety rationale
```
If the instruction is unsafe, the task is refused immediately.
Second, if the instruction is safe, the GUI agent proposes a candidate action. SAWM then evaluates the action before it is executed:
```text
Input:
- Current mobile GUI screenshot
- User instruction
- Candidate action
Output:
- Predicted semantic next state
- Action safety label: safe / unsafe
- Safety rationale
```
If the action is predicted to be unsafe, SeerGuard blocks the action and terminates the task. Otherwise, the action is allowed to proceed.
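The two-stage flow described above can be sketched as a small control loop. This is a minimal illustration, not the SeerGuard implementation: `Verdict`, `run_guarded_task`, and every callable passed into it (`screen_instruction`, `audit_action`, `propose_action`, `get_screenshot`, `execute`) are hypothetical names, and the `max_steps` limit is an assumption added to keep the loop bounded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Verdict:
    """Hypothetical container for one SAWM decision."""
    label: str                        # "safe" or "unsafe"
    rationale: str
    next_state: Optional[str] = None  # only filled for action-level audits


def run_guarded_task(instruction, get_screenshot, propose_action, execute,
                     screen_instruction, audit_action, max_steps=20):
    """Run a GUI task behind a two-stage guard.

    All callables are assumptions standing in for real components:
      screen_instruction(instruction) -> Verdict
      audit_action(screenshot, instruction, action) -> Verdict
      propose_action(screenshot, instruction) -> action, or None when done
    Returns (status, executed_actions).
    """
    # Stage 1: instruction-level screening; refuse before touching the device.
    if screen_instruction(instruction).label != "safe":
        return "refused", []

    executed = []
    for _ in range(max_steps):
        screenshot = get_screenshot()
        action = propose_action(screenshot, instruction)
        if action is None:
            return "completed", executed

        # Stage 2: audit the candidate action before it is executed.
        verdict = audit_action(screenshot, instruction, action)
        if verdict.label != "safe":
            # Block the action and terminate the task.
            return "blocked", executed

        execute(action)
        executed.append(action)

    return "step_limit", executed
```

Any label other than `"safe"` is treated as unsafe here, so an unexpected model output blocks the action rather than letting it through. That choice is part of the sketch, not something the model card specifies.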
Intended Use
SAWM is intended for research and development of safety guardrails for mobile GUI agents.
Typical use cases include:
- proactive safety monitoring for mobile GUI agents;
- instruction-level malicious intent detection;
- action-level risk assessment before GUI execution;
- semantic next-state prediction for mobile interface transitions;
- evaluation of safety-utility trade-offs in autonomous mobile agents.
SAWM is not a standalone mobile agent. It is a guard model that should be combined with a GUI agent or an agent execution framework.
Model Architecture
SAWM uses Qwen3-VL-8B-Instruct as the backbone model and is trained under a unified autoregressive formulation.
The model takes multimodal GUI context as input and generates structured natural-language outputs. For action-level assessment, the model predicts:
```text
{
  "predicted_next_state": "...",
  "safety_label": "safe" | "unsafe",
  "rationale": "..."
}
```
This design allows the model to jointly learn:
- visual GUI understanding;
- action-consequence prediction;
- safety risk classification;
- natural-language safety rationale generation.
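A caller consuming the structured output shown above needs to turn the generated text into a typed record. The sketch below is one possible way to do that, assuming the model emits a JSON object with the three fields listed; the `ActionAssessment` class and `parse_assessment` function are hypothetical helpers, and the fail-closed behavior (treating unparseable output as unsafe) is a design assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass
class ActionAssessment:
    predicted_next_state: str
    safety_label: str  # "safe" or "unsafe"
    rationale: str


def parse_assessment(raw: str) -> ActionAssessment:
    """Parse generated text into an ActionAssessment, failing closed."""
    try:
        # Tolerate extra text around the object by slicing to the outer braces.
        start, end = raw.index("{"), raw.rindex("}") + 1
        data = json.loads(raw[start:end])
        label = str(data["safety_label"]).strip().lower()
        if label not in {"safe", "unsafe"}:
            raise ValueError(f"unexpected label: {label!r}")
        return ActionAssessment(
            predicted_next_state=str(data["predicted_next_state"]),
            safety_label=label,
            rationale=str(data["rationale"]),
        )
    except (ValueError, KeyError, TypeError):
        # Assumption: anything that cannot be validated is treated as unsafe.
        return ActionAssessment("", "unsafe", "unparseable model output")
```

Failing closed means a malformed generation blocks the action instead of silently allowing it, at the cost of some extra refusals.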
Training Data
SAWM is trained with a multi-task corpus that combines world-modeling data and safety augmentation data.
The training mixture includes:
MobileWorld Next-State QA Data
Used to provide basic mobile GUI world-modeling capability, including state-transition forecasting and action-consequence prediction.
General Textual Safety Data
Used to provide broad safety alignment and malicious instruction detection ability.
Multimodal Mobile Risk Data
Constructed from mobile GUI interaction trajectories with safety labels, semantic next-state descriptions, and rationales.
Synthetic Textual Mobile Risk Data
Generated to bridge the gap between general text-only safety data and visually grounded mobile GUI risk scenarios.
The final training corpus contains approximately 148K instances. The model is fine-tuned for 1 epoch with a learning rate of 1e-6.
Evaluation
SAWM is evaluated as the guard model in SeerGuard and as an independent safety world model.
SeerGuard Framework Evaluation
On MobileSafetyBench, SeerGuard improves the safety-utility trade-off across multiple GUI-agent backbones, including Qwen3-VL, GPT-5.1, and Gemini-3.1.
Instruction-Level Screening
SAWM is evaluated on Agent-SafetyBench and Prompt Injection benchmarks.
Action-Level Risk Assessment
On MobileRisk, SAWM achieves the highest F1 score and Step Score among the compared methods, indicating that it can identify both unsafe trajectories and the step at which risk first arises.
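For readers unfamiliar with the F1 metric mentioned above, the following sketch computes it from gold and predicted labels using the standard definition (harmonic mean of precision and recall). Treating `"unsafe"` as the positive class is an assumption, and `f1_unsafe` is a hypothetical helper; the Step Score is not sketched because this card does not define it.

```python
def f1_unsafe(gold, pred):
    """Standard F1 with "unsafe" as the positive class (an assumption)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(g == "unsafe" and p == "unsafe" for g, p in pairs)
    fp = sum(g == "safe" and p == "unsafe" for g, p in pairs)
    fn = sum(g == "unsafe" and p == "safe" for g, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With one true positive, one false positive, and one false negative, precision and recall are both 0.5, so the function returns 0.5.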
Citation
If you find this model useful, please cite the SeerGuard paper:
@inproceedings{seerguard2026,
title={SeerGuard: A Safety Framework for Mobile GUI Agents via World Model Prediction},
author={Anonymous Authors},
year={2026}
}
SAWM is built on Qwen3-VL-8B-Instruct. Please also cite the corresponding Qwen technical report when using this model.
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388}
}