GoClick Model
This is the on-device GUI element detection model for the GoClick Android App.
Model Description
GoClick is a Florence-2-based model that locates GUI elements on screen from natural language descriptions. The model runs entirely on-device using ONNX Runtime.
Model Variants
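This repository includes the model at three precisions (INT8, FP16, and full-precision FP32). Each precision is split into three ONNX files: a vision encoder, an encoder, and a decoder.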
| Variant | Size | Use Case |
|---|---|---|
| vision_encoder_int8.onnx | 91 MB | Best balance of speed and accuracy |
| encoder_model_int8.onnx | 80 MB | Best balance of speed and accuracy |
| decoder_model_int8.onnx | 132 MB | Best balance of speed and accuracy |
| vision_encoder_fp16.onnx | 176 MB | Better accuracy, larger size |
| encoder_model_fp16.onnx | 158 MB | Better accuracy, larger size |
| decoder_model_fp16.onnx | 261 MB | Better accuracy, larger size |
| vision_encoder.onnx | 350 MB | Full precision (float32) |
| encoder_model.onnx | 316 MB | Full precision (float32) |
| decoder_model.onnx | 521 MB | Full precision (float32) |
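Assuming the three files of one precision are deployed together as a set (the app lists all three INT8 files as required), the on-device footprint of a variant is the sum of its three rows. The hypothetical helper below, which is not part of the GoClick project, tabulates this from the sizes in the table:

```python
# Sizes in MB, copied from the variant table; assumes one precision = three files used together.
MODEL_FILES = {
    "int8": {
        "vision_encoder_int8.onnx": 91,
        "encoder_model_int8.onnx": 80,
        "decoder_model_int8.onnx": 132,
    },
    "fp16": {
        "vision_encoder_fp16.onnx": 176,
        "encoder_model_fp16.onnx": 158,
        "decoder_model_fp16.onnx": 261,
    },
    "fp32": {
        "vision_encoder.onnx": 350,
        "encoder_model.onnx": 316,
        "decoder_model.onnx": 521,
    },
}


def total_size_mb(precision: str) -> int:
    """Approximate asset footprint of one complete variant (all three ONNX files)."""
    return sum(MODEL_FILES[precision].values())


if __name__ == "__main__":
    for precision in MODEL_FILES:
        print(f"{precision}: {total_size_mb(precision)} MB")
```

With the listed sizes this comes to 303 MB for INT8, 595 MB for FP16 and 1187 MB for FP32, which is one concrete reason to start with INT8 on a phone.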
Usage with GoClick App
- Download the model files (we recommend starting with the INT8 versions)
- Place them in the `app/src/main/assets/` directory of the GoClick Android project
- Build and run the app
Required Files for App
Minimum required files to run the app:
- vision_encoder_int8.onnx
- encoder_model_int8.onnx
- decoder_model_int8.onnx
- vocab.json
- tokenizer.json
- tokenizer_config.json
- special_tokens_map.json
- mask.jpg
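Before building, it can help to confirm that this minimum file set is actually in the assets directory. Below is a minimal sketch of such a pre-build check; the script and its `missing_assets` function are hypothetical and not shipped with GoClick, and the default path assumes you run it from the Android project root:

```python
from __future__ import annotations

from pathlib import Path

# Minimum file set for the app: INT8 model files, tokenizer files, and mask.jpg.
REQUIRED_FILES = [
    "vision_encoder_int8.onnx",
    "encoder_model_int8.onnx",
    "decoder_model_int8.onnx",
    "vocab.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "mask.jpg",
]


def missing_assets(assets_dir: str | Path) -> list[str]:
    """Return the required files that are not present as regular files in assets_dir."""
    root = Path(assets_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    import sys

    # Default assumes the working directory is the Android project root.
    target = sys.argv[1] if len(sys.argv) > 1 else "app/src/main/assets"
    missing = missing_assets(target)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required files present.")
```

Pass a different assets path as the first argument if your project layout differs.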
How It Works
- Screen Capture: The app captures the device screen using MediaProjection
- Vision Encoder: Processes the screenshot into visual features
- Text Encoder: Encodes the natural language query
- Decoder: Generates coordinate tokens for the target element location
- Post-processing: Converts tokens to actual screen coordinates
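The last two steps, coordinate tokens from the decoder and their conversion to screen coordinates, can be sketched as follows. Florence-2 represents positions as location tokens of the form `<loc_N>`, where N is a bin index from 0 to 999 along each axis. This sketch assumes GoClick keeps that convention and returns a box as four tokens (x1, y1, x2, y2); it maps each bin back to the pixel at the centre of that bin and uses the box centre as a tap point. The function names and the tap-at-centre choice are illustrative assumptions, not the app's actual code.

```python
from __future__ import annotations

import re

NUM_BINS = 1000  # assumed: Florence-2-style quantization into 1000 bins per axis
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def dequantize(bin_index: int, size_px: int, num_bins: int = NUM_BINS) -> float:
    """Map a bin index back to the pixel coordinate at the centre of that bin."""
    return (bin_index + 0.5) * size_px / num_bins


def tokens_to_box(text: str, width: int, height: int) -> tuple[float, float, float, float]:
    """Parse the first four <loc_N> tokens as (x1, y1, x2, y2) in screen pixels."""
    bins = [int(n) for n in LOC_PATTERN.findall(text)]
    if len(bins) < 4:
        raise ValueError(f"expected 4 location tokens, got {len(bins)}: {text!r}")
    x1, y1, x2, y2 = bins[:4]
    return (
        dequantize(x1, width),
        dequantize(y1, height),
        dequantize(x2, width),
        dequantize(y2, height),
    )


def tap_point(text: str, width: int, height: int) -> tuple[int, int]:
    """Centre of the predicted box, rounded to whole pixels for a tap gesture."""
    x1, y1, x2, y2 = tokens_to_box(text, width, height)
    return round((x1 + x2) / 2), round((y1 + y2) / 2)


if __name__ == "__main__":
    decoded = "button<loc_100><loc_200><loc_300><loc_400>"  # made-up decoder output
    print(tokens_to_box(decoded, 1080, 2400))
    print(tap_point(decoded, 1080, 2400))
```

Because the bins are relative to the image, the same token sequence maps correctly onto any screen resolution as long as the real capture width and height are passed in.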
Model Architecture
Based on Florence-2, adapted for mobile deployment:
- Vision encoder for screen understanding
- Text encoder for natural language queries
- Auto-regressive decoder for coordinate generation
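The auto-regressive decoder produces one token at a time and feeds each prediction back in as input for the next step. A minimal greedy-decoding sketch of that loop is shown below. Here `step_fn` is a hypothetical stand-in for one run of `decoder_model_*.onnx` conditioned on the encoder outputs (that ONNX Runtime wiring and any KV caching are not shown), and greedy argmax selection is an assumption: the app may use beam search or another strategy.

```python
from __future__ import annotations

from typing import Callable, Sequence

# step_fn(prefix) -> one score per vocabulary entry for the next token.
# Hypothetical stand-in for a single decoder_model invocation.
StepFn = Callable[[Sequence[int]], Sequence[float]]


def greedy_decode(step_fn: StepFn, start_id: int, eos_id: int, max_new_tokens: int = 20) -> list[int]:
    """Greedy autoregressive decoding: append the argmax token until EOS or the length cap."""
    tokens = [start_id]
    for _ in range(max_new_tokens):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax over the vocabulary
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens[1:]  # drop the start token; EOS (if reached) is kept


if __name__ == "__main__":
    script = [5, 7, 9, 2]  # made-up token ids; 2 plays the role of EOS here

    def toy_step(prefix: Sequence[int]) -> list[float]:
        # Puts all the score on the next scripted token.
        target = script[len(prefix) - 1]
        return [1.0 if i == target else 0.0 for i in range(10)]

    print(greedy_decode(toy_step, start_id=0, eos_id=2))
```

Each loop iteration corresponds to one decoder call, so the length of the generated sequence directly drives on-device latency.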
License
MIT License