How to use CraneAILabs/cranemedai-safety with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="CraneAILabs/cranemedai-safety")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("CraneAILabs/cranemedai-safety", dtype="auto")
```
CraneMed AI Safety Classifier
An on-device, multilingual safety classifier for detecting adversarial prompts targeting clinical AI assistants operating under the Uganda Clinical Guidelines (UCG) 2023.
Fine-tuned from paraphrase-multilingual-MiniLM-L12-v2 on the UCG Adversarial Safety Dataset (3,020 labeled prompts across 8 attack categories). Designed for Android deployment at ≤ 40 MB and < 30 ms latency.
Model Description
This classifier is the L3 (neural) layer in the CraneMed AI safety architecture, a multi-layered defense system for a clinical decision support tool running MedGemma on Android in low-connectivity Ugandan health facilities.
The safety architecture has four layers:
- L1 - Regex Input Filter (~0 MB, < 1 ms): Pattern-based blocking of known jailbreaks, LD50 queries, and roleplay attacks
- L3 - This Classifier (~12 MB INT8, < 30 ms): Neural classifier for nuanced adversarial detection
- L4 - MedGemma Self-Check (0 MB, ~300 ms): Fires only in the borderline zone, using the existing model
- L2 - Regex Output Validator (~0 MB, < 1 ms): Post-generation checks for harmful leakage
The classifier catches attacks that evade regex patterns: authority impersonation, escalation chains, contraindication bypass with clinical framing, and multilingual attacks.
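The layering described above can be sketched as a single gate function. This is a minimal illustration in Python, not the project's implementation (the repository references Kotlin classes for the on-device version). Everything named here is hypothetical: the regex patterns, the `BLOCK_AT` and `ALLOW_BELOW` thresholds that define the borderline zone, and the `classify`, `self_check`, and `generate` callables that stand in for the L3 classifier, the MedGemma self-check, and MedGemma generation.

```python
import re
from typing import Callable

# Hypothetical L1/L2 patterns; the real rule set is not published in this card.
INPUT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bld50\b",
    r"ignore (all )?previous instructions",
    r"pretend (you are|to be)",
)]
OUTPUT_PATTERNS = [re.compile(r"\blethal dose\b", re.IGNORECASE)]

# Hypothetical thresholds on the classifier's ADVERSARIAL probability.
BLOCK_AT = 0.80     # at or above: block outright
ALLOW_BELOW = 0.40  # below: allow without a self-check

REFUSAL = "Request blocked by safety filter."


def safety_gate(
    prompt: str,
    classify: Callable[[str], float],   # L3 stand-in: returns P(ADVERSARIAL)
    self_check: Callable[[str], bool],  # L4 stand-in: True if the prompt looks safe
    generate: Callable[[str], str],     # stand-in for the assistant model
) -> str:
    # L1: cheap regex filter on the input.
    if any(p.search(prompt) for p in INPUT_PATTERNS):
        return REFUSAL

    # L3: neural classifier score.
    score = classify(prompt)
    if score >= BLOCK_AT:
        return REFUSAL

    # L4: consulted only in the borderline zone between the two thresholds.
    if score >= ALLOW_BELOW and not self_check(prompt):
        return REFUSAL

    # L2: regex validation of the generated answer.
    answer = generate(prompt)
    if any(p.search(answer) for p in OUTPUT_PATTERNS):
        return REFUSAL
    return answer
```

Ordering the checks this way means the expensive self-check (~300 ms per the list above) runs only when the classifier is uncertain, while the two regex layers add negligible cost to every request.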
Files
| Path | Description | Size |
|---|---|---|
| `best_model/model.safetensors` | PyTorch checkpoint (safetensors format) | ~90 MB |
| `best_model/config.json` | Model configuration | - |
| `best_model/tokenizer.json` | Fast tokenizer | - |
| `best_model/tokenizer_config.json` | Tokenizer configuration | - |
| `best_model/special_tokens_map.json` | Special tokens mapping | - |
| `onnx/cranemed_safety_fp32.onnx` | Full precision ONNX export | ~45 MB |
| `onnx/cranemed_safety_int8.onnx` | INT8 quantized ONNX (Android deployment) | ~12 MB |
| `onnx/export_meta.json` | Export metadata and validation results | - |
Intended Use
- Primary: On-device adversarial prompt filtering for clinical AI assistants in Ugandan health facilities
- Secondary: Red-team evaluation of clinical LLMs against Uganda-specific adversarial attacks
- Deployment target: Android 8 GB devices, ONNX Runtime, INT8 quantization
Attack Categories Detected
| Category | Description |
|---|---|
| Dangerous Dosing | Toxic thresholds, lethal dose stacking requests |
| Authority Impersonation | MOH officer framing, academic authority claims |
| Contraindication Bypass | Stock-out exploitation, urgency-based safety override |
| UCG Context Exploitation | VHT scope creep, bed pressure, out-of-scope roles |
| Emergency Misdirection | Drug substitution under fabricated emergencies |
| Hallucination Probing | Fake UCG sections, fictional drugs |
| Jailbreak / Roleplay | Fictional doctor personas, persona injection |
| Multi-Turn Escalation | Benign context followed by adversarial escalation |
Languages
Supports prompts in English, Luganda, and Swahili.
Training Data
Trained on the UCG Adversarial Safety Dataset: 3,020 labeled prompts (1,034 ADVERSARIAL / 1,986 SAFE) generated from UCG 2023 clinical mappings using Gemini 1.5 Flash.
Class imbalance is addressed via inverse-frequency weighted CrossEntropyLoss (~1.9x weight on the ADVERSARIAL class).
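As a quick check of the ~1.9x figure, the sketch below computes inverse-frequency weights from the class counts given above. The normalization used here, N / (K · n_c), is one common convention and is an assumption; the card does not state which normalization the authors applied. The ratio between the two class weights does not depend on that choice. The function name `inverse_frequency_weights` is illustrative.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: N / (K * n_c)} for N samples, K classes, n_c samples per class."""
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Class counts from the Training Data section.
labels = ["ADVERSARIAL"] * 1034 + ["SAFE"] * 1986
weights = inverse_frequency_weights(labels)

# Relative weight of the minority class; independent of the normalization.
ratio = weights["ADVERSARIAL"] / weights["SAFE"]
print(f"ratio = {ratio:.2f}")
```

The ratio works out to 1986 / 1034, about 1.92, which matches the stated ~1.9x. In PyTorch, weights like these would typically be passed as a tensor, in class-index order, to the `weight` argument of `torch.nn.CrossEntropyLoss`.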
Target Metrics
| Metric | Target |
|---|---|
| Precision (ADVERSARIAL) | > 0.92 |
| F1 Macro | > 0.88 |
| Avg Latency (INT8, on-device) | < 30 ms |
| Model Size (INT8) | ≤ 40 MB |
| Accuracy Degradation FP32 → INT8 | < 2% |
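For reference, the two classification targets can be checked from a binary confusion matrix as sketched below. The helper names and the confusion counts are made up for illustration; they are not reported results for this model.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one class, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def check_targets(tp, fp, fn, tn):
    """tp/fp/fn/tn are counted with ADVERSARIAL as the positive class."""
    adv_precision, _, adv_f1 = precision_recall_f1(tp, fp, fn)
    # For the SAFE class the roles flip: its true positives are tn,
    # its false positives are fn, and its false negatives are fp.
    _, _, safe_f1 = precision_recall_f1(tn, fn, fp)
    f1_macro = (adv_f1 + safe_f1) / 2
    return {
        "precision_adversarial": adv_precision,
        "f1_macro": f1_macro,
        "meets_precision": adv_precision > 0.92,  # target from the table above
        "meets_f1_macro": f1_macro > 0.88,        # target from the table above
    }


# Made-up confusion counts, for illustration only.
report = check_targets(tp=95, fp=5, fn=10, tn=190)
print(report)
```

Macro F1 averages the per-class F1 scores with equal weight, so the smaller ADVERSARIAL class counts as much as the SAFE class despite the roughly 1:2 imbalance in the training data.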
Android Deployment
1. Copy cranemed_safety_int8.onnx → app/src/main/assets/
2. Copy tokenizer files → app/src/main/assets/tokenizer/
3. Use OnnxSafetyClassifier.kt for inference
4. Integrate with SafetyGate.kt in the MedGemma pipeline
Citation
@misc{cranemedai_safety_classifier,
author = {Crane AI Labs},
title = {CraneMed AI Safety Classifier: On-Device Adversarial Prompt Detection for Uganda Clinical Guidelines AI},
year = {2026},
publisher = {Hugging Face},
journal = {Hugging Face Repository},
howpublished = {\url{https://huggingface.co/CraneAILabs/cranemedai-safety}}
}