أشعة · Ar-CXR · Arabic Chest X-ray Vision–Language Model · by Vionex Digital Solutions
The first chest X-ray VLM that generates native Arabic radiology reports
⚕️ MEDICAL RESEARCH PROTOTYPE — NOT A MEDICAL DEVICE. Ar-CXR is released for research only. It must not be used for clinical decision-making, diagnosis, triage, or any patient-facing purpose.
Ar-CXR is the first chest X-ray vision–language model that generates native Arabic radiology reports. It couples a frozen RAD-DINO image encoder with a Falcon-H1-7B Arabic-capable decoder through a feature-preserving MLP connector, with low-rank adaptation (LoRA) of the vision encoder to break the grounding ceiling. It is trained on CheXpert-Plus with reports machine-translated to Modern Standard Arabic.
- Language: Arabic (Modern Standard Arabic)
- Model type: Multimodal vision–language model (image + text → Arabic report) + auxiliary CXR grounding head
- Finetuned from: microsoft/rad-dino-maira-2 (vision) + tiiuae/Falcon-H1-7B-Instruct (decoder)
- License: Composite, research-only; see License
This repository ships trained adapters only (the deltas we are allowed to redistribute), not the base-model weights. The first run downloads RAD-DINO and Falcon-H1 from their own repositories under their own licenses. See How to use.
What's in this repository
| Path | Contents | Used by |
|---|---|---|
| weights/generation/ | BLIP-2 Q-Former (64 queries), proj (768→3072), prefix LayerNorm | generate_report() |
| weights/decoder_lora/ | LoRA (r=64, α=128) adapters for Falcon-H1 | generate_report() |
| weights/connector/ | MLP connector (768→3072→3072), prefix LayerNorm, 11-way aux grounding head | predict_findings() |
| weights/vision_lora/ | LoRA (r=16, α=32) adapters for RAD-DINO | predict_findings() |
| config.json | Full Ar-CXR composite configuration | both |
| generation_config.json | Decoding settings used in the paper | |
| modeling_ar_cxr.py | Reference inference code (assembles base models + adapters) | |
| results/ | The exact evaluation JSONs behind every number below | |
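The (r, α) values listed for the two LoRA adapters can be read against the usual LoRA parameterisation, in which a frozen weight W receives a low-rank update scaled by α/r. The following is a minimal NumPy sketch of that arithmetic under the standard formulation; the layer width and the random weights are illustrative assumptions and are not read from this repository.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Frozen linear layer plus a LoRA update: x W^T + (alpha / r) * x A^T B^T."""
    r = A.shape[0]                     # LoRA rank
    scale = alpha / r                  # 128/64 = 2.0 and 32/16 = 2.0 for the adapters above
    return x @ W.T + scale * (x @ A.T) @ B.T

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 768, 16, 32          # illustrative sizes only
W = rng.standard_normal((d_out, d_in)).astype(np.float32)        # frozen base weight
A = (0.01 * rng.standard_normal((r, d_in))).astype(np.float32)   # trainable down-projection
B = np.zeros((d_out, r), dtype=np.float32)        # trainable up-projection, zero at init

x = rng.standard_normal((4, d_in)).astype(np.float32)
y = lora_forward(x, W, A, B, alpha)
extra_params = A.size + B.size                    # parameters the adapter adds to this layer
```

With B initialised to zero the adapted layer starts out identical to the frozen one, and only A and B would need to be stored, which is consistent with shipping deltas rather than base weights.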
Architecture
Ar-CXR is two trained configurations that share the two base models but use different visual connectors. They are loaded and run independently — the connectors are not interchangeable.
- Generation uses a BLIP-2 Q-Former connector over the frozen encoder; its decoder-LoRA was trained (section-masked ITG, with prefix-LayerNorm and the fixed Arabic instruction) to read the 64-token Q-Former prefix. This is the configuration behind every report-generation number below.
- Grounding uses an MLP connector + vision-LoRA; the 11-way aux head sits on the MLP output, so its gradient flows into the connector and (via LoRA) the encoder — it is not an inert probe. This is the configuration behind every AUROC below.
⚠️ The two connectors are architecturally distinct: the generation decoder-LoRA reads the 64-token Q-Former prefix, while the grounding head reads the 257-token MLP prefix. Do not feed one connector's prefix to the other's head/decoder.
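The two prefix shapes can be made concrete with a small NumPy sketch. The dimensions (768-d encoder tokens, 257 tokens on the MLP path, 64 queries on the Q-Former path, 3072-d decoder width) come from the text above; the GELU activation, the random weights, and the helper names are assumptions for illustration, and the Q-Former itself is replaced by a stand-in array of its outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without learned scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    """tanh approximation of GELU (the activation choice is an assumption)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def linear(d_in, d_out):
    """Random stand-in weights; the trained ones live under weights/."""
    return (rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)).astype(np.float32)

W1, W2 = linear(768, 3072), linear(3072, 3072)    # MLP connector 768→3072→3072
W_proj = linear(768, 3072)                        # Q-Former output projection 768→3072

def mlp_prefix(tokens):        # tokens: (257, 768) encoder tokens
    return layer_norm(gelu(tokens @ W1) @ W2)

def qformer_prefix(queries):   # queries: (64, 768) stand-in for Q-Former outputs
    return layer_norm(queries @ W_proj)

def check_prefix(prefix, expected_tokens):
    """Guard against feeding one connector's prefix to the other consumer."""
    if prefix.shape != (expected_tokens, 3072):
        raise ValueError(f"expected ({expected_tokens}, 3072), got {prefix.shape}")
    return prefix

grounding_prefix = check_prefix(
    mlp_prefix(rng.standard_normal((257, 768)).astype(np.float32)), 257)
generation_prefix = check_prefix(
    qformer_prefix(rng.standard_normal((64, 768)).astype(np.float32)), 64)
```

Both prefixes share the 3072-d width, so a mix-up would not necessarily fail on its own; an explicit token-count check like `check_prefix` is one way to catch it early.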
The central grounding finding: connector design and a vision LoRA, not decoder scale, govern grounding. A 64-query Q-Former connector caps the grounding macro-AUROC at 0.667; an MLP connector lifts the frozen-feature ceiling to 0.730; vision-LoRA breaks it to 0.789.
Results
All numbers come straight from the JSONs in results/. No number here is estimated.
1. Visual grounding — connector ablation (macro-AUROC of the 11-finding head)
| Connector | macro-AUROC |
|---|---|
| Raw RAD-DINO (linear probe) | 0.613 |
| Q-Former, 64 queries (BLIP-2 default) | 0.667 |
| MLP connector (frozen encoder) | 0.730 |
| MLP + vision-LoRA (this model) | 0.789 |
Held-out test (n=10,810): 0.7895 (95% CI [0.785, 0.794]). External Stanford holdout (n=233): 0.7864. The two agree within CI.
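For readers who want to recompute a macro-AUROC from their own predictions, here is a minimal NumPy sketch. It is not the authors' evaluation script: the rank-based AUROC is the standard Mann-Whitney formulation, and the percentile bootstrap is an assumed way to obtain a 95% interval, since the text does not state how the reported CI was computed.

```python
import numpy as np

def auroc(y_true, y_score):
    """Rank-based AUROC (Mann-Whitney U) with average ranks for tied scores."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos, n_neg = int(y_true.sum()), int((~y_true).sum())
    if n_pos == 0 or n_neg == 0:
        return float("nan")                   # undefined without both classes
    _, inv, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    ranks = (cum - (counts - 1) / 2.0)[inv]   # average 1-based rank per tied group
    return float((ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

def macro_auroc(Y, P):
    """Unweighted mean of per-finding AUROCs; Y and P have shape (n_images, n_findings)."""
    return float(np.nanmean([auroc(Y[:, k], P[:, k]) for k in range(Y.shape[1])]))

def bootstrap_ci(Y, P, n_boot=200, seed=0):
    """Percentile 95% interval from resampling images with replacement."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)           # one bootstrap resample of image indices
        stats.append(macro_auroc(Y[idx], P[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return float(lo), float(hi)
```

Here `Y` would hold the binary gold labels and `P` the head's probabilities, one column per finding; findings with no positives in a resample are skipped by `nanmean`.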
2. Identical-protocol comparison vs TorchXRayVision (same images, same gold labels, 9 shared findings)
| Finding | TorchXRayVision DenseNet | Ar-CXR | Δ |
|---|---|---|---|
| macro (9 findings) | 0.669 | 0.768 | +0.099 |
| fracture | 0.476 | 0.697 | +0.221 |
| pneumonia | 0.575 | 0.739 | +0.164 |
| pneumothorax | 0.701 | 0.859 | +0.158 |
| cardiomegaly | 0.711 | 0.823 | +0.112 |
| effusion | 0.788 | 0.885 | +0.096 |
Read conservatively: TXV is evaluated zero-shot under domain shift against a CheXbert-on-impression label definition it was not trained on. The defensible claim is that Ar-CXR's grounding beats a widely used off-the-shelf classifier on this protocol, not that it beats supervised classifiers in general.
3. Arabic report generation vs open VLM baselines (n=200, image-only, identical Arabic instruction)
| Model | METEOR | chrF | BERTScore-F1 | CIDEr | Clinical Jaccard |
|---|---|---|---|---|---|
| Ar-CXR (ours) | 19.2 | 29.2 | 61.6 | 0.21 | 40.4 |
| Lingshu-7B (CXR specialist) | 5.5 | 22.1 | 56.0 | 0.05 | 6.8 |
| AIN-7B (Arabic VLM) | 7.6 | 25.1 | 53.6 | 0.09 | 17.3 |
| Qwen2.5-VL-7B | 5.6 | 22.2 | 51.9 | 0.05 | 8.7 |
| IDEFICS2-8B (EN→AR) | 4.5 | 15.4 | 51.2 | 0.06 | 4.5 |
Ar-CXR ranks first on every automatic metric. Note the clinical Jaccard gap (40.4 vs ≤17.3): the baselines, even Lingshu, a CXR specialist, produce fluent text but miss the Arabic finding vocabulary.
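The clinical Jaccard column measures overlap in clinical vocabulary rather than surface fluency. The exact term list and matching rules are not given here, so the following is only a sketch of the general idea: extract the set of vocabulary terms present in each report and take the Jaccard overlap of the two sets. The three-term vocabulary, the substring matching, and the example sentences are hypothetical, and the table presumably reports the score scaled by 100.

```python
def term_set(report, vocabulary):
    """Vocabulary terms that occur in the report (plain substring match)."""
    return {term for term in vocabulary if term in report}

def clinical_jaccard(generated, reference, vocabulary):
    """Jaccard overlap, in [0, 1], of the clinical terms found in two reports."""
    gen = term_set(generated, vocabulary)
    ref = term_set(reference, vocabulary)
    if not gen and not ref:
        return 1.0          # convention chosen here: two term-free reports agree
    return len(gen & ref) / len(gen | ref)

# Hypothetical vocabulary: pleural effusion, cardiomegaly, pneumothorax
VOCAB = ["انصباب جنبي", "تضخم القلب", "استرواح الصدر"]
reference = "يوجد انصباب جنبي أيسر مع تضخم القلب."
generated = "انصباب جنبي بسيط، لا يوجد استرواح الصدر."

score = clinical_jaccard(generated, reference, VOCAB)   # one shared term out of three
```

This simple version ignores negation (the generated sentence above says there is no pneumothorax, yet the term still counts), which is one reason a real metric may use more careful matching.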
How to use
Requires accepting the base-model licenses on the Hub (tiiuae/Falcon-H1-7B-Instruct, microsoft/rad-dino-maira-2) and a GPU (~18 GB VRAM in bf16).
```python
import torch
from huggingface_hub import snapshot_download
from PIL import Image

from modeling_ar_cxr import ArCXR  # ships in this repo

# Download the adapters and reference code, then assemble base models + adapters
repo = snapshot_download("Vionex-digital/Ar-CXR")
model = ArCXR.from_pretrained_adapters(repo, device="cuda", dtype=torch.bfloat16)

image = Image.open("chest_xray.png").convert("RGB")

# 1) Generate an Arabic report
report = model.generate_report(image)
print(report)

# 2) Grounding: per-finding probabilities (research diagnostic, not a classifier)
print(model.predict_findings(image))  # {'effusion': 0.88, 'cardiomegaly': 0.82, ...}
```
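When running over more than one image, it can help to collect both outputs per image as JSON lines. The helpers below are a sketch rather than part of the repository: `to_jsonl` and `top_findings` are hypothetical names, and the code assumes only what the snippet above shows, namely that `generate_report(image)` returns a string and `predict_findings(image)` returns a finding-to-probability dict.

```python
import json

def to_jsonl(model, named_images):
    """Run both heads on each (name, image) pair; return one JSON line per image."""
    lines = []
    for name, image in named_images:
        probs = model.predict_findings(image)
        record = {
            "image": name,
            "report": model.generate_report(image),
            "findings": {k: round(float(v), 4) for k, v in probs.items()},
        }
        lines.append(json.dumps(record, ensure_ascii=False))  # keep Arabic text readable
    return lines

def top_findings(probs, k=3):
    """Sort per-finding probabilities for display, highest first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Usage with the `model` loaded above (requires Pillow; file names are hypothetical):
#   from PIL import Image
#   paths = ["a.png", "b.png"]
#   lines = to_jsonl(model, [(p, Image.open(p).convert("RGB")) for p in paths])
#   with open("outputs.jsonl", "w", encoding="utf-8") as f:
#       f.write("\n".join(lines) + "\n")
```

`ensure_ascii=False` keeps the Arabic report human-readable in the output file instead of escaping it to `\uXXXX` sequences. Sorting the probabilities is for inspection only; the head remains a research diagnostic, not a classifier.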
The Arabic instruction used in training/eval (baked into generate_report, no need to pass it) is:
اكتب تقرير أشعة صدر باللغة العربية بناءً على الصورة:
(In English: "Write a chest X-ray report in Arabic based on the image:")
Decoding: greedy, repetition_penalty=1.3, no_repeat_ngram_size=3, max_new_tokens=200.
The reported metrics use this greedy configuration. Generation is deterministic within a fixed
environment, but greedy decoding is sensitive at near-ties, so reports may differ by a few tokens
(into clinically-equivalent phrasings) across GPUs/driver/library versions — this is normal LLM
behaviour, not a sign of a load error. The grounding head (predict_findings) is bitwise
reproducible.
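The decoding settings above correspond to standard keyword arguments of the Hugging Face `generate` method, and the `no_repeat_ngram_size=3` constraint is easy to illustrate in plain Python. This is a sketch for intuition: how `generate_report` passes these settings internally is not shown here, and `banned_next_tokens` is a hypothetical helper, not a library function.

```python
# Keyword arguments as they would be passed to a Hugging Face `generate` call.
# Greedy decoding corresponds to do_sample=False with a single beam.
DECODING_KWARGS = {
    "do_sample": False,
    "num_beams": 1,
    "repetition_penalty": 1.3,
    "no_repeat_ngram_size": 3,
    "max_new_tokens": 200,
}

def banned_next_tokens(token_ids, n=3):
    """Tokens that would complete an n-gram already present in token_ids."""
    if len(token_ids) < n:
        return set()
    prefix = tuple(token_ids[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(token_ids) - n + 1):
        # If an earlier (n-1)-gram matches the current suffix, its follower is banned
        if tuple(token_ids[i:i + n - 1]) == prefix:
            banned.add(token_ids[i + n - 1])
    return banned

history = [5, 8, 2, 9, 5, 8]          # toy token IDs; the sequence ends in (5, 8)
blocked = banned_next_tokens(history, n=3)
```

In the toy history, the pair (5, 8) was previously followed by 2, so emitting 2 again would repeat the trigram (5, 8, 2) and is blocked. Constraints like this change which token wins at each greedy step, which is part of why near-ties can resolve differently across environments.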
Training data
- Source: CheXpert-Plus — 223,462 radiographs, 187,711 studies, 64,725 patients.
- Arabic reports: 221,247 reports machine-translated EN→Modern Standard Arabic. We do not redistribute the translated corpus (Stanford CheXpert-Plus data-use agreement).
- Splits: patient-level (seed 42), 90/5/5; the official CheXpert validation studies are an external "Stanford holdout".
- Gold labels: CheXbert run on each report's impression, mapped to 11 findings (positive-only).
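A patient-level split assigns every study of a given patient to the same partition, so no patient appears in both training and test data. The sketch below shows one way to do this with seed 42 and 90/5/5 fractions; it is an illustration under assumptions, not the authors' script, and the patient IDs are hypothetical. It will not reproduce the exact released split.

```python
import random

def patient_level_split(patient_ids, seed=42, fractions=(0.90, 0.05, 0.05)):
    """Assign each patient (and hence all of their studies) to exactly one split."""
    unique = sorted(set(patient_ids))       # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(unique)
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    return {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),
    }

# Toy example: 3,000 studies from 1,000 hypothetical patients
studies = [f"patient{i % 1000:05d}" for i in range(3000)]
splits = patient_level_split(studies)

# Every study inherits the split of its patient
split_of = {p: name for name, ids in splits.items() for p in ids}
study_split = [split_of[p] for p in studies]
```

Splitting on patients rather than on images or studies is what makes the held-out test patient-disjoint: correlated images of the same person cannot leak across partitions.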
License
This is a composite, research-only release. The redistributed adapters and code are released for non-commercial research. You must also comply with every upstream license; where their terms differ, the most restrictive applies:
| Component | Source | License |
|---|---|---|
| Decoder base | tiiuae/Falcon-H1-7B-Instruct | TII Falcon-LLM License 2.0 |
| Vision base | microsoft/rad-dino-maira-2 | MSR license (research use) |
| Training data | CheXpert-Plus | Stanford CheXpert-Plus Data Use Agreement |
See LICENSE.md and NOTICE.md. The model and its
outputs are not for clinical use.
Citation
@techreport{khaled2026arcxr,
title = {Ar-CXR: A Native Arabic Chest X-ray Vision--Language Model for
Radiology Report Generation and Visual Grounding},
institution = {Vionex Digital Solutions},
year = {2026}
}
AI-tool disclosure
Software-engineering and manuscript-preparation assistance was provided by an AI coding assistant under author supervision. All experiments, results, and claims were designed, executed, and verified by the authors.
Evaluation results (self-reported)
- chrF on CheXpert-Plus (Arabic, machine-translated), 200-image test: 29.2
- BERTScore-F1 (AraBERT-v02) on CheXpert-Plus (Arabic, machine-translated), 200-image test: 61.6
- Arabic clinical-term Jaccard on CheXpert-Plus (Arabic, machine-translated), 200-image test: 40.4
- macro-AUROC (11 findings) on CheXpert-Plus patient-disjoint held-out test (n=10,810): 0.789