🩻 Radiology Report Coder
Qwen2.5-3B fine-tuned for radiology report coding
Part of the Medical AI Fine-tuned Model Suite — 16 specialist models, one per task
TL;DR
Extracts ICD-10-CM diagnosis codes directly from radiology report impressions.
INPUT: IMPRESSION: 1.8cm hypoechoic nodule right thyroid lobe, TIRADS 4. Recommend FNA.
OUTPUT: ICD-10 Codes:\n1. E04.1 — Non-toxic single thyroid nodule (primary finding)\nRecommend: FNA if TIRADS ≥4. Add malignancy code post-biopsy.
| Item | Value |
|---|---|
| Base model | unsloth/Qwen2.5-3B-Instruct |
| Method | QLoRA, 4-bit NF4, rank 16 |
| Training data | radiology-coder-sft (25,090 real-world rows) |
| Training compute | NVIDIA A40 (48GB), ~1.3h |
| License | Apache 2.0 |
Architecture
                +-------------------------+
user prompt --> | Qwen2.5-3B-Instruct     | --> base weights (frozen, 4-bit NF4)
                | + LoRA adapter (r=16)   | --> radiology-coder-qwen25-3b
                +-------------------------+
                             |
                             v
                     structured output
              (code / JSON / classification)
This repo contains only the LoRA adapter (~60MB), not the full merged weights. Load it on top of the base model as shown below — this keeps the download small and lets you swap adapters on one base model in memory.
Intended use
Automate radiology coding workflows and integrate with RIS/PACS systems.
Direct use
Paste a radiology impression, get the candidate ICD-10-CM codes for each finding.
Downstream use
Pre-populate a coder's worklist in a RIS, or flag reports for teleradiology coding review.
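A worklist integration would first need to turn the model's free-text reply into structured entries. The sketch below is one way to do that, assuming the reply follows the numbered format shown in the TL;DR example. `parse_codes`, `CodedFinding`, and the regular expressions are illustrative names, not part of this repo. The pattern checks only the syntactic shape of an ICD-10-CM code, not whether the code exists in the current code set.

```python
import re
from typing import NamedTuple

# Syntactic shape of an ICD-10-CM code: letter, digit, alphanumeric, then an
# optional dot and 1-4 alphanumerics. Format check only (assumption: this is
# enough for pre-filtering; real validation needs the official code table).
ICD10_CM = r"[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?"

# One numbered reply line: "<n>. <code> <dash> <description>".
# \u2014 and \u2013 are the em and en dash; a plain hyphen is accepted too.
LINE = re.compile(rf"^\s*\d+\.\s*({ICD10_CM})\s*[\u2014\u2013-]\s*(.+?)\s*$")


class CodedFinding(NamedTuple):
    code: str
    description: str


def parse_codes(reply: str) -> list[CodedFinding]:
    """Extract (code, description) pairs from a model reply."""
    # Some logs store the reply with literal "\n" sequences; normalise them.
    reply = reply.replace("\\n", "\n")
    findings = []
    for line in reply.splitlines():
        match = LINE.match(line)
        if match:
            findings.append(CodedFinding(match.group(1), match.group(2)))
    return findings


sample = (
    "ICD-10 Codes:\n"
    "1. E04.1 \u2014 Non-toxic single thyroid nodule (primary finding)\n"
    "Recommend: FNA if TIRADS \u22654. Add malignancy code post-biopsy."
)
for finding in parse_codes(sample):
    print(finding.code, "|", finding.description)
```

Lines that do not match the numbered pattern, such as the trailing recommendation, are skipped rather than guessed at, which keeps unparsed text visible to the human reviewer.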
Out of scope
Interpreting actual imaging data (DICOM, pixel data) — this model only reads the already-dictated text report, it does not analyze images.
This model is not a substitute for a certified medical professional's judgment. Output should be reviewed by a qualified person before being used in a clinical or billing decision.
Quickstart
Option A — Transformers + PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-3B-Instruct"
adapter = "AmareshHebbar/radiology-coder-qwen25-3b"

# Load the tokenizer and the base weights, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a radiology coding specialist. Given a radiology report impression, return the ICD-10-CM codes for all findings."},
    {"role": "user", "content": "IMPRESSION: 1.8cm hypoechoic nodule right thyroid lobe, TIRADS 4. Recommend FNA."},
]

# return_dict=True yields input_ids plus attention_mask, so generate() gets both.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
Expected output:
ICD-10 Codes:\n1. E04.1 — Non-toxic single thyroid nodule (primary finding)\nRecommend: FNA if TIRADS ≥4. Add malignancy code post-biopsy.
Option B — Unsloth (2x faster load + inference)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/radiology-coder-qwen25-3b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a radiology coding specialist. Given a radiology report impression, return the ICD-10-CM codes for all findings."},
    {"role": "user", "content": "IMPRESSION: Acute pulmonary embolism involving bilateral main pulmonary arteries. No right heart strain."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Option C — vLLM (production serving, OpenAI-compatible)
vllm serve unsloth/Qwen2.5-3B-Instruct \
--enable-lora \
--lora-modules radiology-coder-qwen25-3b=AmareshHebbar/radiology-coder-qwen25-3b \
--host 0.0.0.0 --port 8000 --dtype bfloat16
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="radiology-coder-qwen25-3b",
    messages=[
        {"role": "system", "content": "You are a radiology coding specialist. Given a radiology report impression, return the ICD-10-CM codes for all findings."},
        {"role": "user", "content": "IMPRESSION: 3.2cm hypodense lesion right hepatic lobe, arterial enhancement with washout, consistent with hepatocellular carcinoma."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
Option D — GGUF / llama.cpp (CPU / edge inference)
This repo ships LoRA adapter weights, not a pre-merged GGUF. To run on llama.cpp, merge first:
pip install unsloth
python -c "
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained('AmareshHebbar/radiology-coder-qwen25-3b', load_in_4bit=False)
model.save_pretrained_gguf('radiology-coder-qwen25-3b-gguf', tokenizer, quantization_method='q4_k_m')
"
Training details
Data
The dataset contains 25,090 examples extracted from 25k radiology-relevant clinical notes filtered by imaging keywords (source); 20,072 of them form the training split. No synthetic or LLM-generated training data is included: every example pairs real-world input with its authoritative output.
| Split | Rows |
|---|---|
| Train | 20,072 |
| Validation | 2,509 |
| Test | 2,509 |
Full extraction pipeline documented on the dataset card.
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
| Optimizer | paged_adamw_8bit |
| Learning rate / schedule | 2e-4, cosine |
| Gradient checkpointing | Unsloth (smart offload) |
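As a rough sanity check on the table above, the number of trainable LoRA parameters can be estimated from the rank and the target modules. The sketch below assumes the Qwen2.5-3B layer dimensions (hidden size 2048, MLP size 11008, 36 layers, key/value projection width 256) and a base size of about 3.09B parameters. These figures are assumptions taken from the published base-model configuration, not from this repo, and should be checked against the base model's `config.json`.

```python
# Assumed Qwen2.5-3B dimensions (verify against the base model's config.json).
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 11008   # intermediate_size of the MLP
KV_DIM = 256           # 2 key/value heads x 128 head_dim
LAYERS = 36            # num_hidden_layers
BASE_PARAMS = 3.09e9   # approximate total base parameters (assumption)

RANK = 16              # LoRA rank from the hyperparameter table

# (in_features, out_features) for each targeted projection in one layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, layers):
    """Each adapted matrix adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers


total = lora_param_count(MODULE_SHAPES, RANK, LAYERS)
share = total / BASE_PARAMS
size_mb_16bit = total * 2 / 1e6  # 2 bytes per parameter if stored in 16-bit

print(f"trainable LoRA parameters: {total:,}")
print(f"share of base parameters:  {share:.2%}")
print(f"size at 16-bit precision:  {size_mb_16bit:.1f} MB")
```

Under these assumptions the estimate comes to roughly 30M trainable parameters, just under 1% of the base model, and about 60MB if the adapter is stored in 16-bit precision. Storing it in 32-bit would double the file size.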
Training compute
| Item | Value |
|---|---|
| GPU | NVIDIA A40 (48GB) |
| Cloud provider | RunPod |
| Training time | ~1.3h (incl. eval + hub push) |
| Tracking | W&B run |
| CO2 estimate | self-reported, not measured with a carbon tracker; treat as approximate |
Fine-tuned with Unsloth for 2x faster training and reduced VRAM, using TRL's SFTTrainer. Full project: wandb.ai/amareshhebbar-/axiomapper.
Bias, risks & limitations
Data recency. Training data reflects a specific snapshot in time (CMS FY2026 / dataset publish date). Codes, rates, and rules referenced may become outdated as source authorities issue updates — always cross-check against the live authoritative source before high-stakes use.
Failure mode. Like any LLM, this model can produce a plausible-sounding but incorrect output, especially on rare, ambiguous, or highly compound real-world cases that fall outside the training distribution. It does not know when it's wrong.
Language. English-language input only (Hindi-medical model excepted, where Hindi system prompts are used but underlying clinical reasoning data is largely English-sourced).
Not a regulated medical device. This model has not been validated, cleared, or approved by any regulatory body (FDA, CDSCO, or equivalent) as a medical device or clinical decision support tool. It is a research/engineering artifact.
Misapplication risk. Do not use this model as the sole basis for a clinical, billing, or compliance decision affecting a real patient or claim. Do not deploy in an emergency triage context without a human-in-the-loop and clear escalation paths.
FAQ
Q: Can I merge the adapter into the base model for faster inference?
Yes — use model.merge_and_unload() after loading with PEFT, or use Unsloth's save_pretrained_merged() method.
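To see why merging does not change the model's outputs, it helps to look at the arithmetic on a toy example. The sketch below uses plain Python lists rather than PEFT, with made-up 2x3 weights and rank 1. It only illustrates the standard LoRA identity W' = W + (alpha / r) * B A. The matrix values and helper names are hypothetical, and this is not how `merge_and_unload()` is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[x * s for x in row] for row in a]


# Toy sizes: 3 inputs, 2 outputs, rank 1. This adapter uses r=16, alpha=32,
# which gives the same scaling factor of 2.0.
r, alpha = 1, 2
scaling = alpha / r

W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight, shape (out, in)
A = [[0.5, -1.0, 0.25]]                   # LoRA down-projection, shape (r, in)
B = [[2.0], [-4.0]]                       # LoRA up-projection, shape (out, r)
x = [[1.0], [2.0], [4.0]]                 # one input as a column vector

# Unmerged: base path plus the scaled adapter path, computed at every call.
y_adapter = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged: fold the adapter into the weight once, then do a single matmul.
W_merged = add(W, scale(matmul(B, A), scaling))
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```

Both paths give the same result; the merged form simply removes the extra adapter matmuls at inference time, at the cost of no longer being able to swap adapters on a shared base.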
Q: Why QLoRA instead of full fine-tuning?
The base model already has strong language and medical knowledge from pretraining. QLoRA adapts only ~0.5-1% of parameters, which is enough to specialize the output format and domain without the cost or overfitting risk of full fine-tuning.
Q: Can I fine-tune this further on my own data?
Yes, this adapter can be used as a starting checkpoint for continued fine-tuning. Note that this may require merging first, depending on your training framework.
Q: Why is the output format so strict?
Each task was trained on a fixed system prompt and consistent output structure. Following the documented system prompt closely (see Quickstart above) gives the most reliable results; deviating from it may produce inconsistent formatting.
Q: Does this model store or transmit my input data?
No. Like any open-weight model, all inference happens locally on your own infrastructure (or wherever you deploy it); nothing is sent back to the model author.
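Because the output format depends on the fixed system prompt, one option is to build the message list in a single place instead of retyping the prompt per call. A minimal sketch follows. The helper name `build_messages` and the automatic `IMPRESSION:` prefixing are assumptions made for illustration, based on the Quickstart examples, and are not something the model requires by contract.

```python
# System prompt copied from the Quickstart examples.
SYSTEM_PROMPT = (
    "You are a radiology coding specialist. Given a radiology report "
    "impression, return the ICD-10-CM codes for all findings."
)


def build_messages(impression: str) -> list[dict]:
    """Return chat messages in the shape used by the Quickstart examples."""
    text = impression.strip()
    if not text:
        raise ValueError("impression text is empty")
    # The Quickstart inputs all start with "IMPRESSION:", so add it if missing
    # (assumption: matching the documented examples gives the steadiest format).
    if not text.upper().startswith("IMPRESSION:"):
        text = f"IMPRESSION: {text}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": text},
    ]


messages = build_messages("1.8cm hypoechoic nodule right thyroid lobe, TIRADS 4. Recommend FNA.")
print(messages[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` or to an OpenAI-compatible client unchanged.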
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| ValueError: padding_token not set | Base tokenizer has no pad token | Set tokenizer.pad_token = tokenizer.eos_token before inference |
| Garbled / repeated output | Wrong chat template applied | Make sure you use tokenizer.apply_chat_template, not a raw string prompt |
| CUDA OOM on load | Insufficient VRAM | Use load_in_4bit=True (as in Option B above) or reduce max_seq_length |
| Adapter loads but ignores fine-tuning | Base model mismatch | Confirm you loaded the exact base listed above — adapters are not portable across different base models or quantizations |
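When debugging the garbled-output row above, it can help to know roughly what a correctly templated prompt looks like. Qwen2.5 instruct models use a ChatML-style layout with `<|im_start|>` and `<|im_end|>` markers. The function below is a simplified, hypothetical re-implementation for inspection only, assuming a conversation that already includes a system message and no tools. For real inference the template shipped with the tokenizer remains the source of truth.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Approximate the ChatML layout for a simple system + user conversation."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Opens the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo = [
    {"role": "system", "content": "You are a radiology coding specialist."},
    {"role": "user", "content": "IMPRESSION: No acute findings."},
]
print(render_chatml(demo))
```

If the string produced by `tokenizer.apply_chat_template(..., tokenize=False)` lacks these role markers or the trailing assistant header, the prompt is probably being passed as a raw string, which matches the symptom in the table.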
Related models in this suite
| Model | Task | Size |
|---|---|---|
| icd10-coder-qwen25-7b | ICD-10-CM medical coding | 7B |
| snomed-mapper-qwen25-7b | Clinical concept mapping | 7B |
| icd10-to-drg-qwen25-1b | ICD-10 to DRG reimbursement | 1.5B |
| pmjay-classifier-qwen25-3b | India PM-JAY classification | 3B |
Full suite overview: AmareshHebbar/medical-ai-model-suite
Changelog
| Version | Date | Notes |
|---|---|---|
| v1.0 | 2026 | Initial release — QLoRA fine-tune on 25,090 real-world rows |
Citation
@misc{medicalai2026,
author = {Hebbar, Amaresh},
title = {Medical AI Fine-tuning Suite},
year = {2026},
publisher = {HuggingFace},
url = {https://huggingface.co/AmareshHebbar}
}
Contact