ICD-10 Coder — Qwen2.5-7B
Fine-tuned medical LLM for ICD-10-CM coding, built by Amaresh Hebbar.
Given a clinical description, this model returns the most accurate ICD-10-CM code with a brief justification. It's a structured-output translation layer between free-text clinical language and the coding system insurance reimbursement runs on.
Model summary
| Property | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Fine-tuning method | QLoRA (4-bit NF4), rank 16 |
| Training framework | Unsloth + TRL SFTTrainer |
| Training data | AmareshHebbar/icd10-coder-sft (74,719 rows) |
| Hardware | NVIDIA A40 (48 GB), single GPU |
| License | Apache 2.0 |
Intended use
Drop-in coding assistant for clinical documentation pipelines, EHR systems, and medical billing software. Given a diagnosis description, the model outputs the matching ICD-10-CM code with a brief justification. Reimbursement estimates additionally require a DRG lookup table downstream.
This model is not a substitute for a certified medical professional's judgment. Output should be reviewed by a qualified person before being used in a clinical or billing decision. The model can make mistakes, especially on rare or compound cases.
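One way to support that review step is a cheap pre-check that filters out codes missing from the official code list before they reach a reviewer. The sketch below is illustrative only: `KNOWN_CODES` is a tiny hypothetical stand-in for the full CMS code file, and `normalize` and `precheck` are made-up helper names, not part of this model or any library. Storing codes without the decimal point is also an assumption.

```python
# Hypothetical pre-check that runs before human review; it does not replace it.
# KNOWN_CODES is a tiny illustrative sample, not the full ICD-10-CM code list.
KNOWN_CODES = frozenset({"E1165", "E1121", "N08", "J13"})


def normalize(code: str) -> str:
    """Trim whitespace, upper-case, and drop the decimal point."""
    return code.strip().upper().replace(".", "")


def precheck(code: str, known_codes: frozenset = KNOWN_CODES) -> str:
    """Return a routing label for a single model-suggested code."""
    if normalize(code) in known_codes:
        return "send_to_reviewer"      # plausible code, still needs review
    return "reject_unknown_code"       # not in the code list at all


print(precheck(" e11.65 "))
print(precheck("Z99.999"))
```

A pre-check like this only catches codes that do not exist; whether an existing code fits the clinical description is still a question for the reviewer.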
How to use
With 🤗 Transformers + PEFT
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-7B-Instruct"
adapter = "AmareshHebbar/icd10-coder-qwen25-7b"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA adapter

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Patient presents with uncontrolled type 2 diabetes with diabetic nephropathy."},
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
Example output (sampling is enabled, so the exact wording may vary between runs):
E11.65 — Type 2 diabetes mellitus with hyperglycemia, combined with N08 for diabetic nephropathy. Primary code: E11.65, Secondary: N08.
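Because the response is free text, a downstream pipeline usually has to pull the codes out of it. The following is a minimal sketch of one way to do that with a regular expression. The pattern is a loose approximation of the ICD-10-CM code shape and `extract_codes` is a hypothetical helper; a match does not confirm that the code exists or is correct. The sample string is adapted from the output shown above.

```python
import re

# Loose ICD-10-CM shape: letter, digit, alphanumeric, then an optional
# decimal point followed by 1-4 alphanumerics. This checks shape only.
ICD10_PATTERN = re.compile(r"\b[A-Z]\d[0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(text: str) -> list[str]:
    """Return code-shaped tokens in order of first appearance, without repeats."""
    seen: list[str] = []
    for code in ICD10_PATTERN.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


sample = (
    "E11.65: Type 2 diabetes mellitus with hyperglycemia, combined with N08 "
    "for diabetic nephropathy. Primary code: E11.65, Secondary: N08."
)
print(extract_codes(sample))  # ['E11.65', 'N08']
```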
With Unsloth (faster inference, recommended)
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/icd10-coder-qwen25-7b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Acute ST-elevation myocardial infarction of the anterior wall, initial encounter."},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
With vLLM (production serving)
vllm serve unsloth/Qwen2.5-7B-Instruct \
    --enable-lora \
    --lora-modules icd10-coder-qwen25-7b=AmareshHebbar/icd10-coder-qwen25-7b \
    --host 0.0.0.0 --port 8000 --dtype bfloat16

Then query the server through its OpenAI-compatible API:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="icd10-coder-qwen25-7b",  # the adapter name registered via --lora-modules
    messages=[
        {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
        {"role": "user", "content": "Community-acquired pneumonia due to Streptococcus pneumoniae."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
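All three usage examples send the same system prompt, so it can help to build the messages in one place. The sketch below uses only the standard library. `build_messages` and `build_payload` are hypothetical helper names, and the payload shape simply mirrors the chat-completions call shown above; it does not contact any server.

```python
import json

# The system prompt used throughout this card, kept in a single constant.
SYSTEM_PROMPT = (
    "You are a certified medical coder. Given a clinical description or "
    "diagnosis, return the most accurate ICD-10-CM code with a brief justification."
)


def build_messages(description: str) -> list[dict[str, str]]:
    """Pair the fixed system prompt with one clinical description."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": description.strip()},
    ]


def build_payload(description: str,
                  model: str = "icd10-coder-qwen25-7b",
                  temperature: float = 0.1) -> dict:
    """Assemble a chat-completions request body as a plain dict."""
    return {
        "model": model,
        "messages": build_messages(description),
        "temperature": temperature,
    }


payload = build_payload("Community-acquired pneumonia due to Streptococcus pneumoniae.")
print(json.dumps(payload, indent=2))
```

The list returned by `build_messages` can be passed as `messages` to either `apply_chat_template` or the OpenAI client.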
Training details
Data
The dataset contains 74,719 examples extracted from real CMS FY2026 ICD-10-CM Tabular Order codes. None of the data is synthetic or LLM-generated; every example pairs real-world input with its authoritative output. The examples are split as follows:
- Train: 59,775 examples
- Validation: 7,472 examples
- Test: 7,472 examples
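The split sizes above are consistent with an 80/10/10 partition, although the card does not state the ratio explicitly. A quick arithmetic check:

```python
# Split sizes as listed above; the 80/10/10 ratio is inferred, not stated.
splits = {"train": 59_775, "validation": 7_472, "test": 7_472}

total = sum(splits.values())
assert total == 74_719  # matches the dataset size given in this card

for name, count in splits.items():
    print(f"{name}: {count} ({count / total:.1%})")
```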
See the dataset card for the full extraction pipeline.
Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 (QLoRA) |
| Max sequence length | 512 |
| Optimizer | paged_adamw_8bit |
| Learning rate | 2e-4, cosine schedule |
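To get a feel for what these settings imply, the sketch below estimates the number of trainable LoRA parameters and the scaling factor. The layer dimensions are assumptions taken from the published Qwen2.5-7B configuration and are not stated in this card. The scaling shown is the standard `alpha / rank` rule; the card does not say whether a variant such as rank-stabilized LoRA was used.

```python
# Assumed Qwen2.5-7B dimensions (from the base model's config, not this card).
HIDDEN = 3584          # hidden size
INTERMEDIATE = 18944   # MLP intermediate size
KV_DIM = 512           # 4 key/value heads x 128 head dimension
LAYERS = 28            # transformer blocks

RANK, ALPHA = 16, 32   # from the hyperparameter table above

# (input features, output features) of each targeted linear layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted layer."""
    return rank * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in SHAPES.values())
total_trainable = per_layer * LAYERS

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling (alpha / rank): {ALPHA / RANK}")
```

Under these assumptions the adapter trains about 40 million parameters, well under 1% of the 7B base model.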
Training infrastructure
Fine-tuned with Unsloth, which advertises roughly 2x faster training and reduced VRAM use, together with TRL's SFTTrainer. Training ran on a single NVIDIA A40 GPU. Experiments were tracked with Weights & Biases.
Limitations and bias
- Training data reflects the CMS FY2026 ICD-10-CM code set; outputs may become outdated as later code-set updates are issued.
- The model may occasionally produce a plausible-sounding but incorrect output for rare or highly compound cases — always have a qualified person verify before downstream use.
- English-language input only.
Related models in this suite
| Model | Task | Size |
|---|---|---|
| icd10-coder-qwen25-7b | ICD-10-CM medical coding | 7B |
| snomed-mapper-qwen25-7b | Clinical concept mapping | 7B |
| icd10-to-drg-qwen25-1b | ICD-10 → DRG reimbursement | 1.5B |
| pmjay-classifier-qwen25-3b | India PM-JAY classification | 3B |
Full collection: [link your HF collection here]
Citation
@misc{medicalai2026,
  author    = {Hebbar, Amaresh},
  title     = {Medical AI Fine-tuning Suite},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/AmareshHebbar}
}
Contact
- GitHub: amareshhebbar
- LinkedIn: gvamaresh
- HuggingFace: AmareshHebbar