ICD-10 Coder — Qwen2.5-7B

A fine-tuned medical LLM for ICD-10-CM coding, built by Amaresh Hebbar.

Given a clinical description, this model returns the most accurate ICD-10-CM code with a brief justification. It's a structured-output translation layer between free-text clinical language and the coding system insurance reimbursement runs on.

Model summary


Base model	unsloth/Qwen2.5-7B-Instruct
Fine-tuning method	QLoRA (4-bit NF4), rank 16
Training framework	Unsloth + TRL SFTTrainer
Training data	AmareshHebbar/icd10-coder-sft — 74,719 rows
Hardware	NVIDIA A40 (48GB), single GPU
License	Apache 2.0

Intended use

Coding assistant for clinical documentation pipelines, EHR systems, and medical billing software. Given a diagnosis description, the model outputs the matching ICD-10-CM code with a brief justification. The only extra component needed downstream is a DRG lookup table, and only if you need reimbursement estimates.

This model is not a substitute for a certified medical professional's judgment. Output should be reviewed by a qualified person before being used in a clinical or billing decision. The model can make mistakes, especially on rare or compound cases.

How to use

With 🤗 Transformers + PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-7B-Instruct"
adapter    = "AmareshHebbar/icd10-coder-qwen25-7b"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Patient presents with uncontrolled type 2 diabetes with diabetic nephropathy."},
]
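# return_dict=True yields input_ids plus attention_mask, so generate() receives both.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))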

Example output (sampling is enabled, so the exact wording may vary):

E11.65 — Type 2 diabetes mellitus with hyperglycemia, combined with N08 for diabetic nephropathy. Primary code: E11.65, Secondary: N08.
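
Because the model answers in free text (code plus justification), downstream systems usually need to pull the code tokens out of the response. The following is a minimal sketch of that step, not part of the released model: the regular expression encodes an assumed shape for ICD-10-CM codes (letter, digit, alphanumeric, optional dot and up to four more characters), and `extract_codes` is a hypothetical helper name. It checks shape only and does not confirm that a code exists in the current code set.

```python
import re

# Assumed shape of an ICD-10-CM code: letter, digit, alphanumeric,
# then an optional dot followed by one to four more alphanumerics.
ICD10CM_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(text: str) -> list[str]:
    """Return the distinct code-shaped tokens in `text`, in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order and drops repeats
    for match in ICD10CM_PATTERN.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


# A string shaped like the example output above.
sample = (
    "E11.65 - Type 2 diabetes mellitus with hyperglycemia, combined with N08 "
    "for diabetic nephropathy. Primary code: E11.65, Secondary: N08."
)
print(extract_codes(sample))  # ['E11.65', 'N08']
```

The first element is the first code mentioned, which in the example corresponds to the primary code. That ordering is a property of this sample, not a guarantee about the model's responses.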

With Unsloth (faster inference, recommended)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/icd10-coder-qwen25-7b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Acute ST-elevation myocardial infarction of the anterior wall, initial encounter."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

With vLLM (production serving)

vllm serve unsloth/Qwen2.5-7B-Instruct \
    --enable-lora \
    --lora-modules icd10-coder-qwen25-7b=AmareshHebbar/icd10-coder-qwen25-7b \
    --host 0.0.0.0 --port 8000 --dtype bfloat16

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="icd10-coder-qwen25-7b",
    messages=[
        {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
        {"role": "user", "content": "Community-acquired pneumonia due to Streptococcus pneumoniae."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
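
All three snippets above repeat the same system prompt. A small helper keeps it in one place so the prompt used at inference stays identical across backends. This is only a sketch: `SYSTEM_PROMPT` and `build_messages` are hypothetical names introduced here, and the prompt text is copied from the examples.

```python
SYSTEM_PROMPT = (
    "You are a certified medical coder. Given a clinical description or diagnosis, "
    "return the most accurate ICD-10-CM code with a brief justification."
)


def build_messages(description: str) -> list[dict[str, str]]:
    """Build the chat messages for one clinical description."""
    description = description.strip()
    if not description:
        raise ValueError("description must not be empty")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": description},
    ]


messages = build_messages("Community-acquired pneumonia due to Streptococcus pneumoniae.")
print(messages[1]["content"])
```

The returned list can be passed as `messages` to `tokenizer.apply_chat_template` or to `client.chat.completions.create`.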

Training details

Data

Trained on 74,719 examples extracted from the CMS FY2026 ICD-10-CM Tabular Order codes. The training data contains no synthetic or LLM-generated examples; every example pairs a real-world input with its authoritative output.

Train: 59,775 examples
Validation: 7,472 examples
Test: 7,472 examples
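
The split sizes are consistent with an 80/10/10 partition of the 74,719 rows. The card does not state how the split was produced, so the ratios and the rounding rule below are assumptions; the sketch only shows that the arithmetic reproduces the listed counts.

```python
total_rows = 74_719

# Assumed rule: validation and test each take 10% (rounded), train takes the rest.
val_rows = round(total_rows * 0.10)
test_rows = round(total_rows * 0.10)
train_rows = total_rows - val_rows - test_rows

print(train_rows, val_rows, test_rows)  # 59775 7472 7472
```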

See the AmareshHebbar/icd10-coder-sft dataset card for the full extraction pipeline.

Hyperparameters

Parameter	Value
LoRA rank	16
LoRA alpha	32
LoRA dropout	0
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization	4-bit NF4 (QLoRA)
Max sequence length	512
Optimizer	paged_adamw_8bit
Learning rate	2e-4, cosine schedule
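
To give a sense of scale for these settings, the sketch below estimates how many trainable parameters a rank-16 LoRA adds across the seven target modules. The layer dimensions are assumptions taken from the publicly listed Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 4 key/value heads of dimension 128, 28 layers) and are not stated in this card. Each adapted linear layer of shape in x out contributes rank * (in + out) parameters.

```python
RANK = 16
ALPHA = 32

# Assumed Qwen2.5-7B dimensions (not stated in this card).
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads, head dimension 128
LAYERS = 28

# (input features, output features) for each target module in one layer.
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (rank x in) and B (out x rank) per adapted layer.
per_layer = sum(RANK * (n_in + n_out) for n_in, n_out in module_shapes.values())
trainable = per_layer * LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(per_layer, trainable, scaling)  # 1441792 40370176 2.0
```

Under these assumptions the adapter holds roughly 40 million parameters, well under 1% of the 7B base model.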

Training infrastructure

Fine-tuned with Unsloth, which advertises about 2x faster training and reduced VRAM use, together with TRL's SFTTrainer. Training ran on a single NVIDIA A40 GPU. Experiment tracking used Weights & Biases.

Limitations and bias

Training data reflects the FY2026 ICD-10-CM code set; outputs may become outdated as CMS issues later updates.
The model may occasionally produce a plausible-sounding but incorrect output for rare or highly compound cases — always have a qualified person verify before downstream use.
English-language input only.
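
One inexpensive guard against the first two limitations is to check each predicted code against the code list currently in force before it reaches a reviewer. The sketch below assumes the valid codes are already loaded into a set. `valid_codes` here is a toy illustrative subset, and `normalize` and `needs_review` are hypothetical helper names. Passing this check only means the code exists; it does not mean the code is the right one for the case.

```python
def normalize(code: str) -> str:
    """Compare codes without dots or case differences (E11.65 == e1165)."""
    return code.replace(".", "").strip().upper()


def needs_review(predicted: list[str], valid_codes: set[str]) -> list[str]:
    """Return the predicted codes that are absent from the valid code set."""
    valid = {normalize(code) for code in valid_codes}
    return [code for code in predicted if normalize(code) not in valid]


# Toy subset for illustration; in practice, load the full current code list.
valid_codes = {"E11.65", "E11.21", "N08"}

print(needs_review(["E11.65", "Z99.999"], valid_codes))  # ['Z99.999']
```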

Related models in this suite

Model	Task	Size
icd10-coder-qwen25-7b	ICD-10-CM medical coding	7B
snomed-mapper-qwen25-7b	Clinical concept mapping	7B
icd10-to-drg-qwen25-1b	ICD-10 → DRG reimbursement	1.5B
pmjay-classifier-qwen25-3b	India PM-JAY classification	3B


Citation

@misc{medicalai2026,
  author    = {Hebbar, Amaresh},
  title     = {Medical AI Fine-tuning Suite},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AmareshHebbar}
}