ICD-10 Coder — Qwen2.5-7B

Fine-tuned medical LLM for ICD-10-CM coding, built by Amaresh Hebbar.

Given a clinical description, the model returns the ICD-10-CM code it judges most accurate, along with a brief justification. It acts as a structured-output translation layer between free-text clinical language and the coding system that insurance reimbursement runs on.

Model summary

Base model: unsloth/Qwen2.5-7B-Instruct
Fine-tuning method: QLoRA (4-bit NF4), rank 16
Training framework: Unsloth + TRL SFTTrainer
Training data: AmareshHebbar/icd10-coder-sft (74,719 rows)
Hardware: NVIDIA A40 (48 GB), single GPU
License: Apache 2.0

Intended use

Intended as a drop-in coding assistant for clinical documentation pipelines, EHR systems, and medical billing software. Given a diagnosis description, the model outputs the matching ICD-10-CM code with a short justification. The only other component needed downstream is a DRG lookup table, and only if you need reimbursement estimates.

This model is not a substitute for a certified medical professional's judgment. Output should be reviewed by a qualified person before being used in a clinical or billing decision. The model can make mistakes, especially on rare or compound cases.

How to use

With 🤗 Transformers + PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_model = "unsloth/Qwen2.5-7B-Instruct"
adapter    = "AmareshHebbar/icd10-coder-qwen25-7b"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the fine-tuned LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Patient presents with uncontrolled type 2 diabetes with diabetic nephropathy."},
]
# return_dict=True yields both input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
# Slice off the prompt so only the model's answer is decoded.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Example output (generation uses sampling, so the wording may vary between runs):

E11.65 — Type 2 diabetes mellitus with hyperglycemia, combined with N08 for diabetic nephropathy. Primary code: E11.65, Secondary: N08.
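The response is free text with the codes embedded in it, so a downstream system has to pull them out. The following is a minimal sketch that assumes codes always appear in the dotted ICD-10-CM form shown above. The helper name `extract_codes` is hypothetical. The regex checks only the shape of a code. It does not check whether the code exists or is billable.

```python
import re

# Assumed ICD-10-CM code shape: a letter, a digit, a letter or digit, then
# optionally a dot followed by 1-4 letters/digits (e.g. "N08", "E11.65", "S72.001A").
# This matches shape only; it does not confirm the code exists.
ICD10CM_RE = re.compile(r"\b[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(response: str) -> list[str]:
    """Return ICD-10-CM-shaped codes in order of first appearance, without duplicates."""
    return list(dict.fromkeys(ICD10CM_RE.findall(response)))


if __name__ == "__main__":
    print(extract_codes("Primary code: E11.65, Secondary: N08."))  # ['E11.65', 'N08']
```

`dict.fromkeys` removes repeats while keeping the order, so the code the model names first stays first.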

With Unsloth (faster inference, recommended)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AmareshHebbar/icd10-coder-qwen25-7b",
    max_seq_length=512,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

messages = [
    {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
    {"role": "user", "content": "Acute ST-elevation myocardial infarction of the anterior wall, initial encounter."},
]
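# Render the chat template to text, then tokenize it. The template already
# contains the special tokens, so the tokenizer should not add more.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))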

With vLLM (production serving)

vllm serve unsloth/Qwen2.5-7B-Instruct \
    --enable-lora \
    --lora-modules icd10-coder-qwen25-7b=AmareshHebbar/icd10-coder-qwen25-7b \
    --host 0.0.0.0 --port 8000 --dtype bfloat16

Then query the server through its OpenAI-compatible API:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="icd10-coder-qwen25-7b",
    messages=[
        {"role": "system", "content": "You are a certified medical coder. Given a clinical description or diagnosis, return the most accurate ICD-10-CM code with a brief justification."},
        {"role": "user", "content": "Community-acquired pneumonia due to Streptococcus pneumoniae."},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
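You can skip the `openai` package and send the same request as plain JSON to vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. The following is a minimal standard-library sketch that assumes the server above is running on `localhost:8000`. The helpers `build_payload` and `code_description` are hypothetical. `max_tokens=128` is an assumption chosen to match the other examples.

```python
import json
import urllib.request

SYSTEM_PROMPT = (
    "You are a certified medical coder. Given a clinical description or diagnosis, "
    "return the most accurate ICD-10-CM code with a brief justification."
)


def build_payload(description: str, model: str = "icd10-coder-qwen25-7b") -> dict:
    """Build an OpenAI-style chat completions request body."""
    return {
        "model": model,  # must match the name registered via --lora-modules
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": description},
        ],
        "temperature": 0.1,
        "max_tokens": 128,  # assumed cap, mirroring max_new_tokens above
    }


def code_description(description: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST one description to /chat/completions and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(description)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `code_description("Community-acquired pneumonia due to Streptococcus pneumoniae.")` should return the same kind of text as the OpenAI-client example.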

Training details

Data

Trained on 74,719 examples extracted from the official CMS FY2026 ICD-10-CM code set (tabular order). No synthetic or LLM-generated training data was used: every example pairs real-world input with its authoritative output.

  • Train: 59,775 examples (80%)
  • Validation: 7,472 examples (10%)
  • Test: 7,472 examples (10%)

See the dataset card for the full extraction pipeline.
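The CMS order file is also useful at inference time. It gives a cheap check that a predicted code exists and is billable before the output reaches a human reviewer. The following is a minimal sketch, not the author's extraction pipeline. It assumes the usual order-file layout: an order number, a code without the dot, a 0/1 flag where 1 marks a billable code, and then the descriptions. `load_billable_codes` and `is_billable` are hypothetical helpers.

```python
def load_billable_codes(lines) -> set[str]:
    """Collect billable codes from lines of a CMS ICD-10-CM order file.

    Assumed layout per line: order number, code (no dot), a 0/1 flag where 1
    marks a billable code, then descriptions. Splitting on whitespace avoids
    depending on exact column widths.
    """
    billable = set()
    for line in lines:
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        _order, code, flag = parts[:3]
        if flag == "1":
            billable.add(code)
    return billable


def is_billable(code: str, billable: set[str]) -> bool:
    """The order file stores codes without the dot, so 'E11.65' is looked up as 'E1165'."""
    return code.replace(".", "").upper() in billable
```

To build the set, open the order file with `with open(path, encoding="utf-8") as f:` and pass `f` to `load_billable_codes`. The exact FY2026 file name is not given here. A code that passes this check can still be the wrong code for the case, so the check only helps prioritize review.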

Hyperparameters

LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Quantization: 4-bit NF4 (QLoRA)
Max sequence length: 512 tokens
Optimizer: paged_adamw_8bit
Learning rate: 2e-4, cosine schedule
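The rank and the target modules determine the adapter's size. Each adapted linear layer of shape in × out gains r·(in + out) trainable weights, and the update is scaled by alpha/r. The sketch below works through that arithmetic. It assumes the published Qwen2.5-7B dimensions: hidden size 3584, MLP size 18944, 28 layers, and 4 key/value heads of dimension 128. It also assumes standard (non-rsLoRA) scaling.

```python
# Qwen2.5-7B architecture values, taken as assumptions from its published config.json.
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size
LAYERS = 28            # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)

RANK, ALPHA = 16, 32   # LoRA rank and alpha from the hyperparameter table

# (in_features, out_features) of each targeted linear layer in one decoder block.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """Each adapter adds A (rank x in) and B (out x rank): rank * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * LAYERS


trainable = lora_params(RANK)
scaling = ALPHA / RANK  # standard LoRA scales the update by alpha / r
print(f"trainable LoRA params: {trainable:,}")  # 40,370,176
print(f"scaling alpha/r: {scaling}")            # 2.0
```

That comes to roughly 40M trainable weights, about half a percent of the base model's roughly 7.6B parameters.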

Training infrastructure

Fine-tuned with TRL's SFTTrainer on top of Unsloth, which advertises roughly 2x faster training and lower VRAM use. Training ran on a single NVIDIA A40 GPU, and experiments were tracked with Weights & Biases.

Limitations and bias

  • Training data reflects the CMS FY2026 ICD-10-CM code set, so outputs may become outdated as CMS issues code updates.
  • The model may occasionally produce plausible-sounding but incorrect output for rare or highly compound cases. Always have a qualified person verify it before downstream use.
  • English-language input only.

Related models in this suite

icd10-coder-qwen25-7b: ICD-10-CM medical coding (7B)
snomed-mapper-qwen25-7b: Clinical concept mapping (7B)
icd10-to-drg-qwen25-1b: ICD-10 → DRG reimbursement (1.5B)
pmjay-classifier-qwen25-3b: India PM-JAY classification (3B)

Full collection: [link your HF collection here]

Citation

@misc{medicalai2026,
  author    = {Hebbar, Amaresh},
  title     = {Medical AI Fine-tuning Suite},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/AmareshHebbar}
}

Contact
