phi4-mini-medical-qlora

A QLoRA fine-tune of microsoft/Phi-4-mini-reasoning, trained on a curated medical question-answering dataset with chain-of-thought reasoning traces.

Model Details

Property	Value
Base Model	microsoft/Phi-4-mini-reasoning
Fine-tuning Method	QLoRA (4-bit NF4)
LoRA Rank	8
LoRA Alpha	16
Target Modules	q_proj, k_proj, v_proj, o_proj
Training Steps	300
Final Training Loss	1.3847
Loss Reduction	1.4551 → 1.1983 (−18%)
Training Time	~107 minutes
Sequence Length	1024
Hardware	NVIDIA Tesla T4 (15GB)
Framework	HuggingFace Transformers + PEFT + TRL
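
To make the LoRA settings above concrete, the following sketch computes the adapter scaling factor (alpha divided by rank) and the number of trainable parameters that one adapted linear layer adds. The 3072 x 3072 layer shape is an illustrative assumption, not a figure taken from this model card, and `lora_scaling` and `lora_params` are hypothetical helper names.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 8, 16      # values from the Model Details table
D_IN = D_OUT = 3072      # assumed layer shape, for illustration only

print(lora_scaling(ALPHA, RANK))        # scaling factor for the update
print(lora_params(D_IN, D_OUT, RANK))   # trainable parameters per adapted layer
```

With rank 8 and alpha 16 the update is scaled by 2, and each adapted layer of the assumed shape adds 49,152 trainable parameters.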

Training Data

Dataset: Ganesh01kumar02reddy/phi4-medical-preprocessed
Train samples used: 1,200 (subset of 5,518 total)
Val samples: 200
Format: User question → Chain-of-thought reasoning → Assistant answer
Label masking: Loss computed only on assistant answer tokens (85.6% of tokens)
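
The label masking described above can be sketched in plain Python. This is a minimal illustration, not the training code: `mask_labels` and `trained_fraction` are hypothetical names, the token IDs are toy values, and the answer start index is assumed to be known. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions carrying it do not contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_labels(input_ids, answer_start):
    """Copy input_ids to labels, ignoring every token before the answer."""
    if not 0 <= answer_start <= len(input_ids):
        raise ValueError("answer_start out of range")
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])


def trained_fraction(labels):
    """Share of positions that still contribute to the loss."""
    if not labels:
        return 0.0
    kept = sum(1 for token in labels if token != IGNORE_INDEX)
    return kept / len(labels)


# Toy example: 4 prompt/reasoning tokens followed by 6 answer tokens.
ids = [11, 12, 13, 14, 21, 22, 23, 24, 25, 26]
labels = mask_labels(ids, answer_start=4)
print(labels)
print(trained_fraction(labels))
```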

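Prompt Format

Each example consists of a user turn, a reasoning turn, and an assistant turn, delimited by the <|user|>, <|think|>, <|assistant|>, and <|end|> markers used in the inference example under How to Use.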
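
A minimal sketch of assembling the prompt, mirroring the f-string in the How to Use example. `build_prompt` is a hypothetical helper name and is not part of the released code.

```python
def build_prompt(question: str, reasoning: str) -> str:
    """Assemble the user, reasoning, and assistant turns into one prompt string."""
    return (
        f"<|user|>\n{question}<|end|>\n"
        f"<|think|>\n{reasoning}<|end|>\n"
        f"<|assistant|>\n"
    )


print(build_prompt(
    "What are the causes of thrombocytopenia?",
    "Consider immune-mediated, drug-induced, and infectious causes.",
))
```

The prompt ends with an open assistant turn, so generation continues from the point where the answer is expected.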

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL = "microsoft/Phi-4-mini-reasoning"
ADAPTER    = "Ganesh01kumar02reddy/phi4-mini-medical-qlora"

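# Load the base model in 4-bit NF4, the quantisation type used for QLoRA training.
# float16 compute is chosen because the T4 does not support bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)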

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Attach the LoRA adapter weights to the quantised base model.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

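# The prompt includes a reasoning trace before the assistant turn,
# following the question -> reasoning -> answer format used in training.
question = "What are the causes of thrombocytopenia?"
reasoning = "Consider immune-mediated, drug-induced, and infectious causes."

prompt = (
    f"<|user|>\n{question}<|end|>\n"
    f"<|think|>\n{reasoning}<|end|>\n"
    f"<|assistant|>\n"
)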

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

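# Slice off the prompt tokens so that only the newly generated answer is decoded.
prompt_length = inputs["input_ids"].shape[-1]
answer = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
print(answer)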

Training Pipeline

The full preprocessing + training pipeline consists of:

Step	Description
Step 1	Raw data ingestion + deduplication
Step 2	Quality filtering
Step 3	Length filtering (max 4096 tokens)
Step 4	Train/Val/Test split (70/10/20)
Step 5	Prompt formatting + tokenization + label masking
Step 6	QLoRA fine-tuning on Phi-4-mini-reasoning
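
Steps 1, 3, and 4 of the pipeline above could look roughly like the sketch below. It is an illustration under stated assumptions, not the author's preprocessing code: the `question` and `answer` field names, the function names, the whitespace token counter (a stand-in for a real tokenizer), and the shuffle seed are all hypothetical.

```python
import random


def deduplicate(records):
    """Drop records whose normalised (question, answer) pair was already seen."""
    seen, unique = set(), []
    for rec in records:
        key = (rec["question"].strip().lower(), rec["answer"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def filter_by_length(records, max_tokens=4096, count_tokens=None):
    """Keep records within the token budget (whitespace split stands in for a tokenizer)."""
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())
    return [
        rec for rec in records
        if count_tokens(rec["question"] + " " + rec["answer"]) <= max_tokens
    ]


def split_dataset(records, percents=(70, 10, 20), seed=42):
    """Shuffle, then split into train/val/test using integer percentages."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_val = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy data: 10 distinct records plus one duplicate that differs only in case.
raw = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]
raw.append({"question": "Q0 ", "answer": "A0"})

clean = filter_by_length(deduplicate(raw))
train, val, test = split_dataset(clean)
print(len(clean), len(train), len(val), len(test))
```

Integer percentages avoid floating-point rounding when computing split sizes, and any remainder falls into the test split.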

Limitations

Fine-tuned on a limited subset (1,200 samples), so it may not generalise to all medical domains
Not intended for clinical use; outputs should be reviewed by a qualified medical professional
Validation loss was reported as NaN because of a limitation in the streaming dataset setup, so no validation metric is available (the trained weights are unaffected)
Trained on a T4 GPU with 4-bit quantisation; outputs from full-precision inference may differ slightly

Hardware Requirements

Minimum: 8 GB VRAM for 4-bit inference
Recommended: T4 / A10G / RTX 3090 or better
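
As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count and the bits stored per parameter. The 3.8 billion parameter count is an assumption about the base model, not a number from this card, and `weight_memory_gb` is a hypothetical helper. The estimate excludes quantisation constants, any layers kept in higher precision, activations, the KV cache, and CUDA overhead, so actual usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 3.8e9  # assumed parameter count of the base model

print(round(weight_memory_gb(N_PARAMS, 4), 2))   # 4-bit NF4 weights
print(round(weight_memory_gb(N_PARAMS, 16), 2))  # float16 weights
```

Under that assumption, the 4-bit weights take about 1.9 GB and float16 weights about 7.6 GB, which leaves headroom within 8 GB for activations and the KV cache in the 4-bit case.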

Citation

@misc{phi4-mini-medical-qlora-2026,
  author       = {Ganesh01kumar02reddy},
  title        = {phi4-mini-medical-qlora},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ganesh01kumar02reddy/phi4-mini-medical-qlora}}
}

Disclaimer

This model is for research purposes only. Medical decisions should always involve qualified healthcare professionals. The model may produce incorrect, incomplete, or outdated medical information.
