phi4-mini-medical-qlora

A QLoRA fine-tuned version of microsoft/Phi-4-mini-reasoning on a curated medical question-answering dataset with chain-of-thought reasoning traces.

Model Details

Property             Value
Base Model           microsoft/Phi-4-mini-reasoning
Fine-tuning Method   QLoRA (4-bit NF4)
LoRA Rank            8
LoRA Alpha           16
Target Modules       q_proj, k_proj, v_proj, o_proj
Training Steps       300
Final Training Loss  1.3847
Loss Reduction       1.4551 → 1.1983 (−18%)
Training Time        ~107 minutes
Sequence Length      1024
Hardware             NVIDIA Tesla T4 (15GB)
Framework            HuggingFace Transformers + PEFT + TRL
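
Two of the figures above can be checked by hand. The sketch below assumes the common LoRA convention of scaling the adapter update by alpha / rank (this card does not say which scaling variant was used) and recomputes the relative loss reduction from the two reported loss values. The variable names are illustrative only.

```python
# Values copied from the Model Details table.
lora_rank = 8
lora_alpha = 16
loss_start = 1.4551
loss_end = 1.1983

# Standard LoRA scales the low-rank update by alpha / rank
# (assumption: default scaling, not a rank-stabilised variant).
lora_scaling = lora_alpha / lora_rank

# Relative drop between the two reported loss values.
relative_reduction = (loss_start - loss_end) / loss_start

print(f"LoRA scaling factor: {lora_scaling}")
print(f"Relative loss reduction: {relative_reduction:.1%}")
```

The reduction comes out at about 17.6%, which rounds to the 18% shown in the table.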

Training Data

  • Dataset: Ganesh01kumar02reddy/phi4-medical-preprocessed
  • Train samples used: 1,200 (subset of 5,518 total)
  • Val samples: 200
  • Format: User question → Chain-of-thought reasoning → Assistant answer
  • Label masking: Loss computed only on assistant answer tokens (85.6% of tokens)
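
Label masking of this kind is usually implemented by setting the labels of non-answer tokens to -100, the index that PyTorch's cross-entropy loss ignores. The card does not include the actual masking code, so the following is a minimal sketch; `mask_labels`, `supervised_fraction`, and the toy token ids are hypothetical, and the position where the assistant answer begins is assumed to be known.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_labels(input_ids, answer_start):
    """Copy input_ids into labels, hiding every token before answer_start."""
    labels = list(input_ids)
    for i in range(min(answer_start, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels


def supervised_fraction(labels):
    """Share of positions that still contribute to the loss."""
    if not labels:
        return 0.0
    kept = sum(1 for token in labels if token != IGNORE_INDEX)
    return kept / len(labels)


# Toy example with made-up token ids: 3 prompt tokens, then 5 answer tokens.
toy_ids = [11, 12, 13, 21, 22, 23, 24, 25]
toy_labels = mask_labels(toy_ids, answer_start=3)
print(toy_labels)
print(supervised_fraction(toy_labels))
```

Computing `supervised_fraction` over a real tokenized dataset is one way to arrive at a figure like the 85.6% quoted above.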

Prompt Format
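
The template itself is missing from this section, but the usage example under How to Use builds the prompt from three segments: a user turn, a reasoning turn, and an open assistant turn. The helper below wraps that same string in a function; `build_prompt` is a hypothetical name, and the tags are copied from that example rather than from any separate specification.

```python
def build_prompt(question: str, reasoning: str) -> str:
    """Assemble the user / think / assistant prompt used in the usage example."""
    return (
        f"<|user|>\n{question}<|end|>\n"
        f"<|think|>\n{reasoning}<|end|>\n"
        f"<|assistant|>\n"
    )


print(build_prompt(
    "What are the causes of thrombocytopenia?",
    "Consider immune-mediated, drug-induced, and infectious causes.",
))
```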

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL = "microsoft/Phi-4-mini-reasoning"
ADAPTER    = "Ganesh01kumar02reddy/phi4-mini-medical-qlora"

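# Load the base model in 4-bit NF4, the quantization used for fine-tuning
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)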

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)

model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

question = "What are the causes of thrombocytopenia?"
reasoning = "Consider immune-mediated, drug-induced, and infectious causes."

prompt = (
    f"<|user|>\n{question}<|end|>\n"
    f"<|think|>\n{reasoning}<|end|>\n"
    f"<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the prompt
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True)
print(answer)

Training Pipeline

The full preprocessing + training pipeline consists of:

Step    Description
Step 1  Raw data ingestion + deduplication
Step 2  Quality filtering
Step 3  Length filtering (max 4096 tokens)
Step 4  Train/Val/Test split (70/10/20)
Step 5  Prompt formatting + tokenization + label masking
Step 6  QLoRA fine-tuning on Phi-4-mini-reasoning
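
Steps 1, 3, and 4 can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline that produced the dataset: deduplication is taken to be exact matching on normalised question/answer text, token counts are approximated by whitespace splitting instead of the model tokenizer, and the function names, record fields, and seed are all hypothetical.

```python
import random


def deduplicate(records):
    """Drop records whose normalised question/answer pair was already seen."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["question"].strip().lower(), rec["answer"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def length_filter(records, max_tokens=4096, count_tokens=None):
    """Keep records at or under max_tokens (whitespace count as a stand-in)."""
    count = count_tokens or (lambda text: len(text.split()))
    return [
        rec for rec in records
        if count(rec["question"] + " " + rec["answer"]) <= max_tokens
    ]


def split_70_10_20(records, seed=0):
    """Shuffle with a fixed seed, then split into train/val/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 10 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train, val, test_set


# Toy data: 10 distinct records plus 2 exact duplicates.
toy = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(10)]
toy += [dict(toy[0]), dict(toy[1])]

clean = length_filter(deduplicate(toy))
train, val, test_set = split_70_10_20(clean)
print(len(clean), len(train), len(val), len(test_set))
```

In a real run, `count_tokens` would be replaced by a call to the model tokenizer so that the 4096-token limit is measured in the same units the model sees.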

Limitations

  • Fine-tuned on a limited subset (1,200 samples), so it may not generalise to all medical domains
  • Not intended for clinical use; outputs should be reviewed by a qualified medical professional
  • Validation loss was reported as NaN because of a streaming-dataset limitation, so no validation metric is available (this does not affect the model weights)
  • Trained on a T4 GPU with 4-bit quantisation, so full-precision inference may differ slightly

Hardware Requirements

  • Minimum 8GB VRAM for 4-bit inference
  • Recommended: T4 / A10G / RTX 3090 or better
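
For a rough sense of where the memory goes, the size of the quantized weights alone can be estimated from the parameter count. The sketch below assumes roughly 3.8 billion parameters for the base model (a figure not stated in this card) and treats every parameter as stored in 4 bits; `quantized_weight_gb` is a hypothetical helper.

```python
def quantized_weight_gb(n_params: float, bits_per_param: int = 4) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: approximate parameter count of the base model.
ASSUMED_PARAMS = 3.8e9

print(f"{quantized_weight_gb(ASSUMED_PARAMS):.2f} GB")
```

This is a lower bound of roughly 1.9 GB under those assumptions. Layers that bitsandbytes leaves unquantized, activations, the KV cache, and framework overhead all add to it, which is consistent with the stated minimum being well above the weight size.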

Citation

@misc{phi4-mini-medical-qlora-2026,
  author       = {Ganesh01kumar02reddy},
  title        = {phi4-mini-medical-qlora},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Ganesh01kumar02reddy/phi4-mini-medical-qlora}}
}

Disclaimer

This model is for research purposes only. Medical decisions should always involve qualified healthcare professionals. The model may produce incorrect, incomplete, or outdated medical information.
