Ganesh01kumar02reddy/phi4-medical-preprocessed
How to use Ganesh01kumar02reddy/phi4-mini-medical-qlora with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-reasoning")
model = PeftModel.from_pretrained(base_model, "Ganesh01kumar02reddy/phi4-mini-medical-qlora")

A QLoRA fine-tuned version of microsoft/Phi-4-mini-reasoning, trained on a curated medical question-answering dataset with chain-of-thought reasoning traces.
| Property | Value |
|---|---|
| Base Model | microsoft/Phi-4-mini-reasoning |
| Fine-tuning Method | QLoRA (4-bit NF4) |
| LoRA Rank | 8 |
| LoRA Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Steps | 300 |
| Final Training Loss | 1.3847 |
| Loss Reduction | 1.4551 → 1.1983 (≈18%) |
| Training Time | ~107 minutes |
| Sequence Length | 1024 |
| Hardware | NVIDIA Tesla T4 (15GB) |
| Framework | HuggingFace Transformers + PEFT + TRL |
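As a reading aid for the table above, the sketch below works out two quantities that follow from it: the standard LoRA scaling factor (alpha divided by rank) and the relative loss reduction. It also counts the adapter parameters added to a single linear layer. The layer dimensions used there are hypothetical placeholders chosen for illustration, not values taken from the model.

```python
# Values copied from the table above.
LORA_RANK = 8
LORA_ALPHA = 16
LOSS_START = 1.4551
LOSS_END = 1.1983


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_params_per_linear(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def relative_reduction(start: float, end: float) -> float:
    """Fractional drop from start to end."""
    return (start - end) / start


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
reduction = relative_reduction(LOSS_START, LOSS_END)

# Hypothetical 1024 x 1024 projection, for illustration only.
example_params = lora_params_per_linear(1024, 1024, LORA_RANK)

print(f"LoRA scaling: {scaling}")
print(f"Loss reduction: {reduction:.1%}")
print(f"Adapter params for a 1024x1024 layer: {example_params}")
```

With rank 8 and alpha 16, the low-rank update is scaled by a factor of 2, and the two loss values in the table correspond to a drop of roughly 18%.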
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch
BASE_MODEL = "microsoft/Phi-4-mini-reasoning"
ADAPTER = "Ganesh01kumar02reddy/phi4-mini-medical-qlora"
# 4-bit NF4 quantization, matching the QLoRA setup listed in the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
# Load the quantized base model, then attach the adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
question = "What are the causes of thrombocytopenia?"
reasoning = "Consider immune-mediated, drug-induced, and infectious causes."
prompt = (
    f"<|user|>\n{question}<|end|>\n"
    f"<|think|>\n{reasoning}<|end|>\n"
    f"<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding
        pad_token_id=tokenizer.eos_token_id,
    )
# Decode only the newly generated tokens, skipping the prompt
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(answer)
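The prompt layout in the example above can be wrapped in a small helper so the tags are written in one place. This is a minimal sketch: `build_prompt` is a hypothetical name, and making the reasoning block optional is an assumption, since the example only shows prompts that include it.

```python
def build_prompt(question: str, reasoning: str = "") -> str:
    """Format a question and an optional reasoning hint using the tags from the example above."""
    parts = [f"<|user|>\n{question}<|end|>\n"]
    if reasoning:
        # Assumption: the reasoning block may be left out; the example always includes it.
        parts.append(f"<|think|>\n{reasoning}<|end|>\n")
    parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt(
    "What are the causes of thrombocytopenia?",
    "Consider immune-mediated, drug-induced, and infectious causes.",
)
print(prompt)
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` exactly as in the example.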
The full preprocessing + training pipeline consists of:
| Step | Description |
|---|---|
| Step 1 | Raw data ingestion + deduplication |
| Step 2 | Quality filtering |
| Step 3 | Length filtering (max 4096 tokens) |
| Step 4 | Train/Val/Test split (70/10/20) |
| Step 5 | Prompt formatting + tokenization + label masking |
| Step 6 | QLoRA fine-tuning on Phi-4-mini-reasoning |
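The data-preparation steps in the table could be sketched in plain Python roughly as follows. This is an illustration under stated assumptions, not the author's pipeline: the record fields (`question`, `answer`), the deduplication key, the whitespace-based token count, and all function names are hypothetical. The one fixed convention is the label value `-100`, which PyTorch's cross-entropy loss ignores by default.

```python
import random

IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss by default


def deduplicate(records):
    """Step 1: drop records whose normalized question text was already seen."""
    seen = set()
    unique = []
    for rec in records:
        key = " ".join(rec["question"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def filter_by_length(records, max_tokens=4096, count_tokens=lambda text: len(text.split())):
    """Step 3: keep records within the limit. Whitespace splitting stands in for a real tokenizer."""
    return [
        r for r in records
        if count_tokens(r["question"] + " " + r["answer"]) <= max_tokens
    ]


def split_dataset(records, seed=0):
    """Step 4: shuffle, then split 70/10/20 into train/val/test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 70 // 100
    n_val = n * 10 // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def mask_prompt_labels(input_ids, prompt_len):
    """Step 5: copy token ids to labels, masking the prompt so only the answer counts in the loss."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


# Toy data: 100 distinct records plus one duplicate that differs only in case.
records = [{"question": f"Question {i}?", "answer": f"Answer {i}."} for i in range(100)]
records.append({"question": "question 0?", "answer": "Duplicate."})

unique = deduplicate(records)
kept = filter_by_length(unique)
train, val, test = split_dataset(kept)
labels = mask_prompt_labels([11, 12, 13, 14, 15], prompt_len=3)

print(len(unique), len(train), len(val), len(test))
print(labels)
```

In a real run, `count_tokens` would wrap the model tokenizer so that the length limit is measured in actual tokens rather than whitespace-separated words.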
nan due to streaming dataset limitation (does not affect model weights)

@misc{phi4-mini-medical-qlora-2026,
author = {Ganesh01kumar02reddy},
title = {phi4-mini-medical-qlora},
year = {2026},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/Ganesh01kumar02reddy/phi4-mini-medical-qlora}}
}
This model is for research purposes only. Medical decisions should always involve qualified healthcare professionals. The model may produce incorrect, incomplete, or outdated medical information.