# Fine-tuned Gemma 3 4B for Medical QA & Summarization (drwlf/gemma-3)
This repository contains LoRA adapters for the `unsloth/gemma-3-4b-it` model, fine-tuned on a diverse collection of medical text datasets using Unsloth and QLoRA.
NOTE: This model is fine-tuned on text data only. It does not possess the multimodal image understanding capabilities of the base Gemma 3 model unless further fine-tuned on image-text data.
## Model Description
- Base Model: `unsloth/gemma-3-4b-it` (Google's Gemma 3 4B instruction-tuned model, optimized by Unsloth).
- Fine-tuning Method: QLoRA (4-bit NormalFloat) via the Unsloth library (LoRA r=8, alpha=16); see the configuration sketch after this list.
- Goal: To enhance the base model's ability to understand and respond to medical queries, summarize medical text, and provide information relevant to the domains covered in the fine-tuning datasets.
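
For reference, here is a minimal sketch of how an adapter with these hyperparameters is typically attached in Unsloth. Only `r=8`, `alpha=16`, 4-bit loading, and the base model name come from this card; every other argument is an illustrative assumption, not the exact training configuration.

```python
from unsloth import FastModel

# Sketch only: r and lora_alpha match this card; the remaining settings are assumptions.
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=4096,
    load_in_4bit=True,   # QLoRA: base weights quantized to 4-bit NormalFloat
)
model = FastModel.get_peft_model(
    model,
    r=8,                 # LoRA rank (from this card)
    lora_alpha=16,       # LoRA scaling factor (from this card)
    lora_dropout=0,      # assumption
    bias="none",         # assumption
)
```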
## Intended Uses & Limitations
### Intended Use
This model is intended as an informational assistant for healthcare professionals, researchers, and students. Potential applications include:
- Answering questions based on medical knowledge derived from PubMed, MedQuAD, and dermatology FAQs.
- Summarizing medical abstracts or articles similar to those in the PubMed Summarization dataset.
- Assisting with information retrieval related to dermatology queries.
- Serving as a foundation for further fine-tuning on more specialized medical tasks or datasets (including potentially multimodal data, leveraging the base Gemma 3 architecture).
### Limitations and Bias
- 🚨 Not a Medical Device: This model is NOT a substitute for professional medical advice, diagnosis, or treatment. It should NEVER be used for clinical decision-making.
- Potential Inaccuracies: Like all LLMs, this model can generate incorrect information (hallucinate) or produce outputs that seem plausible but are factually wrong. Always verify critical information with reliable medical sources and expert consultation.
- Training Data Bias: The model's knowledge and potential biases are derived from the underlying base model (Gemma 3) and the specific fine-tuning datasets. These datasets may contain inherent biases (e.g., demographic, geographic) which could be reflected in the model's outputs.
- Limited Scope: The fine-tuning data focused on specific sources (PubMed QA/Summarization, Dermatology QA, MedQuAD). The model's expertise will be strongest in these areas and limited in others (e.g., minimal specific knowledge of plastic surgery or aesthetics was included in this fine-tuning round).
- No Formal Evaluation: Performance has not been rigorously evaluated on standard medical benchmarks. The reported training loss (still above 1.3 after 2 epochs) indicates that learning occurred, but likely not full convergence on the training data.
## How to Use
These are LoRA adapters. You need to load them onto the base model (`unsloth/gemma-3-4b-it`) using Unsloth and PEFT.
```python
from unsloth import FastModel
import torch

# Make sure you are logged in to Hugging Face if needed for the base model
# from huggingface_hub import login
# login()

model_name = "unsloth/gemma-3-4b-it"
adapter_repo_id = "drwlf/gemma-3"  # This model card's repo
max_seq_len = 4096                 # Or your preferred length

# Load the base model in 4-bit
model, base_tokenizer = FastModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_len,
    load_in_4bit=True,
    # token="hf_...",           # Add if needed
    # dtype=torch.bfloat16,     # Optional
)

# Load the adapters
print(f"Loading adapters from {adapter_repo_id}...")
model.load_adapter(adapter_repo_id)  # No token needed if the repo is public
print("Adapters loaded.")

# Ensure the tokenizer uses the correct chat template
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    base_tokenizer,
    chat_template="gemma-3",  # Use the template the model was trained with
)

# --- Inference Example ---
messages = [
    {"role": "user", "content": "What are common treatments for plaque psoriasis?"}
]

# Apply the chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

# Generate the response
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)
print("\nGenerating response...")
_ = model.generate(
    **inputs,
    max_new_tokens=256,
    use_cache=True,
    temperature=0.7,  # Adjust temperature as needed
    top_p=0.95,
    top_k=64,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,  # Use EOS for padding
    streamer=streamer,
)
```
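
If you want the completion as a string rather than streamed to the console (for example, to post-process a generated summary), you can drop the streamer and decode the new tokens yourself. The prompt below is only an illustration:

```python
# Non-streaming variant: decode the completion into a Python string.
messages = [
    {"role": "user", "content": "Summarize this abstract in three sentences:\n\n<abstract text here>"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
# Strip the prompt tokens so only the model's answer is decoded.
answer = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer)
```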
---
base_model: unsloth/gemma-3-12b-it-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3
license: apache-2.0
language:
- en
---
# Uploaded finetuned model
- **Developed by:** drwlf
- **License:** apache-2.0
- **Finetuned from model:** unsloth/gemma-3-12b-it-unsloth-bnb-4bit
This gemma3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
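
The training run followed the usual Unsloth + TRL supervised fine-tuning recipe. A minimal sketch of that setup is shown below; apart from the 2 epochs mentioned in the card above, every hyperparameter is an illustrative assumption, and `dataset` stands for the chat-formatted fine-tuning data.

```python
from trl import SFTTrainer, SFTConfig

# Illustrative sketch only — not the exact configuration used for this model.
trainer = SFTTrainer(
    model=model,                # PEFT-wrapped Unsloth model
    tokenizer=tokenizer,        # newer TRL versions take processing_class= instead
    train_dataset=dataset,      # examples formatted with the "gemma-3" chat template
    args=SFTConfig(
        per_device_train_batch_size=2,   # assumption
        gradient_accumulation_steps=4,   # assumption
        num_train_epochs=2,              # the card reports 2 epochs
        learning_rate=2e-4,              # common QLoRA default; assumption
        output_dir="outputs",
    ),
)
trainer.train()
```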