std-grpo-with-ref
Overview
This repository contains a medical LLM-as-a-Judge model based on Gemma-3-4B and fine-tuned for medical response evaluation tasks.
This variant (std-grpo-with-ref) is a medical judge model trained with standard GRPO and uses expert reference answers.
Base Model
- Base architecture: Gemma-3-4B
- Frameworks:
  - Transformers
  - PEFT / LoRA
  - TRL
Training Objective
The model is designed to evaluate generated medical answers according to predefined clinical evaluation criteria.
Depending on the version, the model may operate:
- with expert reference answers (with-ref)
- without reference answers (no-ref)
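
The difference between the two modes can be illustrated with a small prompt-building sketch. This is a minimal example under stated assumptions: the function name `build_judge_prompt`, the section labels, and the sample question and criteria are all hypothetical, and the layout is not the template the model was actually trained on. It only shows that a with-ref prompt carries an expert reference answer while a no-ref prompt omits it.

```python
from typing import Optional, Sequence


def build_judge_prompt(
    question: str,
    candidate_answer: str,
    criteria: Sequence[str],
    reference_answer: Optional[str] = None,
) -> str:
    """Assemble a judge prompt (hypothetical layout, not the trained template)."""
    sections = [
        "You are a medical response evaluator.",
        f"Question:\n{question}",
        f"Candidate answer:\n{candidate_answer}",
    ]
    # with-ref mode: include the expert reference answer when one is supplied.
    if reference_answer is not None:
        sections.append(f"Expert reference answer:\n{reference_answer}")
    # Both modes list the evaluation criteria the judge should apply.
    criteria_lines = "\n".join(f"- {c}" for c in criteria)
    sections.append(f"Evaluation criteria:\n{criteria_lines}")
    return "\n\n".join(sections)


# Illustrative inputs only; these are not taken from the training data.
question = "What is the first-line treatment for anaphylaxis?"
candidate = "Intramuscular epinephrine."
criteria = ["Clinical accuracy", "Completeness", "Safety"]

with_ref = build_judge_prompt(
    question,
    candidate,
    criteria,
    reference_answer="Prompt intramuscular epinephrine is the first-line treatment.",
)
no_ref = build_judge_prompt(question, candidate, criteria)
```

The resulting string would then be passed to the model through whatever chat template the checkpoint expects; that step is omitted here because the expected input format is not documented in this card.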