std-grpo-with-ref

Overview

This repository contains a medical LLM-as-a-Judge model based on Gemma-3-4B and fine-tuned for medical response evaluation tasks.

This variant is a standard GRPO-trained medical judge model that uses expert reference answers.

The model is designed to evaluate generated medical answers according to predefined clinical evaluation criteria.
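As an illustration of what such an evaluation input could look like, here is a minimal Python sketch that assembles a judging prompt from a question, a candidate answer, an expert reference answer, and a list of criteria. The `JudgeInput` class, the `build_judge_prompt` function, the section labels, and the example criteria are all hypothetical; this card does not specify the prompt format the model was trained on, so treat the layout as an assumption.

```python
from dataclasses import dataclass


@dataclass
class JudgeInput:
    """Hypothetical container for one evaluation request."""
    question: str
    candidate_answer: str   # the generated answer to be judged
    reference_answer: str   # the expert reference answer
    criteria: list[str]     # assumed evaluation criteria, one per entry


def build_judge_prompt(item: JudgeInput) -> str:
    """Assemble a plain-text judging prompt.

    The section labels and ordering are assumptions made for this sketch,
    not the format documented for this model.
    """
    # Number the criteria so the judge can refer to them individually.
    criteria_block = "\n".join(
        f"{i}. {c}" for i, c in enumerate(item.criteria, start=1)
    )
    return (
        "Evaluate the candidate answer against the reference answer "
        "using the criteria below.\n\n"
        f"Question:\n{item.question}\n\n"
        f"Candidate answer:\n{item.candidate_answer}\n\n"
        f"Reference answer:\n{item.reference_answer}\n\n"
        f"Criteria:\n{criteria_block}\n"
    )


if __name__ == "__main__":
    example = JudgeInput(
        question="What is a normal resting heart rate for adults?",
        candidate_answer="Around 60 to 100 beats per minute.",
        reference_answer="Typically 60 to 100 beats per minute.",
        criteria=["Factual accuracy", "Completeness", "Safety"],
    )
    print(build_judge_prompt(example))
```

The resulting string would then be passed to the model through whatever chat template or inference stack is in use; that step is left out here because it depends on details this card does not state.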

Depending on the version, the model may operate:

Format: Safetensors
Model size: 4B params
Tensor type: F32