Clinical Recruitment GRPO Agent (Best Run — 80 Steps)

This is the best-performing LoRA adapter trained for the Adaptive Clinical Recruitment Environment, produced by an SFT warmup followed by 80 steps of GRPO on Qwen3-1.7B (NVIDIA L4, 24 GB).

Hackathon positioning: Theme #2 (Super Long-Horizon Planning).

Training Summary

Metric	Value
Base model	Qwen/Qwen3-1.7B
Method	SFT warmup + 80-step GRPO
GPU	NVIDIA L4 (24GB) via HF Jobs
Duration	about 145 min total (3.5 min SFT + 141.7 min GRPO)
Reward (start)	0.269
Reward (end)	0.331
Tool calls/step (start)	3.5
Tool calls/step (end)	11
Enrollment/rollout	3-4 patients
Zero-std collapse rate	10% (8/80 steps, intermittent)
LoRA rank	16
LoRA alpha	16
LoRA dropout	0.05
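
The LoRA hyperparameters in the table map onto a PEFT configuration roughly as follows. This is a minimal sketch, not the training script used for this run: the card does not state the task type or target modules, so `task_type="CAUSAL_LM"` is an assumption and `target_modules` is left unset. `build_lora_config` and `lora_scaling` are hypothetical helper names.

```python
# Values copied from the Training Summary table.
LORA_KWARGS = {
    "r": 16,               # LoRA rank
    "lora_alpha": 16,      # LoRA alpha
    "lora_dropout": 0.05,  # LoRA dropout
}


def lora_scaling(kwargs=LORA_KWARGS):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return kwargs["lora_alpha"] / kwargs["r"]


def build_lora_config():
    """Build a peft.LoraConfig from the table values (hypothetical helper).

    task_type is an assumption; the card does not state it.
    """
    # Imported lazily so the constants above are usable without PEFT installed.
    from peft import LoraConfig

    return LoraConfig(task_type="CAUSAL_LM", **LORA_KWARGS)


if __name__ == "__main__":
    print(lora_scaling())
```

With rank and alpha both set to 16, the standard scaling factor alpha / r works out to 1.0, so the adapter update is applied unscaled.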

What the Model Learned

Calls screen_patient as first action (previously collapsed to adjust_strategy)
Follows the correct pipeline: screen -> recontact -> enroll
Enrolls 3-4 patients per episode on easy_bench
Makes 11+ tool calls per rollout (up from 3.5)
Recovers from intermittent collapse without sustained degradation
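
The first two behaviors above can be checked against a recorded tool-call trace. The sketch below assumes a rollout is logged as a list of tool names. `screen_patient` and `adjust_strategy` come from this card, while `recontact` and `enroll` are stand-in names for the later pipeline stages and may not match the environment's actual tool names.

```python
# Only screen_patient is named in the card; the other two names are assumed.
PIPELINE = ("screen_patient", "recontact", "enroll")


def starts_with_screening(trace):
    """True if the first tool call in the rollout is screen_patient."""
    return bool(trace) and trace[0] == "screen_patient"


def follows_pipeline(trace, pipeline=PIPELINE):
    """True if the pipeline stages occur in order somewhere in the trace.

    Other calls (e.g. adjust_strategy) may be interleaved between stages.
    """
    calls = iter(trace)
    # `stage in calls` consumes the iterator up to the match, which enforces order.
    return all(stage in calls for stage in pipeline)


# Illustrative traces, not taken from the actual run.
good = ["screen_patient", "recontact", "adjust_strategy", "enroll"]
collapsed = ["adjust_strategy", "adjust_strategy", "adjust_strategy"]

if __name__ == "__main__":
    print(starts_with_screening(good), follows_pipeline(good))
    print(starts_with_screening(collapsed), follows_pipeline(collapsed))
```

A check like this separates the earlier failure mode (a trace made only of `adjust_strategy` calls) from rollouts that reach the enrollment stage.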

Honest Limits

Enrollment stays at 3-4 patients per episode against a target of 80
Zero-std collapse still occurs on ~10% of steps
Reward plateaued at 0.33 after 80 steps
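
The card does not define "zero-std collapse". Under the usual GRPO reading, it is a step where every rollout in a group receives the same reward, so the group-normalized advantages are all zero and the step carries no learning signal. The sketch below illustrates that reading with the common (r - mean) / (std + eps) normalization. The reward values, the `eps` value, and the use of population standard deviation are assumptions, and TRL's exact computation may differ.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Group-relative advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_std(rewards, tol=1e-8):
    """True when all rewards in the group are (numerically) identical."""
    return pstdev(rewards) <= tol


def collapse_rate(reward_groups):
    """Fraction of steps whose reward group has zero standard deviation."""
    collapsed = sum(is_zero_std(group) for group in reward_groups)
    return collapsed / len(reward_groups)


# Illustrative reward groups, not taken from the actual run.
healthy_group = [0.0, 0.25, 0.5, 0.25]
collapsed_group = [0.25, 0.25, 0.25, 0.25]

if __name__ == "__main__":
    print(group_advantages(healthy_group))
    print(group_advantages(collapsed_group))  # every advantage is 0.0
    # 8 collapsed steps out of 80 reproduces the 10% figure reported above.
    print(collapse_rate([collapsed_group] * 8 + [healthy_group] * 72))
```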

Framework Versions

TRL: 1.2.0
Transformers: 5.6.2
PyTorch: 2.11.0
PEFT: 0.19.1
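
A minimal sketch for loading the adapter for inference with the libraries listed above. It assumes the adapter weights sit at the root of the `pratimassaravanan/grpo` repository and that the base checkpoint is `Qwen/Qwen3-1.7B`. `load_agent` is a hypothetical helper name, and dtype and device placement are left at library defaults.

```python
BASE_MODEL_ID = "Qwen/Qwen3-1.7B"
ADAPTER_ID = "pratimassaravanan/grpo"  # assumed adapter location


def load_agent(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Imported lazily so this module can be inspected without the ML stack installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)  # wraps base with LoRA weights
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_agent()  # downloads weights on first use
```

The tool schema and prompt format the agent expects are defined by the environment and are not covered here.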

Model Tree

Base model: Qwen/Qwen3-1.7B-Base
Finetuned: Qwen/Qwen3-1.7B
Adapter (this model): pratimassaravanan/grpo