Transformers
Safetensors
Generated from Trainer
trl
grpo
hf_jobs
clinical-recruitment
openenv
long-horizon
lora
How to use pratimassaravanan/grpo with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("pratimassaravanan/grpo", dtype="auto")
```
Clinical Recruitment GRPO Agent (Best Run, 80 Steps)
This is the best LoRA adapter trained for the Adaptive Clinical Recruitment Environment, produced by an SFT warmup followed by 80 steps of GRPO on Qwen3-1.7B (NVIDIA L4, 24 GB).
Hackathon positioning: Theme #2 (Super Long-Horizon Planning).
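Because this repository holds a LoRA adapter rather than full model weights, one common way to use it is to load the base model first and attach the adapter with PEFT. The following is a minimal sketch, assuming the repo stores the adapter in standard PEFT format; the helper name `load_adapter` is hypothetical and not part of the training code.

```python
BASE_MODEL_ID = "Qwen/Qwen3-1.7B"        # base model named in this card
ADAPTER_ID = "pratimassaravanan/grpo"    # this repository


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the LoRA adapter (hypothetical helper).

    Assumes the adapter repo is in standard PEFT format. Heavy imports are
    kept inside the function so the module can be imported without them.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # `dtype="auto"` follows the snippet in this card; older Transformers
    # releases expect `torch_dtype="auto"` instead.
    base = AutoModelForCausalLM.from_pretrained(base_id, dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables LoRA dropout
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter()  # downloads weights on first run
```

Calling `load_adapter()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 1.7B-parameter model.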
Training Summary
| Metric | Value |
|---|---|
| Base model | Qwen/Qwen3-1.7B |
| Method | SFT warmup + 80-step GRPO |
| GPU | NVIDIA L4 (24GB) via HF Jobs |
| Duration | 142 min total (3.5 min SFT + 141.7 min GRPO) |
| Reward (start) | 0.269 |
| Reward (end) | 0.331 |
| Tool calls/step (start) | 3.5 |
| Tool calls/step (end) | 11 |
| Enrollment/rollout | 3-4 patients |
| Zero-std collapse rate | 10% (8/80 steps, intermittent) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
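
The headline numbers in the table can be re-derived with a few lines of arithmetic. The sketch below copies its values from the table; the variable names are illustrative, and the `alpha / rank` scaling is the standard LoRA convention, which this card does not state explicitly.

```python
# Values copied from the Training Summary table.
summary = {
    "reward_start": 0.269,
    "reward_end": 0.331,
    "tool_calls_start": 3.5,
    "tool_calls_end": 11,
    "collapsed_steps": 8,
    "total_steps": 80,
    "lora_rank": 16,
    "lora_alpha": 16,
}

reward_gain = summary["reward_end"] - summary["reward_start"]
relative_gain = reward_gain / summary["reward_start"]
tool_call_ratio = summary["tool_calls_end"] / summary["tool_calls_start"]
collapse_rate = summary["collapsed_steps"] / summary["total_steps"]
# Standard LoRA scaling factor applied to the low-rank update (assumption:
# the run used plain LoRA scaling, not a rank-stabilized variant).
lora_scaling = summary["lora_alpha"] / summary["lora_rank"]

print(f"reward gain: {reward_gain:.3f} ({relative_gain:.1%})")  # 0.062 (23.0%)
print(f"tool-call ratio: {tool_call_ratio:.2f}x")               # 3.14x
print(f"collapse rate: {collapse_rate:.1%}")                    # 10.0%
print(f"LoRA scaling: {lora_scaling}")                          # 1.0
```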
What the Model Learned
- Calls `screen_patient` as first action (previously collapsed to `adjust_strategy`)
- Follows the correct pipeline: screen -> recontact -> enrollment
- Enrolls 3-4 patients per episode on easy_bench
- Makes 11+ tool calls per rollout (up from 3.5)
- Recovers from intermittent collapse without sustained degradation
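
The first two behaviors above can be checked mechanically on a logged rollout. The following is a rough sketch, not the environment's own validator: `screen_patient` and `adjust_strategy` come from this card, while `recontact_patient`, `enroll_patient`, and the function name `follows_pipeline` are hypothetical placeholders for whatever the environment actually exposes.

```python
# Ordered pipeline stages described in this card.
PIPELINE = ["screen", "recontact", "enroll"]

# Tool name -> pipeline stage. Only `screen_patient` is a tool name taken
# from the card; the other two keys are hypothetical placeholders.
TOOL_STAGE = {
    "screen_patient": "screen",
    "recontact_patient": "recontact",  # hypothetical tool name
    "enroll_patient": "enroll",        # hypothetical tool name
}


def follows_pipeline(tool_calls: list[str]) -> bool:
    """Return True if the rollout starts with `screen_patient` and its
    pipeline tools hit every stage in order (other tools may interleave)."""
    if not tool_calls or tool_calls[0] != "screen_patient":
        return False
    # Keep only calls that belong to a pipeline stage.
    stages = iter(TOOL_STAGE[t] for t in tool_calls if t in TOOL_STAGE)
    # Subsequence check: each stage must appear after the previous one.
    return all(stage in stages for stage in PIPELINE)


good = ["screen_patient", "adjust_strategy", "recontact_patient", "enroll_patient"]
collapsed = ["adjust_strategy", "adjust_strategy", "screen_patient"]
print(follows_pipeline(good), follows_pipeline(collapsed))
```

Non-pipeline tools such as `adjust_strategy` are ignored by the ordering check, so a rollout is only rejected when it opens with the wrong action or skips a stage.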
Training Plots
Honest Limits
- Enrollment stays at 3-4 of the 80 target patients per episode
- Zero-std collapse still occurs on ~10% of steps
- Reward plateaued at 0.33 after 80 steps
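
To make the zero-std limit concrete: GRPO normalizes each completion's reward against the other completions sampled for the same prompt, so a group whose rewards are all identical yields zero advantages and no learning signal for that step. The sketch below assumes that this is what "zero-std collapse" refers to here; it is a simplified pure-Python illustration with hypothetical function names, not the TRL implementation (which, among other differences, operates on tensors).

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a simplification
    return [(r - mu) / (sigma + eps) for r in rewards]


def is_zero_std(rewards: list[float], tol: float = 1e-8) -> bool:
    """True when every completion in the group received the same reward."""
    return pstdev(rewards) <= tol


def collapse_rate(steps: list[list[float]]) -> float:
    """Fraction of training steps whose reward group had zero std."""
    return sum(is_zero_std(group) for group in steps) / len(steps)


# Illustrative rewards only, not taken from the actual run.
varied = [0.2, 0.4, 0.3, 0.3]
collapsed = [0.3, 0.3, 0.3, 0.3]
print(group_advantages(varied))     # mixed signs: usable learning signal
print(group_advantages(collapsed))  # all zeros: no gradient from this group
print(collapse_rate([varied, collapsed]))
```

Under this reading, the 8 collapsed steps out of 80 reported above are steps that contributed essentially nothing to the policy update.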
Links
- Live environment: pratimassaravanan-clinical-recruitment.hf.space
- HF Space: pratimassaravanan/clinical-recruitment
- Training script: `train_sft_grpo_hfjob.py` in the Space repo
- 30-step first run: pratimassaravanan/grpo_output
- SFT+REINFORCE model: pratimassaravanan/clinical-qwen3-4b-sft-lora
- Artifacts: pratimassaravanan/clinical-recruitment-artifacts
Framework Versions
- TRL: 1.2.0
- Transformers: 5.6.2
- PyTorch: 2.11.0
- PEFT: 0.19.1

