LLM-Zero-Lite Experiments

A controlled comparison of continuous GRPO, fixed staged GRPO, and an LLM-controlled staged GRPO schedule on three-number Countdown using Qwen/Qwen3-1.7B with LoRA.
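For readers unfamiliar with the task, the sketch below shows one way a three-number Countdown answer could be scored. It is not the reward function used in these runs, which this card does not describe. The name `countdown_reward`, the binary 0/1 reward, and the rule that every given number must be used exactly once are all assumptions made for illustration.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are accepted (an assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node, used):
    """Evaluate a parsed expression exactly, recording each integer literal."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, used)
        right = _eval(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical scorer: 1.0 if the expression uses each number exactly
    once and evaluates to the target, otherwise 0.0."""
    used = []
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        value = _eval(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if value == target else 0.0


print(countdown_reward("(25 - 5) * 3", [3, 5, 25], 60))
print(countdown_reward("25 * 3", [3, 5, 25], 75))
```

Parsing with `ast` rather than calling `eval` keeps model output from executing arbitrary code, and `Fraction` avoids floating-point error when an intermediate division is not exact.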

Final 1,000-step results

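| Method            | Greedy accuracy | Sampled pass@1 | Sampled pass@4 |
|-------------------|-----------------|----------------|----------------|
| Continuous GRPO   | 26.5%           | 31.0%          | 35.5%          |
| Fixed staged GRPO | 34.5%           | 34.5%          | 39.5%          |
| LLM controller    | 36.5%           | 37.5%          | 40.5%          |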
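The card does not say how the sampled pass@1 and pass@4 figures were computed. As a rough guide to reading them, the sketch below uses the standard unbiased pass@k estimator: given n samples per problem of which c are correct, pass@k = 1 - C(n - c, k) / C(n, k), averaged over problems. The function names, the choice of n = 4 samples per problem, and the example counts are all illustrative assumptions, not values from these runs.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over problems, given per-problem correct-sample counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Made-up counts for four problems with four samples each.
counts = [0, 1, 4, 2]
print(mean_pass_at_k(counts, n=4, k=1))
print(mean_pass_at_k(counts, n=4, k=4))
```

With k equal to n, pass@k reduces to the fraction of problems with at least one correct sample, and with k = 1 it is the mean per-sample accuracy.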

The runs/ directory contains metrics, evaluation samples, configuration history, controller decisions, logs, plots, and all saved LoRA checkpoints.

Model tree for kishan51/llm-zero-lite-experiments: adapter fine-tuned from Qwen/Qwen3-1.7B.