OrbitalThrusterEnv: GRPO LoRA adapter
Base model: Qwen/Qwen2.5-7B-Instruct
Source env: https://huggingface.co/spaces/pixxel-phantom/orbital-thruster-env
Trained with TRL's GRPOTrainer and Unsloth on mission_ops_long_horizon, the flagship task of the OpenEnv OrbitalThrusterEnv.
Training uses five independent reward functions (format, env-step, mode-match, anti-spam, fuel-discipline) to discourage reward hacking.
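As a rough illustration of how several independent reward functions can be combined, the sketch below follows the callable shape that TRL's GRPOTrainer accepts in reward_funcs (keyword arguments in, one float per completion out). The actual reward definitions used for this adapter are not published here, so the JSON schema with an "action" key, the max_chars budget, and the score helper are all hypothetical. Only two of the five rewards are sketched, and completions are assumed to be plain strings.

```python
import json


def format_reward(completions, **kwargs):
    """Return 1.0 when a completion parses as a JSON object with an
    "action" key, else 0.0. The schema is a hypothetical stand-in."""
    rewards = []
    for text in completions:
        try:
            parsed = json.loads(text)
            ok = isinstance(parsed, dict) and "action" in parsed
        except (json.JSONDecodeError, TypeError):
            ok = False
        rewards.append(1.0 if ok else 0.0)
    return rewards


def anti_spam_reward(completions, max_chars=200, **kwargs):
    """Return 0.0 for completions within the length budget and -1.0 for
    longer ones. The budget is an assumed value, not the trained one."""
    return [0.0 if len(text) <= max_chars else -1.0 for text in completions]


# A list like this is what would be passed as reward_funcs to the trainer.
reward_funcs = [format_reward, anti_spam_reward]


def score(completions):
    """Sum the rewards per completion with equal weights (hypothetical
    helper for inspecting the combined signal outside of training)."""
    per_func = [func(completions=completions) for func in reward_funcs]
    return [sum(values) for values in zip(*per_func)]
```

Keeping each signal in its own function means a policy that games one reward (for example, emitting well-formed but very long output) is still penalized by another, which is the intent behind using several independent rewards.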
Artifacts
trainer_output/qwen_grpo/ - final LoRA adapter
trainer_output/qwen_sft/ - SFT warm-start adapter
outputs/training/grpo_metrics.png - reward and loss curves
outputs/eval_trained/trained_vs_baseline.png - trained vs. baselines on 4 tasks
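A minimal sketch of loading the final adapter on top of the base model with transformers and peft. It assumes the repository files have been downloaded locally so that trainer_output/qwen_grpo exists relative to the working directory. The load_adapter helper and its defaults are illustrative, not part of this repo. Imports are deferred so the definition can be read without the heavy dependencies installed.

```python
BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER_DIR = "trainer_output/qwen_grpo"  # assumed local path to the adapter


def load_adapter(adapter_dir=ADAPTER_DIR, base_model=BASE_MODEL):
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    # Deferred imports: transformers and peft are only needed when called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    base = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    # Wrap the base model with the adapter weights stored in adapter_dir.
    model = PeftModel.from_pretrained(base, adapter_dir)
    return model, tokenizer
```

Calling load_adapter() downloads the 7B base model if it is not cached, so it needs substantial memory. Passing adapter_dir="trainer_output/qwen_sft" should load the SFT warm-start adapter in the same way, assuming it was saved in the same PEFT format.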