OrbitalThrusterEnv: GRPO LoRA adapter

Base model: Qwen/Qwen2.5-7B-Instruct

Source env: https://huggingface.co/spaces/pixxel-phantom/orbital-thruster-env

Trained with TRL's GRPOTrainer and Unsloth on the OpenEnv OrbitalThrusterEnv flagship task, mission_ops_long_horizon. Training uses five independent reward functions (format, env-step, mode-match, anti-spam, fuel-discipline) to guard against reward hacking.
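
As a rough illustration of how independent reward functions can be written for GRPOTrainer, here is a minimal sketch of three of the five signals. It is not the training code from this repo. The JSON action schema (`mode`, `thrust`), the `expected_mode` dataset column, and the scoring values are all hypothetical, and completions are assumed to be plain strings. The env-step and anti-spam rewards are omitted because they depend on environment and rollout details not described here.

```python
import json

# ASSUMPTION: each completion is a JSON object such as
# {"mode": "station_keep", "thrust": 0.2}. This schema is hypothetical.


def _parse(completion):
    """Return the parsed action dict, or None if the text is not a JSON object."""
    try:
        action = json.loads(completion)
    except (json.JSONDecodeError, TypeError):
        return None
    return action if isinstance(action, dict) else None


def format_reward(completions, **kwargs):
    """1.0 when the completion parses and carries both assumed keys, else 0.0."""
    rewards = []
    for text in completions:
        action = _parse(text)
        ok = action is not None and {"mode", "thrust"} <= action.keys()
        rewards.append(1.0 if ok else 0.0)
    return rewards


def mode_match_reward(completions, expected_mode, **kwargs):
    """1.0 when the chosen mode equals the per-sample expected mode.

    `expected_mode` is a hypothetical dataset column forwarded as a keyword argument.
    """
    rewards = []
    for text, target in zip(completions, expected_mode):
        action = _parse(text)
        rewards.append(1.0 if action and action.get("mode") == target else 0.0)
    return rewards


def fuel_discipline_reward(completions, **kwargs):
    """Penalize thrust magnitude linearly, capped at 1.0; invalid output scores -1.0."""
    rewards = []
    for text in completions:
        action = _parse(text)
        thrust = action.get("thrust") if action else None
        if isinstance(thrust, bool) or not isinstance(thrust, (int, float)):
            rewards.append(-1.0)
        else:
            rewards.append(-min(abs(float(thrust)), 1.0))
    return rewards
```

Functions of this shape can be passed together through the `reward_funcs` list of GRPOTrainer. Keeping each signal in its own function makes it easier to see in the logs which component a policy is exploiting.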

Artifacts

  • trainer_output/qwen_grpo/: final LoRA adapter
  • trainer_output/qwen_sft/: SFT warm-start adapter
  • outputs/training/grpo_metrics.png: reward + loss curves
  • outputs/eval_trained/trained_vs_baseline.png: trained vs baselines on 4 tasks
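
The final adapter can likely be attached to the base model with PEFT. The sketch below assumes that `trainer_output/qwen_grpo/` has been downloaded locally and is a standard PEFT adapter directory, and that `transformers` and `peft` are installed. The helper name `load_adapter` is hypothetical.

```python
BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"   # base model named in this card
ADAPTER_DIR = "trainer_output/qwen_grpo"  # final LoRA adapter path from this card


def load_adapter(adapter_dir=ADAPTER_DIR, base_model=BASE_MODEL):
    """Load the base model and attach the LoRA adapter found in adapter_dir.

    ASSUMPTION: adapter_dir is a local PEFT-format directory.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype="auto")
    model = PeftModel.from_pretrained(model, adapter_dir)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `load_adapter()` downloads the 7B base weights on first use, so it needs sufficient disk space and memory. Passing `adapter_dir="trainer_output/qwen_sft"` would load the SFT warm-start adapter instead, under the same assumptions.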

Space using pixxel-phantom/orbital-thruster-grpo 1