SAC HalfCheetah-v5

Stable Baselines3 SAC policy trained on Gymnasium HalfCheetah-v5.

The environment includes a small anti-flip reward guard to discourage belly-slide exploit postures.
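The guard's exact form is not spelled out in this card. As a purely illustrative sketch, one way such a guard could work is to subtract a penalty whenever the torso pitches past an angle limit or sinks below a height limit. The function name, both thresholds, and the penalty size are assumptions made for illustration, not the values used for this model.

```python
def guarded_reward(reward: float,
                   torso_height: float,
                   root_angle: float,
                   min_height: float = 0.3,      # hypothetical threshold
                   max_abs_angle: float = 1.0,   # hypothetical threshold (radians)
                   penalty: float = 1.0) -> float:  # hypothetical penalty
    """Return the reward, reduced by `penalty` when the posture looks flipped or flattened."""
    flipped = abs(root_angle) > max_abs_angle
    flattened = torso_height < min_height
    if flipped or flattened:
        return reward - penalty
    return reward
```

A real guard would typically live in an environment wrapper that reads the torso height and root angle from the simulator state on every step.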

Videos

Before training

After training

What Was Done

Trained SAC with MlpPolicy for 300,000 timesteps.
Recorded a random policy rollout before training.
Recorded the trained policy rollout after training.
Saved the trained checkpoint as sac_half_cheetah.zip.
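
The training and saving steps above correspond roughly to the following Stable Baselines3 calls. This is a minimal sketch, not the project's training script: it uses plain `gymnasium.make` rather than the repository's `make_env` wrapper (so the anti-flip guard is not applied), the seed is a placeholder, every other hyperparameter is left at the SB3 default, and video recording is omitted. The function name `train_sac` is hypothetical.

```python
def train_sac(env_id: str = "HalfCheetah-v5",
              total_timesteps: int = 300_000,
              seed: int = 0,  # placeholder; the training seed is not stated in this card
              save_path: str = "sac_half_cheetah"):
    """Train SAC with the default MLP policy and save a checkpoint."""
    # Local imports keep this module importable without MuJoCo installed.
    import gymnasium as gym
    from stable_baselines3 import SAC

    env = gym.make(env_id)
    model = SAC("MlpPolicy", env, seed=seed, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)  # SB3 appends ".zip" when the path has no extension
    env.close()
    return model
```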

Evaluation

Metrics come from a single deterministic rollout with seed 8, so they are a point estimate rather than an average over episodes.

Metric	Value
Steps	1000
Return	7031.927
Mean reward	7.032
Mean x velocity	7.465
Final x position	373.190
Minimum torso height	0.534
Maximum absolute root angle	0.269
Fell	false
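
The rows of the table can be cross-checked with a little arithmetic. Mean reward should equal return divided by steps, and mean x velocity multiplied by elapsed time should roughly equal the final x position. The second check assumes the standard HalfCheetah control step of 0.05 s (frame skip 5 at a 0.01 s simulation step) and ignores the small random initial x offset.

```python
# Values copied from the evaluation table.
steps = 1000
episode_return = 7031.927
mean_reward = 7.032
mean_x_velocity = 7.465
final_x_position = 373.190

DT = 0.05  # assumed seconds per environment step

# Mean reward is the return averaged over steps.
derived_mean_reward = episode_return / steps

# Distance travelled is mean velocity times total elapsed time.
derived_distance = mean_x_velocity * steps * DT

print(f"derived mean reward: {derived_mean_reward:.3f}")
print(f"derived distance:    {derived_distance:.2f}")
```

Both derived values agree with the table once rounding and the initial position noise are taken into account.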

Load

from stable_baselines3 import SAC

model = SAC.load("sac_half_cheetah.zip")

To match the evaluation setup, build the environment with the included wrapper rather than a bare Gymnasium environment:

from sac_cheetah.config import TrainConfig
from sac_cheetah.envs import make_env

cfg = TrainConfig()
env = make_env(cfg.env_id, cfg.seed + 1, render_mode="rgb_array")
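
With `model` and `env` in hand, a deterministic rollout like the one reported under Evaluation might be run as sketched below. The helper name `rollout` is hypothetical and not part of the repository. It relies only on the Gymnasium `reset`/`step` interface and SB3's `predict`, and it assumes the step `info` dict exposes `x_velocity` and `x_position`, as the stock HalfCheetah-v5 environment does.

```python
def rollout(model, env, seed: int = 8, max_steps: int = 1000) -> dict:
    """Run one deterministic episode and summarize it."""
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    velocities = []
    steps = 0
    for _ in range(max_steps):
        # deterministic=True uses the policy mean instead of sampling an action.
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if "x_velocity" in info:
            velocities.append(float(info["x_velocity"]))
        if terminated or truncated:
            break
    return {
        "steps": steps,
        "return": total_reward,
        "mean_reward": total_reward / max(steps, 1),
        "mean_x_velocity": sum(velocities) / len(velocities) if velocities else None,
        "final_x_position": info.get("x_position"),
    }
```

Calling `rollout(model, env)` returns a dictionary whose keys mirror the first rows of the evaluation table. Whether it reproduces the reported numbers depends on using the same wrapper, seed, and library versions.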
