SAC + HER on FetchReach-v4

A Soft Actor-Critic (SAC) agent with Hindsight Experience Replay (HER), trained to solve the MuJoCo FetchReach task: move the gripper to a target position, learning from a sparse reward. It serves here as a fast pipeline-validation baseline before the harder PickAndPlace task.

Results

Evaluation success rate: 100% (deterministic, 20+ episodes)
Mean episode return: about -1.6 (sparse reward: -1 on each step the gripper is outside the goal threshold, 0 otherwise)
Trained for 25k timesteps (~4 min on CPU)
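
To make the return figure easier to read, here is a minimal sketch of how a sparse reach reward turns into an episode return. It assumes the standard Fetch convention (reward -1 while the gripper is farther than a threshold from the goal, 0 otherwise) and a 0.05 m threshold; the function names `sparse_reward` and `episode_return` are illustrative and not taken from the repository.

```python
import numpy as np

# Assumed success threshold in metres (not stated on this card).
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Return 0.0 where the gripper is within `threshold` of the goal, else -1.0."""
    diff = np.asarray(achieved_goal, dtype=np.float64) - np.asarray(desired_goal, dtype=np.float64)
    distance = np.linalg.norm(diff, axis=-1)
    return -(distance > threshold).astype(np.float32)


def episode_return(distances):
    """Sum the sparse reward over a list of per-step gripper-to-goal distances."""
    achieved = np.asarray(distances, dtype=np.float64).reshape(-1, 1)
    desired = np.zeros_like(achieved)
    return float(sparse_reward(achieved, desired).sum())


# Hypothetical 50-step episode: two steps outside the threshold, then on target.
example_return = episode_return([0.12, 0.07] + [0.01] * 48)
```

Under these assumptions, a mean return near -1.6 over 50-step episodes would mean the gripper spends fewer than two steps per episode, on average, outside the goal threshold.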

Usage

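import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
from stable_baselines3.common.buffers import DictReplayBuffer
from huggingface_hub import hf_hub_download

# Register the Fetch environments with Gymnasium.
gym.register_envs(gymnasium_robotics)
# Download the checkpoint from the Hugging Face Hub.
path = hf_hub_download("hhmm1122/fetch-reach-sac-her", "best_model.zip")
env = gym.make("FetchReach-v4", max_episode_steps=50)
# Inference only: override the saved replay-buffer settings with a minimal
# DictReplayBuffer so that no large buffer is allocated on load.
model = SAC.load(path, env=env, custom_objects={
    "replay_buffer_class": DictReplayBuffer, "replay_buffer_kwargs": {}, "buffer_size": 1})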
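
Once the model is loaded, a deterministic rollout loop along the following lines could be used to check the success rate. This is a sketch rather than the repository's evaluation script: the helper name `evaluate` and its defaults are hypothetical, and it assumes the environment reports goal achievement at the end of the episode in `info["is_success"]`, as the Fetch environments do.

```python
def evaluate(model, env, n_episodes=20, seed=0):
    """Roll out `model` deterministically; return (success_rate, mean_return)."""
    successes, returns = [], []
    for episode in range(n_episodes):
        obs, info = env.reset(seed=seed + episode)
        done, total = False, 0.0
        while not done:
            # deterministic=True uses the policy mean instead of sampling.
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        # Success is read from the last step of the episode (assumed key).
        successes.append(float(info.get("is_success", 0.0)))
        returns.append(total)
    return sum(successes) / len(successes), sum(returns) / len(returns)
```

With the `model` and `env` from the snippet above, `success_rate, mean_return = evaluate(model, env)` would run 20 episodes.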

Training

Algorithm: SAC + HER (n_sampled_goal=4, goal_selection_strategy="future")
Network: MLP [256, 256, 256], batch 256, lr 1e-3, gamma 0.95
Framework: Stable-Baselines3 2.8.0, Gymnasium-Robotics 1.4.2
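
The settings above could be wired into Stable-Baselines3 roughly as follows. This is a sketch, not the repository's training script: the `MultiInputPolicy` choice (needed for dictionary observations), the use of one `net_arch` for both actor and critic, the seed, and the names `SAC_HER_CONFIG` and `train` are assumptions, and every unlisted hyperparameter is left at the library default.

```python
# Hyperparameters listed on this card; the policy name is an assumption.
SAC_HER_CONFIG = {
    "policy": "MultiInputPolicy",
    "learning_rate": 1e-3,
    "batch_size": 256,
    "gamma": 0.95,
    "policy_kwargs": {"net_arch": [256, 256, 256]},
    "replay_buffer_kwargs": {
        "n_sampled_goal": 4,
        "goal_selection_strategy": "future",
    },
}
TOTAL_TIMESTEPS = 25_000


def train(total_timesteps=TOTAL_TIMESTEPS, seed=0):
    """Build and train a SAC + HER agent on FetchReach-v4 (hypothetical helper)."""
    # Local imports keep the config inspectable without the RL stack installed.
    import gymnasium as gym
    import gymnasium_robotics
    from stable_baselines3 import SAC, HerReplayBuffer

    gym.register_envs(gymnasium_robotics)
    env = gym.make("FetchReach-v4", max_episode_steps=50)
    model = SAC(
        env=env,
        replay_buffer_class=HerReplayBuffer,
        seed=seed,
        verbose=1,
        **SAC_HER_CONFIG,
    )
    model.learn(total_timesteps=total_timesteps)
    return model
```

With the "future" strategy, each stored transition is replayed with additional goals taken from states reached later in the same episode, which is what lets the agent learn from the sparse reward.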

Full code: https://github.com/IAmHassanMehmood/rl-fetch-manipulation

