SAC + HER on FetchPickAndPlace-v4

A Soft Actor-Critic (SAC) agent with Hindsight Experience Replay (HER), trained from scratch on the MuJoCo FetchPickAndPlace task with sparse reward: the arm must reach a block, grasp it, and move it to a target location.

Results

Evaluation success rate: 100% (deterministic, 30+ episodes)
Mean episode reward: ~-9.7 (sparse reward: -1 for every step the block is not at the target, 0 otherwise, so a smaller magnitude means faster placement)
Trained for 1.5M timesteps (~10.5 h on CPU)
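
To make the reward scale concrete, the sketch below mirrors the sparse Fetch reward in plain Python: -1 on each step where the block is farther than a threshold from the target, 0 otherwise. The 0.05 m threshold and the helper names `sparse_reward` and `episode_return` are assumptions for illustration, not code from this repository.

```python
import math

# Assumed success radius in metres (believed to match the Fetch default).
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Return -1.0 while the block is farther than `threshold` from the target, else 0.0."""
    distance = math.dist(achieved_goal, desired_goal)
    return -1.0 if distance > threshold else 0.0


def episode_return(distances, threshold=DISTANCE_THRESHOLD):
    """Sum the sparse reward over a sequence of per-step block-to-target distances."""
    return sum(-1.0 if d > threshold else 0.0 for d in distances)


# Hypothetical 50-step episode: off target for 10 steps, then on target for 40.
example_return = episode_return([0.20] * 10 + [0.01] * 40)
```

Under these assumptions the example episode returns -10, so a mean of about -9.7 suggests the block typically reaches the target after roughly 10 of the 50 steps.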

Usage

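import gymnasium as gym
import gymnasium_robotics
from huggingface_hub import hf_hub_download
from stable_baselines3 import SAC
from stable_baselines3.common.buffers import DictReplayBuffer

# Register the Gymnasium-Robotics environments, including FetchPickAndPlace-v4.
gym.register_envs(gymnasium_robotics)

# Download the checkpoint from the Hugging Face Hub and create the environment.
path = hf_hub_download("hhmm1122/fetch-pickandplace-sac-her", "best_model.zip")
env = gym.make("FetchPickAndPlace-v4", max_episode_steps=50)

# Override the saved replay-buffer settings with a one-slot DictReplayBuffer:
# inference never samples from the buffer, so the training-size HER buffer is not needed.
model = SAC.load(path, env=env, custom_objects={
    "replay_buffer_class": DictReplayBuffer, "replay_buffer_kwargs": {}, "buffer_size": 1})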

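Once the model is loaded, the reported success rate can be checked with a short rollout loop. The sketch below is one way to do it, assuming the environment reports `info["is_success"]` at each step (as the Fetch environments do) and that the policy is queried with `deterministic=True`. The function name `evaluate_success_rate` and its `seed` argument are illustrative, not part of this repository.

```python
def evaluate_success_rate(model, env, n_episodes=30, seed=0):
    """Roll out a deterministic policy and return (success rate, mean episode return).

    `model` needs a Stable-Baselines3-style predict(obs, deterministic=...) method;
    `env` needs the Gymnasium reset/step API.
    """
    successes = 0
    total_return = 0.0
    for episode in range(n_episodes):
        # Use a different seed per episode so block and target positions vary.
        obs, info = env.reset(seed=seed + episode)
        terminated = truncated = False
        while not (terminated or truncated):
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            total_return += float(reward)
        # Count the episode as solved if the final step reports success.
        successes += int(info.get("is_success", 0.0) > 0.5)
    return successes / n_episodes, total_return / n_episodes
```

With the `model` and `env` from the snippet above, `evaluate_success_rate(model, env, n_episodes=30)` returns the fraction of solved episodes and the mean return. Results may differ slightly from the reported numbers depending on seeds and library versions.
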
Training

Algorithm: SAC + HER (n_sampled_goal=4, goal_selection_strategy="future")
Network: MLP with hidden layers [512, 512, 512]
Hyperparameters: batch size 512, learning rate 1e-3, gamma 0.95
Framework: Stable-Baselines3 2.8.0, Gymnasium-Robotics 1.4.2
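
The settings above map onto a Stable-Baselines3 setup roughly as follows. This is a sketch, not the original training script: only the listed hyperparameters come from this card, everything else (buffer size, learning starts, entropy coefficient, and so on) is left at the SB3 defaults as an assumption, and the names `build_model`, `train`, and `save_path` are hypothetical.

```python
# Hyperparameters stated in this card.
HER_KWARGS = {"n_sampled_goal": 4, "goal_selection_strategy": "future"}
SAC_KWARGS = {
    "learning_rate": 1e-3,
    "batch_size": 512,
    "gamma": 0.95,
    "policy_kwargs": {"net_arch": [512, 512, 512]},
}
TOTAL_TIMESTEPS = 1_500_000


def build_model(env):
    """Create a SAC agent with a HER replay buffer for a goal-conditioned env."""
    # Imported lazily so the config above can be inspected without SB3 installed.
    from stable_baselines3 import SAC, HerReplayBuffer

    return SAC(
        "MultiInputPolicy",  # dict observations: observation, achieved_goal, desired_goal
        env,
        replay_buffer_class=HerReplayBuffer,
        replay_buffer_kwargs=HER_KWARGS,
        **SAC_KWARGS,
    )


def train(save_path="sac_her_fetch"):  # save_path is a hypothetical output name
    """Train on FetchPickAndPlace-v4 and save the resulting model."""
    import gymnasium as gym
    import gymnasium_robotics

    gym.register_envs(gymnasium_robotics)
    env = gym.make("FetchPickAndPlace-v4", max_episode_steps=50)
    model = build_model(env)
    model.learn(total_timesteps=TOTAL_TIMESTEPS)
    model.save(save_path)
    return model
```

Calling `train()` would run the full 1.5M timesteps, which this card reports took about 10.5 h on CPU.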

Evaluation results

success_rate on FetchPickAndPlace-v4 (self-reported): 100%