SAC + HER on FetchPickAndPlace-v4

A Soft Actor-Critic (SAC) agent with Hindsight Experience Replay (HER), trained from scratch on the MuJoCo FetchPickAndPlace task. The agent must reach a block, grasp it, and place it at a target location, learning from a sparse reward only.

Results

  • Evaluation success rate: 100% (deterministic, 30+ episodes)
  • Mean episode reward: ~-9.7 (sparse reward of -1 for every step the block is not at the target, so a smaller magnitude means faster placement)
  • Trained for 1.5M timesteps (~10.5 h on CPU)
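
To make the reward figure above easier to read, here is a minimal sketch of how a sparse goal-reaching reward of this kind behaves. It is not the environment's own code: the function names are hypothetical, and the 0.05 m tolerance is an assumption about the task's success threshold.

```python
import math

# Assumed success tolerance in metres; not stated in this model card.
DISTANCE_THRESHOLD = 0.05


def sparse_reward(achieved_goal, desired_goal, threshold=DISTANCE_THRESHOLD):
    """Return -1.0 while the block is farther than `threshold` from the target, else 0.0."""
    distance = math.dist(achieved_goal, desired_goal)
    return -1.0 if distance > threshold else 0.0


def episode_return(steps_to_success, horizon=50):
    """Episode return if the block reaches the target after `steps_to_success` steps
    and stays there (an assumption) until the 50-step horizon ends."""
    return -float(min(steps_to_success, horizon))
```

Under these assumptions, a mean episode reward of about -9.7 corresponds to the block reaching the target after roughly 10 of the 50 steps.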

Usage

import gymnasium as gym
import gymnasium_robotics
from huggingface_hub import hf_hub_download
from stable_baselines3 import SAC
from stable_baselines3.common.buffers import DictReplayBuffer

# Register the Fetch environments with Gymnasium.
gym.register_envs(gymnasium_robotics)
path = hf_hub_download("hhmm1122/fetch-pickandplace-sac-her", "best_model.zip")
env = gym.make("FetchPickAndPlace-v4", max_episode_steps=50)
# Override the replay-buffer settings stored with the model (it was trained with HER):
# a one-slot DictReplayBuffer is enough for inference.
model = SAC.load(path, env=env, custom_objects={
    "replay_buffer_class": DictReplayBuffer, "replay_buffer_kwargs": {}, "buffer_size": 1})

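Once the model is loaded, a deterministic evaluation along the lines of the Results section could look like the sketch below. The helper name `evaluate_success_rate` is hypothetical, and it assumes the environment reports success through `info["is_success"]` on the final step, as the Gymnasium-Robotics Fetch environments do.

```python
def evaluate_success_rate(model, env, n_episodes=30):
    """Run deterministic rollouts and return (success_rate, mean_episode_reward)."""
    successes = 0
    returns = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            # deterministic=True uses the policy's deterministic action instead of sampling.
            action, _state = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += float(reward)
            done = terminated or truncated
        # Assumption: success is read from the info dict of the last step.
        successes += int(info.get("is_success", 0.0) > 0)
        returns.append(episode_return)
    return successes / n_episodes, sum(returns) / n_episodes
```

With the `model` and `env` from the snippet above, `rate, mean_reward = evaluate_success_rate(model, env)` returns both metrics; the exact evaluation script used for the reported numbers is not part of this card.
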
Training

  • Algorithm: SAC + HER (n_sampled_goal=4, goal_selection_strategy="future")
  • Network: MLP [512, 512, 512], batch 512, lr 1e-3, gamma 0.95
  • Framework: Stable-Baselines3 2.8.0, Gymnasium-Robotics 1.4.2
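
The settings above map onto Stable-Baselines3 roughly as in the following sketch. Only the listed values come from this card; `MultiInputPolicy` is required for the dictionary observations, and every other hyperparameter (buffer size, entropy coefficient, and so on) is left at the library default as an assumption. The names `SAC_KWARGS`, `HER_KWARGS`, and `build_model` are illustrative.

```python
# Values quoted from the Training list.
HER_KWARGS = {"n_sampled_goal": 4, "goal_selection_strategy": "future"}
SAC_KWARGS = {
    "policy": "MultiInputPolicy",  # handles the dict (observation / goal) inputs
    "learning_rate": 1e-3,
    "batch_size": 512,
    "gamma": 0.95,
    "policy_kwargs": {"net_arch": [512, 512, 512]},
    "replay_buffer_kwargs": HER_KWARGS,
}
TOTAL_TIMESTEPS = 1_500_000


def build_model(env_id="FetchPickAndPlace-v4"):
    """Create an untrained SAC + HER model; unlisted settings stay at SB3 defaults (assumption)."""
    # Imported here so the configuration above can be inspected without the RL stack installed.
    import gymnasium as gym
    import gymnasium_robotics
    from stable_baselines3 import SAC, HerReplayBuffer

    gym.register_envs(gymnasium_robotics)
    env = gym.make(env_id, max_episode_steps=50)
    return SAC(env=env, replay_buffer_class=HerReplayBuffer, **SAC_KWARGS)
```

Calling `build_model().learn(total_timesteps=TOTAL_TIMESTEPS)` would start a run of the same length as the one reported here, though it may not reproduce the original training script exactly.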