---
library_name: stable-baselines3
tags:
- PandaPickAndPlace-v3
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: TQC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaPickAndPlace-v3
      type: PandaPickAndPlace-v3
    metrics:
    - type: mean_reward
      value: -6.30 +/- 1.79
      name: mean_reward
      verified: false
---

# **TQC** Agent playing **PandaPickAndPlace-v3**

This is a trained model of a **TQC** agent playing **PandaPickAndPlace-v3** using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3). TQC itself is provided by the companion `sb3_contrib` package, which builds on stable-baselines3.

## Usage (with Stable-baselines3)

The training script for this model:

```python
# 1 - Imports
import gymnasium as gym
import panda_gym  # importing registers the Panda environments with Gymnasium
import wandb
from sb3_contrib import TQC
from stable_baselines3 import HerReplayBuffer
from wandb.integration.sb3 import WandbCallback

# 2 - Create the environment
env_id = "PandaPickAndPlace-v3"
env = gym.make(env_id)

# 3 - Start a Weights & Biases run
# (assumption: project name taken from the charts URL below;
# sync_tensorboard mirrors the TensorBoard logs written under runs/)
wandb_run = wandb.init(project=env_id, sync_tensorboard=True)

# 4 - Define the model: TQC with hindsight experience replay
model = TQC(
    policy="MultiInputPolicy",
    env=env,
    batch_size=2048,
    gamma=0.95,
    learning_rate=1e-4,
    train_freq=64,
    gradient_steps=64,
    tau=0.05,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
    policy_kwargs=dict(
        net_arch=[512, 512, 512],
        n_critics=2,
    ),
    tensorboard_log=f"runs/{wandb_run.id}",
)

# 5 - Train for one million timesteps, logging to Weights & Biases
model.learn(1_000_000, progress_bar=True, callback=WandbCallback(verbose=2))
wandb_run.finish()
```

Weights & Biases charts: https://wandb.ai/patonw/PandaPickAndPlace-v3/runs/w7lzlwnx/workspace?workspace=user-patonw
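
The `replay_buffer_kwargs` in the training code select hindsight experience replay with `n_sampled_goal=4` and the `"future"` goal selection strategy. The snippet below is a rough, library-independent sketch of what that relabeling means. It is not the stable-baselines3 implementation: the function names, the one-dimensional toy episode, and the 0.05 success threshold are all hypothetical and chosen only for illustration.

```python
import random


def sparse_reward(achieved, desired, threshold=0.05):
    """Sparse goal reward: 0.0 when within `threshold` of the goal, else -1.0.

    The threshold value is an assumption made for this toy example.
    """
    return 0.0 if abs(achieved - desired) <= threshold else -1.0


def relabel_future(episode, n_sampled_goal=4, rng=random):
    """Create virtual transitions using the "future" strategy.

    `episode` is a list of dicts holding the goal achieved after each step
    and the goal the agent was originally asked to reach. For every step t,
    `n_sampled_goal` replacement goals are drawn from goals achieved at
    steps t or later in the same episode, and the reward is recomputed
    against the replacement goal.
    """
    relabeled = []
    last = len(episode) - 1
    for t, step in enumerate(episode):
        for _ in range(n_sampled_goal):
            future_t = rng.randint(t, last)  # inclusive on both ends
            new_goal = episode[future_t]["achieved_goal"]
            relabeled.append(
                {
                    "achieved_goal": step["achieved_goal"],
                    "desired_goal": new_goal,
                    "reward": sparse_reward(step["achieved_goal"], new_goal),
                }
            )
    return relabeled


# Toy 1-D episode: the object drifts forward but never reaches the goal at 5.0,
# so every original transition carries a reward of -1.0.
episode = [{"achieved_goal": 0.1 * (t + 1), "desired_goal": 5.0} for t in range(5)]

# A seeded generator keeps the sketch reproducible.
virtual = relabel_future(episode, n_sampled_goal=4, rng=random.Random(0))
print(len(virtual), "virtual transitions from", len(episode), "real ones")
```

Even though the toy episode never succeeds, some relabeled transitions earn a reward of 0.0 because their substituted goal is one the agent actually reached. That extra learning signal is the reason this kind of relabeling is commonly paired with sparse-reward manipulation tasks.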