---
library_name: stable-baselines3
tags:
- AntBulletEnv-v0
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: AntBulletEnv-v0
      type: AntBulletEnv-v0
    metrics:
    - type: mean_reward
      value: 1834.41 +/- 107.15
      name: mean_reward
      verified: false
---

# **A2C** Agent playing **AntBulletEnv-v0**

This is a trained model of an **A2C** agent playing **AntBulletEnv-v0**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

## Usage (with Stable-baselines3)

The snippet below trains the agent from scratch and saves both the model and the
`VecNormalize` statistics; the saved statistics are needed to reproduce the reported
reward at evaluation time.

```python
import pybullet_envs  # noqa: F401  (registers AntBulletEnv-v0 with gym)

from stable_baselines3 import A2C
from stable_baselines3.common.vec_env import VecNormalize
from stable_baselines3.common.env_util import make_vec_env

env_id = "AntBulletEnv-v0"

# Create 4 parallel training environments
env = make_vec_env(env_id, n_envs=4)

# Normalize observations and rewards; clip observations to [-10, 10]
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10)

# Create the A2C model
model = A2C(
    policy="MlpPolicy",
    env=env,
    gae_lambda=0.9,
    gamma=0.99,
    learning_rate=0.00096,
    max_grad_norm=0.5,
    n_steps=8,
    vf_coef=0.4,
    ent_coef=0.0,
    seed=11,
    policy_kwargs=dict(log_std_init=-2, ortho_init=False),
    normalize_advantage=False,
    use_rms_prop=True,
    use_sde=True,
    verbose=1,
)

# Train the agent
model.learn(total_timesteps=1_500_000)

# Save the model and the VecNormalize statistics
model.save("a2c-AntBulletEnv-v0")
env.save("vec_normalize.pkl")
```
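
To evaluate the trained agent, the `VecNormalize` statistics must be reloaded alongside the model weights. Below is a minimal evaluation sketch; the repo id `username/a2c-AntBulletEnv-v0` is a placeholder for this model's actual Hub repository, and the filenames match those saved in the training snippet above.

```python
import pybullet_envs  # noqa: F401  (registers AntBulletEnv-v0 with gym)

from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Download the trained policy and the saved VecNormalize statistics.
# NOTE: "username/a2c-AntBulletEnv-v0" is a placeholder repo id.
checkpoint = load_from_hub(
    repo_id="username/a2c-AntBulletEnv-v0",
    filename="a2c-AntBulletEnv-v0.zip",
)
stats_path = load_from_hub(
    repo_id="username/a2c-AntBulletEnv-v0",
    filename="vec_normalize.pkl",
)

# Rebuild the environment and restore the normalization statistics
eval_env = make_vec_env("AntBulletEnv-v0", n_envs=1)
eval_env = VecNormalize.load(stats_path, eval_env)
eval_env.training = False      # do not update the statistics at test time
eval_env.norm_reward = False   # report the raw (unnormalized) reward

model = A2C.load(checkpoint)

mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```

Disabling `training` and `norm_reward` on the evaluation environment matters: otherwise the running statistics drift during evaluation and the printed reward is normalized, so it would not be comparable to the mean reward reported above.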