---
library_name: stable-baselines3
tags:
- PandaReachDense-v2
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: A2C
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: PandaReachDense-v2
      type: PandaReachDense-v2
    metrics:
    - type: mean_reward
      value: -0.74 +/- 0.27
      name: mean_reward
      verified: false
---

# **A2C** Agent playing **PandaReachDense-v2**

This is a trained model of an **A2C** agent playing **PandaReachDense-v2**
using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).

## Usage (with Stable-baselines3)

```python
# Install system dependencies and Python packages (notebook / Colab setup)
!apt install python-opengl
!apt install ffmpeg
!apt install xvfb
!pip3 install pyvirtualdisplay
!pip install -r https://raw.githubusercontent.com/huggingface/deep-rl-class/main/notebooks/unit6/requirements-unit6.txt

# Start a virtual display so the environment can render headlessly
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1400, 900))
virtual_display.start()

# Imports
import os

import gym
import panda_gym
import pybullet_envs

from huggingface_hub import notebook_login
from huggingface_sb3 import load_from_hub, package_to_hub
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# Define the environment
env_id = "PandaReachDense-v2"

# Make a vectorized environment with 4 parallel copies
env = make_vec_env(env_id, n_envs=4)

# Add a wrapper to normalize the observations and rewards
env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10)

# Create the A2C model (verbose=1 prints the training logs)
model = A2C(
    policy="MultiInputPolicy",
    env=env,
    gae_lambda=0.9,
    gamma=0.95,
    learning_rate=0.001,
    max_grad_norm=0.5,
    n_steps=8,
    vf_coef=0.4,
    ent_coef=0.0,
    seed=11,
    policy_kwargs=dict(log_std_init=-2, ortho_init=False),
    normalize_advantage=False,
    use_rms_prop=True,
    use_sde=True,
    verbose=1,
)

# Train for 1.5M timesteps
model.learn(1_500_000)

# Save the model and the VecNormalize statistics
model.save(f"a2c-{env_id}")
env.save(f"vec_normalize_{env_id}.pkl")

# Build an evaluation environment that reuses the saved normalization statistics
eval_env = DummyVecEnv([lambda: gym.make(env_id)])
eval_env = VecNormalize.load(f"vec_normalize_{env_id}.pkl", eval_env)

# Do not update the normalization statistics at test time
eval_env.training = False
# Reward normalization is not needed at test time
eval_env.norm_reward = False

# Load the model
model = A2C.load(f"a2c-{env_id}")

# Evaluate the model
mean_reward, std_reward = evaluate_policy(model, eval_env)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```
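The snippet above imports `notebook_login` and `package_to_hub` but never calls them. A minimal sketch of publishing the trained agent to the Hugging Face Hub is shown below; the `repo_id` (in particular the `username` part) is a placeholder you would replace with your own account.

```python
# Sketch: push the trained agent to the Hub.
# Assumes notebook_login() has already been run to authenticate,
# and that "username" is replaced with your own Hub username.
from huggingface_sb3 import package_to_hub

package_to_hub(
    model=model,
    model_name=f"a2c-{env_id}",
    model_architecture="A2C",
    env_id=env_id,
    eval_env=eval_env,
    repo_id=f"username/a2c-{env_id}",  # placeholder repo id
    commit_message="Initial commit",
)
```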
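To reuse the checkpoint without retraining, the model and the `VecNormalize` statistics can be downloaded with `load_from_hub`. This is a sketch under the assumption that the repository contains the model zip and the normalization pickle under the same filenames used when saving above; the repo id is again a placeholder.

```python
# Sketch: load the published checkpoint and normalization statistics from the Hub.
# The repo id and filenames are assumptions based on the saving code above.
import gym
import panda_gym  # registers the Panda environments with gym

from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

repo_id = "username/a2c-PandaReachDense-v2"  # placeholder repo id
checkpoint = load_from_hub(repo_id, "a2c-PandaReachDense-v2.zip")
vec_normalize_stats = load_from_hub(repo_id, "vec_normalize_PandaReachDense-v2.pkl")

# Rebuild the evaluation environment with the downloaded statistics
eval_env = DummyVecEnv([lambda: gym.make("PandaReachDense-v2")])
eval_env = VecNormalize.load(vec_normalize_stats, eval_env)
eval_env.training = False
eval_env.norm_reward = False

model = A2C.load(checkpoint)
mean_reward, std_reward = evaluate_policy(model, eval_env)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```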