What's the difference between PandaReach v1, v2 and v3?

#1 opened by zhangpaipai

Hi, I like your model and environment very much. What's the difference between PandaReach v1, v2 and v3? And at what mean reward can we consider that the task is performed well?

Stable-Baselines3 org

For the detailed list of changes, see https://github.com/qgallouedec/panda-gym/releases?page=1

To put it briefly:

  • Between v1 and v2,
    • the dynamics have changed slightly: friction is handled better, which should make learning easier.
    • v2 depends on gym >= 0.22 (a breaking change from previous versions; e.g. step returns 5 values instead of 4).
  • Between v2 and v3, we moved from gym to Gymnasium.
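
When porting code written against the older four-value step API (obs, reward, done, info) to the five-value one (obs, reward, terminated, truncated, info), the single done flag has to be split in two. Below is a minimal sketch, assuming the older gym convention where the TimeLimit wrapper flags time-outs with info["TimeLimit.truncated"]. The helper to_five_tuple is hypothetical and not part of panda-gym.

```python
def to_five_tuple(step_result):
    """Convert an old-style (obs, reward, done, info) step result into
    (obs, reward, terminated, truncated, info).

    Hypothetical helper: assumes time-outs are flagged with
    info["TimeLimit.truncated"], as gym's TimeLimit wrapper did in older versions.
    """
    obs, reward, done, info = step_result
    # The episode was cut off by the time limit rather than ending on its own.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # Any other end of episode is treated as a genuine termination.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


# Usage: an old-style result for a step that hit the time limit.
old_result = ([0.1, 0.2, 0.3], -1.0, True, {"TimeLimit.truncated": True})
obs, reward, terminated, truncated, info = to_five_tuple(old_result)
print(terminated, truncated)
```

Keeping terminated and truncated separate matters for learning: a time-out does not mean the task is over, so value bootstrapping should usually continue on truncated steps.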

Stable-Baselines3 org

And to what extent (mean reward) can we decide the task has a good performance?

I would prefer to look at the success rate.
But if you really want to look at the reward, the total reward is easy to interpret: it is essentially the negative of the average number of time steps needed to complete the task. For example, this model takes an average of 2.30 time steps to solve the task (note that episodes are limited to 50 time steps).
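
To make the two metrics concrete, here is a minimal sketch that computes both from recorded episodes. It assumes the sparse reward of PandaReach (-1 on each step that ends away from the goal, 0 otherwise) and a per-step success flag, as panda-gym reports through the "is_success" entry of the info dict. The function summarize_episodes and the episode format (a list of (reward, is_success) pairs per episode) are hypothetical, for illustration only.

```python
from statistics import mean


def summarize_episodes(episodes):
    """Summarize episodes given as lists of (reward, is_success) pairs, one pair per step.

    Assumes a sparse reward: -1 for each step that ends away from the goal, 0 otherwise.
    """
    returns = [sum(reward for reward, _ in episode) for episode in episodes]
    # An episode counts as solved if its final step reports success.
    solved = [episode[-1][1] for episode in episodes]
    return {
        "mean_return": mean(returns),
        # With -1 per step spent away from the goal, the negated mean return is
        # the mean number of such steps, i.e. roughly the time to solve the task.
        "mean_steps_away_from_goal": -mean(returns),
        "success_rate": mean(1.0 if s else 0.0 for s in solved),
    }


# Toy episodes (illustrative, not real results): the first reaches the goal on
# step 3, the second never does and is cut off at the 50-step limit.
episodes = [
    [(-1.0, False), (-1.0, False), (0.0, True)],
    [(-1.0, False)] * 50,
]
summary = summarize_episodes(episodes)
print(summary)
```

The success rate always lies between 0 and 1, whatever the reward type, which makes it easier to compare across runs than the raw return.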
