qgallouedec (HF staff) committed
Commit a788d9c (1 parent: 5e78f96)

Upload model card

Files changed (1)
  1. README.md +2610 -0
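The diff below adds the model card's YAML front matter: environment tags, the training dataset (`jat-project/jat-dataset`), the pipeline tag, and a `model-index` listing per-suite and per-task evaluation metrics. As a minimal sketch of how these metrics could be read back, assuming the README is saved locally as `README.md` and PyYAML is installed (the filename and parsing approach are illustrative, not part of this commit):

```python
# Minimal sketch: parse the model card's YAML front matter and list its
# evaluation metrics. Assumes a local "README.md" and PyYAML (pip install pyyaml).
import yaml

with open("README.md", encoding="utf-8") as f:
    text = f.read()

# The front matter sits between the first two `---` delimiters.
_, front_matter, _ = text.split("---", 2)
card = yaml.safe_load(front_matter)

for result in card["model-index"][0]["results"]:
    dataset = result["dataset"]["name"]
    for metric in result["metrics"]:
        # Values such as "0.06 [0.06, 0.06]" or "1085.90 +/- 396.36" parse as strings.
        print(f"{dataset}: {metric['name']} = {metric['value']}")
```

Each `results` entry pairs a `dataset` (the evaluation suite or environment) with one or more `metrics`, so the loop prints one line per task/metric combination.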
README.md ADDED
@@ -0,0 +1,2610 @@
1
+ ---
2
+ tags:
3
+ - reinforcement-learning
4
+ - atari-alien
5
+ - atari-amidar
6
+ - atari-assault
7
+ - atari-asterix
8
+ - atari-asteroids
9
+ - atari-atlantis
10
+ - atari-bankheist
11
+ - atari-battlezone
12
+ - atari-beamrider
13
+ - atari-berzerk
14
+ - atari-bowling
15
+ - atari-boxing
16
+ - atari-breakout
17
+ - atari-centipede
18
+ - atari-choppercommand
19
+ - atari-crazyclimber
20
+ - atari-defender
21
+ - atari-demonattack
22
+ - atari-doubledunk
23
+ - atari-enduro
24
+ - atari-fishingderby
25
+ - atari-freeway
26
+ - atari-frostbite
27
+ - atari-gopher
28
+ - atari-gravitar
29
+ - atari-hero
30
+ - atari-icehockey
31
+ - atari-jamesbond
32
+ - atari-kangaroo
33
+ - atari-krull
34
+ - atari-kungfumaster
35
+ - atari-montezumarevenge
36
+ - atari-mspacman
37
+ - atari-namethisgame
38
+ - atari-phoenix
39
+ - atari-pitfall
40
+ - atari-pong
41
+ - atari-privateeye
42
+ - atari-qbert
43
+ - atari-riverraid
44
+ - atari-roadrunner
45
+ - atari-robotank
46
+ - atari-seaquest
47
+ - atari-skiing
48
+ - atari-solaris
49
+ - atari-spaceinvaders
50
+ - atari-stargunner
51
+ - atari-surround
52
+ - atari-tennis
53
+ - atari-timepilot
54
+ - atari-tutankham
55
+ - atari-upndown
56
+ - atari-venture
57
+ - atari-videopinball
58
+ - atari-wizardofwor
59
+ - atari-yarsrevenge
60
+ - atari-zaxxon
61
+ - babyai-action-obj-door
62
+ - babyai-blocked-unlock-pickup
63
+ - babyai-boss-level-no-unlock
64
+ - babyai-boss-level
65
+ - babyai-find-obj-s5
66
+ - babyai-go-to-door
67
+ - babyai-go-to-imp-unlock
68
+ - babyai-go-to-local
69
+ - babyai-go-to-obj-door
70
+ - babyai-go-to-obj
71
+ - babyai-go-to-red-ball-grey
72
+ - babyai-go-to-red-ball-no-dists
73
+ - babyai-go-to-red-ball
74
+ - babyai-go-to-red-blue-ball
75
+ - babyai-go-to-seq
76
+ - babyai-go-to
77
+ - babyai-key-corridor
78
+ - babyai-mini-boss-level
79
+ - babyai-move-two-across-s8n9
80
+ - babyai-one-room-s8
81
+ - babyai-open-door
82
+ - babyai-open-doors-order-n4
83
+ - babyai-open-red-door
84
+ - babyai-open-two-doors
85
+ - babyai-open
86
+ - babyai-pickup-above
87
+ - babyai-pickup-dist
88
+ - babyai-pickup-loc
89
+ - babyai-pickup
90
+ - babyai-put-next-local
91
+ - babyai-put-next
92
+ - babyai-synth-loc
93
+ - babyai-synth-seq
94
+ - babyai-synth
95
+ - babyai-unblock-pickup
96
+ - babyai-unlock-local
97
+ - babyai-unlock-pickup
98
+ - babyai-unlock-to-unlock
99
+ - babyai-unlock
100
+ - metaworld-assembly
101
+ - metaworld-basketball
102
+ - metaworld-bin-picking
103
+ - metaworld-box-close
104
+ - metaworld-button-press-topdown-wall
105
+ - metaworld-button-press-topdown
106
+ - metaworld-button-press-wall
107
+ - metaworld-button-press
108
+ - metaworld-coffee-button
109
+ - metaworld-coffee-pull
110
+ - metaworld-coffee-push
111
+ - metaworld-dial-turn
112
+ - metaworld-disassemble
113
+ - metaworld-door-close
114
+ - metaworld-door-lock
115
+ - metaworld-door-open
116
+ - metaworld-door-unlock
117
+ - metaworld-drawer-close
118
+ - metaworld-drawer-open
119
+ - metaworld-faucet-close
120
+ - metaworld-faucet-open
121
+ - metaworld-hammer
122
+ - metaworld-hand-insert
123
+ - metaworld-handle-press-side
124
+ - metaworld-handle-press
125
+ - metaworld-handle-pull-side
126
+ - metaworld-handle-pull
127
+ - metaworld-lever-pull
128
+ - metaworld-peg-insert-side
129
+ - metaworld-peg-unplug-side
130
+ - metaworld-pick-out-of-hole
131
+ - metaworld-pick-place-wall
132
+ - metaworld-pick-place
133
+ - metaworld-plate-slide-back-side
134
+ - metaworld-plate-slide-back
135
+ - metaworld-plate-slide-side
136
+ - metaworld-plate-slide
137
+ - metaworld-push-back
138
+ - metaworld-push-wall
139
+ - metaworld-push
140
+ - metaworld-reach-wall
141
+ - metaworld-reach
142
+ - metaworld-shelf-place
143
+ - metaworld-soccer
144
+ - metaworld-stick-pull
145
+ - metaworld-stick-push
146
+ - metaworld-sweep-into
147
+ - metaworld-sweep
148
+ - metaworld-window-close
149
+ - metaworld-window-open
150
+ - mujoco-ant
151
+ - mujoco-doublependulum
152
+ - mujoco-halfcheetah
153
+ - mujoco-hopper
154
+ - mujoco-humanoid
155
+ - mujoco-pendulum
156
+ - mujoco-pusher
157
+ - mujoco-reacher
158
+ - mujoco-standup
159
+ - mujoco-swimmer
160
+ - mujoco-walker
161
+ datasets: jat-project/jat-dataset
162
+ pipeline_tag: reinforcement-learning
163
+ model-index:
164
+ - name: jat-project/jat
165
+ results:
166
+ - task:
167
+ type: reinforcement-learning
168
+ name: Reinforcement Learning
169
+ dataset:
170
+ name: Atari 57
171
+ type: atari
172
+ metrics:
173
+ - type: iqm_expert_normalized_total_reward
174
+ value: 0.06 [0.06, 0.06]
175
+ name: IQM expert normalized total reward
176
+ - type: iqm_human_normalized_total_reward
177
+ value: 0.17 [0.16, 0.17]
178
+ name: IQM human normalized total reward
179
+ - task:
180
+ type: reinforcement-learning
181
+ name: Reinforcement Learning
182
+ dataset:
183
+ name: BabyAI
184
+ type: babyai
185
+ metrics:
186
+ - type: iqm_expert_normalized_total_reward
187
+ value: 0.99 [0.99, 0.99]
188
+ name: IQM expert normalized total reward
189
+ - task:
190
+ type: reinforcement-learning
191
+ name: Reinforcement Learning
192
+ dataset:
193
+ name: MetaWorld
194
+ type: metaworld
195
+ metrics:
196
+ - type: iqm_expert_normalized_total_reward
197
+ value: 0.68 [0.67, 0.69]
198
+ name: IQM expert normalized total reward
199
+ - task:
200
+ type: reinforcement-learning
201
+ name: Reinforcement Learning
202
+ dataset:
203
+ name: MuJoCo
204
+ type: mujoco
205
+ metrics:
206
+ - type: iqm_expert_normalized_total_reward
207
+ value: 0.81 [0.80, 0.82]
208
+ name: IQM expert normalized total reward
209
+ - task:
210
+ type: reinforcement-learning
211
+ name: Reinforcement Learning
212
+ dataset:
213
+ name: Alien
214
+ type: atari-alien
215
+ metrics:
216
+ - type: total_reward
217
+ value: 1085.90 +/- 396.36
218
+ name: Total reward
219
+ - type: expert_normalized_total_reward
220
+ value: 0.05 +/- 0.02
221
+ name: Expert normalized total reward
222
+ - type: human_normalized_total_reward
223
+ value: 0.12 +/- 0.06
224
+ name: Human normalized total reward
225
+ - task:
226
+ type: reinforcement-learning
227
+ name: Reinforcement Learning
228
+ dataset:
229
+ name: Amidar
230
+ type: atari-amidar
231
+ metrics:
232
+ - type: total_reward
233
+ value: 41.26 +/- 28.57
234
+ name: Total reward
235
+ - type: expert_normalized_total_reward
236
+ value: 0.02 +/- 0.01
237
+ name: Expert normalized total reward
238
+ - type: human_normalized_total_reward
239
+ value: 0.02 +/- 0.02
240
+ name: Human normalized total reward
241
+ - task:
242
+ type: reinforcement-learning
243
+ name: Reinforcement Learning
244
+ dataset:
245
+ name: Assault
246
+ type: atari-assault
247
+ metrics:
248
+ - type: total_reward
249
+ value: 772.89 +/- 59.34
250
+ name: Total reward
251
+ - type: expert_normalized_total_reward
252
+ value: 0.04 +/- 0.00
253
+ name: Expert normalized total reward
254
+ - type: human_normalized_total_reward
255
+ value: 1.06 +/- 0.11
256
+ name: Human normalized total reward
257
+ - task:
258
+ type: reinforcement-learning
259
+ name: Reinforcement Learning
260
+ dataset:
261
+ name: Asterix
262
+ type: atari-asterix
263
+ metrics:
264
+ - type: total_reward
265
+ value: 778.50 +/- 428.97
266
+ name: Total reward
267
+ - type: expert_normalized_total_reward
268
+ value: 0.16 +/- 0.12
269
+ name: Expert normalized total reward
270
+ - type: human_normalized_total_reward
271
+ value: 0.07 +/- 0.05
272
+ name: Human normalized total reward
273
+ - task:
274
+ type: reinforcement-learning
275
+ name: Reinforcement Learning
276
+ dataset:
277
+ name: Asteroids
278
+ type: atari-asteroids
279
+ metrics:
280
+ - type: total_reward
281
+ value: 1423.60 +/- 538.79
282
+ name: Total reward
283
+ - type: expert_normalized_total_reward
284
+ value: 0.00 +/- 0.00
285
+ name: Expert normalized total reward
286
+ - type: human_normalized_total_reward
287
+ value: 0.02 +/- 0.01
288
+ name: Human normalized total reward
289
+ - task:
290
+ type: reinforcement-learning
291
+ name: Reinforcement Learning
292
+ dataset:
293
+ name: Atlantis
294
+ type: atari-atlantis
295
+ metrics:
296
+ - type: total_reward
297
+ value: 23541.00 +/- 10376.72
298
+ name: Total reward
299
+ - type: expert_normalized_total_reward
300
+ value: 0.03 +/- 0.03
301
+ name: Expert normalized total reward
302
+ - type: human_normalized_total_reward
303
+ value: 0.66 +/- 0.64
304
+ name: Human normalized total reward
305
+ - task:
306
+ type: reinforcement-learning
307
+ name: Reinforcement Learning
308
+ dataset:
309
+ name: Bank Heist
310
+ type: atari-bankheist
311
+ metrics:
312
+ - type: total_reward
313
+ value: 685.50 +/- 157.92
314
+ name: Total reward
315
+ - type: expert_normalized_total_reward
316
+ value: 0.51 +/- 0.12
317
+ name: Expert normalized total reward
318
+ - type: human_normalized_total_reward
319
+ value: 0.91 +/- 0.21
320
+ name: Human normalized total reward
321
+ - task:
322
+ type: reinforcement-learning
323
+ name: Reinforcement Learning
324
+ dataset:
325
+ name: Battle Zone
326
+ type: atari-battlezone
327
+ metrics:
328
+ - type: total_reward
329
+ value: 12950.00 +/- 4306.68
330
+ name: Total reward
331
+ - type: expert_normalized_total_reward
332
+ value: 0.04 +/- 0.01
333
+ name: Expert normalized total reward
334
+ - type: human_normalized_total_reward
335
+ value: 0.34 +/- 0.12
336
+ name: Human normalized total reward
337
+ - task:
338
+ type: reinforcement-learning
339
+ name: Reinforcement Learning
340
+ dataset:
341
+ name: Beam Rider
342
+ type: atari-beamrider
343
+ metrics:
344
+ - type: total_reward
345
+ value: 762.04 +/- 243.25
346
+ name: Total reward
347
+ - type: expert_normalized_total_reward
348
+ value: 0.01 +/- 0.01
349
+ name: Expert normalized total reward
350
+ - type: human_normalized_total_reward
351
+ value: 0.02 +/- 0.01
352
+ name: Human normalized total reward
353
+ - task:
354
+ type: reinforcement-learning
355
+ name: Reinforcement Learning
356
+ dataset:
357
+ name: Berzerk
358
+ type: atari-berzerk
359
+ metrics:
360
+ - type: total_reward
361
+ value: 523.90 +/- 161.95
362
+ name: Total reward
363
+ - type: expert_normalized_total_reward
364
+ value: 0.01 +/- 0.00
365
+ name: Expert normalized total reward
366
+ - type: human_normalized_total_reward
367
+ value: 0.16 +/- 0.06
368
+ name: Human normalized total reward
369
+ - task:
370
+ type: reinforcement-learning
371
+ name: Reinforcement Learning
372
+ dataset:
373
+ name: Bowling
374
+ type: atari-bowling
375
+ metrics:
376
+ - type: total_reward
377
+ value: 29.99 +/- 11.49
378
+ name: Total reward
379
+ - type: expert_normalized_total_reward
380
+ value: 1.00 +/- 0.00
381
+ name: Expert normalized total reward
382
+ - type: human_normalized_total_reward
383
+ value: 0.05 +/- 0.08
384
+ name: Human normalized total reward
385
+ - task:
386
+ type: reinforcement-learning
387
+ name: Reinforcement Learning
388
+ dataset:
389
+ name: Boxing
390
+ type: atari-boxing
391
+ metrics:
392
+ - type: total_reward
393
+ value: 87.00 +/- 22.57
394
+ name: Total reward
395
+ - type: expert_normalized_total_reward
396
+ value: 0.89 +/- 0.23
397
+ name: Expert normalized total reward
398
+ - type: human_normalized_total_reward
399
+ value: 7.24 +/- 1.88
400
+ name: Human normalized total reward
401
+ - task:
402
+ type: reinforcement-learning
403
+ name: Reinforcement Learning
404
+ dataset:
405
+ name: Breakout
406
+ type: atari-breakout
407
+ metrics:
408
+ - type: total_reward
409
+ value: 9.16 +/- 5.76
410
+ name: Total reward
411
+ - type: expert_normalized_total_reward
412
+ value: 0.01 +/- 0.01
413
+ name: Expert normalized total reward
414
+ - type: human_normalized_total_reward
415
+ value: 0.26 +/- 0.20
416
+ name: Human normalized total reward
417
+ - task:
418
+ type: reinforcement-learning
419
+ name: Reinforcement Learning
420
+ dataset:
421
+ name: Centipede
422
+ type: atari-centipede
423
+ metrics:
424
+ - type: total_reward
425
+ value: 4461.72 +/- 2188.80
426
+ name: Total reward
427
+ - type: expert_normalized_total_reward
428
+ value: 0.25 +/- 0.23
429
+ name: Expert normalized total reward
430
+ - type: human_normalized_total_reward
431
+ value: 0.24 +/- 0.22
432
+ name: Human normalized total reward
433
+ - task:
434
+ type: reinforcement-learning
435
+ name: Reinforcement Learning
436
+ dataset:
437
+ name: Chopper Command
438
+ type: atari-choppercommand
439
+ metrics:
440
+ - type: total_reward
441
+ value: 1497.00 +/- 723.11
442
+ name: Total reward
443
+ - type: expert_normalized_total_reward
444
+ value: 0.01 +/- 0.01
445
+ name: Expert normalized total reward
446
+ - type: human_normalized_total_reward
447
+ value: 0.10 +/- 0.11
448
+ name: Human normalized total reward
449
+ - task:
450
+ type: reinforcement-learning
451
+ name: Reinforcement Learning
452
+ dataset:
453
+ name: Crazy Climber
454
+ type: atari-crazyclimber
455
+ metrics:
456
+ - type: total_reward
457
+ value: 52850.00 +/- 31617.86
458
+ name: Total reward
459
+ - type: expert_normalized_total_reward
460
+ value: 0.25 +/- 0.19
461
+ name: Expert normalized total reward
462
+ - type: human_normalized_total_reward
463
+ value: 1.68 +/- 1.26
464
+ name: Human normalized total reward
465
+ - task:
466
+ type: reinforcement-learning
467
+ name: Reinforcement Learning
468
+ dataset:
469
+ name: Defender
470
+ type: atari-defender
471
+ metrics:
472
+ - type: total_reward
473
+ value: 10627.50 +/- 4473.21
474
+ name: Total reward
475
+ - type: expert_normalized_total_reward
476
+ value: 0.02 +/- 0.01
477
+ name: Expert normalized total reward
478
+ - type: human_normalized_total_reward
479
+ value: 0.49 +/- 0.28
480
+ name: Human normalized total reward
481
+ - task:
482
+ type: reinforcement-learning
483
+ name: Reinforcement Learning
484
+ dataset:
485
+ name: Demon Attack
486
+ type: atari-demonattack
487
+ metrics:
488
+ - type: total_reward
489
+ value: 315.10 +/- 279.01
490
+ name: Total reward
491
+ - type: expert_normalized_total_reward
492
+ value: 0.00 +/- 0.00
493
+ name: Expert normalized total reward
494
+ - type: human_normalized_total_reward
495
+ value: 0.09 +/- 0.15
496
+ name: Human normalized total reward
497
+ - task:
498
+ type: reinforcement-learning
499
+ name: Reinforcement Learning
500
+ dataset:
501
+ name: Double Dunk
502
+ type: atari-doubledunk
503
+ metrics:
504
+ - type: total_reward
505
+ value: 0.08 +/- 11.61
506
+ name: Total reward
507
+ - type: expert_normalized_total_reward
508
+ value: 0.47 +/- 0.29
509
+ name: Expert normalized total reward
510
+ - type: human_normalized_total_reward
511
+ value: 0.53 +/- 0.33
512
+ name: Human normalized total reward
513
+ - task:
514
+ type: reinforcement-learning
515
+ name: Reinforcement Learning
516
+ dataset:
517
+ name: Enduro
518
+ type: atari-enduro
519
+ metrics:
520
+ - type: total_reward
521
+ value: 111.49 +/- 27.36
522
+ name: Total reward
523
+ - type: expert_normalized_total_reward
524
+ value: 0.05 +/- 0.01
525
+ name: Expert normalized total reward
526
+ - type: human_normalized_total_reward
527
+ value: 0.13 +/- 0.03
528
+ name: Human normalized total reward
529
+ - task:
530
+ type: reinforcement-learning
531
+ name: Reinforcement Learning
532
+ dataset:
533
+ name: Fishing Derby
534
+ type: atari-fishingderby
535
+ metrics:
536
+ - type: total_reward
537
+ value: -55.21 +/- 19.35
538
+ name: Total reward
539
+ - type: expert_normalized_total_reward
540
+ value: 0.37 +/- 0.20
541
+ name: Expert normalized total reward
542
+ - type: human_normalized_total_reward
543
+ value: 0.28 +/- 0.15
544
+ name: Human normalized total reward
545
+ - task:
546
+ type: reinforcement-learning
547
+ name: Reinforcement Learning
548
+ dataset:
549
+ name: Freeway
550
+ type: atari-freeway
551
+ metrics:
552
+ - type: total_reward
553
+ value: 24.12 +/- 1.64
554
+ name: Total reward
555
+ - type: expert_normalized_total_reward
556
+ value: 0.71 +/- 0.05
557
+ name: Expert normalized total reward
558
+ - type: human_normalized_total_reward
559
+ value: 0.81 +/- 0.06
560
+ name: Human normalized total reward
561
+ - task:
562
+ type: reinforcement-learning
563
+ name: Reinforcement Learning
564
+ dataset:
565
+ name: Frostbite
566
+ type: atari-frostbite
567
+ metrics:
568
+ - type: total_reward
569
+ value: 617.30 +/- 686.11
570
+ name: Total reward
571
+ - type: expert_normalized_total_reward
572
+ value: 0.04 +/- 0.05
573
+ name: Expert normalized total reward
574
+ - type: human_normalized_total_reward
575
+ value: 0.13 +/- 0.16
576
+ name: Human normalized total reward
577
+ - task:
578
+ type: reinforcement-learning
579
+ name: Reinforcement Learning
580
+ dataset:
581
+ name: Gopher
582
+ type: atari-gopher
583
+ metrics:
584
+ - type: total_reward
585
+ value: 2947.20 +/- 1448.32
586
+ name: Total reward
587
+ - type: expert_normalized_total_reward
588
+ value: 0.03 +/- 0.02
589
+ name: Expert normalized total reward
590
+ - type: human_normalized_total_reward
591
+ value: 1.25 +/- 0.67
592
+ name: Human normalized total reward
593
+ - task:
594
+ type: reinforcement-learning
595
+ name: Reinforcement Learning
596
+ dataset:
597
+ name: Gravitar
598
+ type: atari-gravitar
599
+ metrics:
600
+ - type: total_reward
601
+ value: 1030.50 +/- 719.20
602
+ name: Total reward
603
+ - type: expert_normalized_total_reward
604
+ value: 0.22 +/- 0.19
605
+ name: Expert normalized total reward
606
+ - type: human_normalized_total_reward
607
+ value: 0.27 +/- 0.23
608
+ name: Human normalized total reward
609
+ - task:
610
+ type: reinforcement-learning
611
+ name: Reinforcement Learning
612
+ dataset:
613
+ name: H.E.R.O.
614
+ type: atari-hero
615
+ metrics:
616
+ - type: total_reward
617
+ value: 6997.95 +/- 2562.51
618
+ name: Total reward
619
+ - type: expert_normalized_total_reward
620
+ value: 0.14 +/- 0.06
621
+ name: Expert normalized total reward
622
+ - type: human_normalized_total_reward
623
+ value: 0.20 +/- 0.09
624
+ name: Human normalized total reward
625
+ - task:
626
+ type: reinforcement-learning
627
+ name: Reinforcement Learning
628
+ dataset:
629
+ name: Ice Hockey
630
+ type: atari-icehockey
631
+ metrics:
632
+ - type: total_reward
633
+ value: -3.77 +/- 3.10
634
+ name: Total reward
635
+ - type: expert_normalized_total_reward
636
+ value: 0.20 +/- 0.09
637
+ name: Expert normalized total reward
638
+ - type: human_normalized_total_reward
639
+ value: 0.61 +/- 0.26
640
+ name: Human normalized total reward
641
+ - task:
642
+ type: reinforcement-learning
643
+ name: Reinforcement Learning
644
+ dataset:
645
+ name: James Bond
646
+ type: atari-jamesbond
647
+ metrics:
648
+ - type: total_reward
649
+ value: 187.50 +/- 72.24
650
+ name: Total reward
651
+ - type: expert_normalized_total_reward
652
+ value: 0.01 +/- 0.00
653
+ name: Expert normalized total reward
654
+ - type: human_normalized_total_reward
655
+ value: 0.58 +/- 0.26
656
+ name: Human normalized total reward
657
+ - task:
658
+ type: reinforcement-learning
659
+ name: Reinforcement Learning
660
+ dataset:
661
+ name: Kangaroo
662
+ type: atari-kangaroo
663
+ metrics:
664
+ - type: total_reward
665
+ value: 124.00 +/- 156.92
666
+ name: Total reward
667
+ - type: expert_normalized_total_reward
668
+ value: 0.14 +/- 0.30
669
+ name: Expert normalized total reward
670
+ - type: human_normalized_total_reward
671
+ value: 0.02 +/- 0.05
672
+ name: Human normalized total reward
673
+ - task:
674
+ type: reinforcement-learning
675
+ name: Reinforcement Learning
676
+ dataset:
677
+ name: Krull
678
+ type: atari-krull
679
+ metrics:
680
+ - type: total_reward
681
+ value: 8933.00 +/- 1358.65
682
+ name: Total reward
683
+ - type: expert_normalized_total_reward
684
+ value: 0.75 +/- 0.14
685
+ name: Expert normalized total reward
686
+ - type: human_normalized_total_reward
687
+ value: 6.87 +/- 1.27
688
+ name: Human normalized total reward
689
+ - task:
690
+ type: reinforcement-learning
691
+ name: Reinforcement Learning
692
+ dataset:
693
+ name: Kung-Fu Master
694
+ type: atari-kungfumaster
695
+ metrics:
696
+ - type: total_reward
697
+ value: 100.00 +/- 142.13
698
+ name: Total reward
699
+ - type: expert_normalized_total_reward
700
+ value: -0.00 +/- 0.00
701
+ name: Expert normalized total reward
702
+ - type: human_normalized_total_reward
703
+ value: -0.01 +/- 0.01
704
+ name: Human normalized total reward
705
+ - task:
706
+ type: reinforcement-learning
707
+ name: Reinforcement Learning
708
+ dataset:
709
+ name: Montezuma's Revenge
710
+ type: atari-montezumarevenge
711
+ metrics:
712
+ - type: total_reward
713
+ value: 0.00 +/- 0.00
714
+ name: Total reward
715
+ - type: expert_normalized_total_reward
716
+ value: 0.00 +/- 0.00
717
+ name: Expert normalized total reward
718
+ - type: human_normalized_total_reward
719
+ value: 0.00 +/- 0.00
720
+ name: Human normalized total reward
721
+ - task:
722
+ type: reinforcement-learning
723
+ name: Reinforcement Learning
724
+ dataset:
725
+ name: Ms. Pacman
726
+ type: atari-mspacman
727
+ metrics:
728
+ - type: total_reward
729
+ value: 1516.30 +/- 376.72
730
+ name: Total reward
731
+ - type: expert_normalized_total_reward
732
+ value: 0.18 +/- 0.06
733
+ name: Expert normalized total reward
734
+ - type: human_normalized_total_reward
735
+ value: 0.18 +/- 0.06
736
+ name: Human normalized total reward
737
+ - task:
738
+ type: reinforcement-learning
739
+ name: Reinforcement Learning
740
+ dataset:
741
+ name: Name This Game
742
+ type: atari-namethisgame
743
+ metrics:
744
+ - type: total_reward
745
+ value: 3798.60 +/- 1361.64
746
+ name: Total reward
747
+ - type: expert_normalized_total_reward
748
+ value: 0.07 +/- 0.07
749
+ name: Expert normalized total reward
750
+ - type: human_normalized_total_reward
751
+ value: 0.26 +/- 0.24
752
+ name: Human normalized total reward
753
+ - task:
754
+ type: reinforcement-learning
755
+ name: Reinforcement Learning
756
+ dataset:
757
+ name: Phoenix
758
+ type: atari-phoenix
759
+ metrics:
760
+ - type: total_reward
761
+ value: 1267.50 +/- 1013.72
762
+ name: Total reward
763
+ - type: expert_normalized_total_reward
764
+ value: 0.00 +/- 0.00
765
+ name: Expert normalized total reward
766
+ - type: human_normalized_total_reward
767
+ value: 0.08 +/- 0.16
768
+ name: Human normalized total reward
769
+ - task:
770
+ type: reinforcement-learning
771
+ name: Reinforcement Learning
772
+ dataset:
773
+ name: PitFall
774
+ type: atari-pitfall
775
+ metrics:
776
+ - type: total_reward
777
+ value: -287.36 +/- 492.82
778
+ name: Total reward
779
+ - type: expert_normalized_total_reward
780
+ value: -0.25 +/- 2.16
781
+ name: Expert normalized total reward
782
+ - type: human_normalized_total_reward
783
+ value: -0.01 +/- 0.07
784
+ name: Human normalized total reward
785
+ - task:
786
+ type: reinforcement-learning
787
+ name: Reinforcement Learning
788
+ dataset:
789
+ name: Pong
790
+ type: atari-pong
791
+ metrics:
792
+ - type: total_reward
793
+ value: -11.03 +/- 11.29
794
+ name: Total reward
795
+ - type: expert_normalized_total_reward
796
+ value: 0.23 +/- 0.27
797
+ name: Expert normalized total reward
798
+ - type: human_normalized_total_reward
799
+ value: 0.27 +/- 0.32
800
+ name: Human normalized total reward
801
+ - task:
802
+ type: reinforcement-learning
803
+ name: Reinforcement Learning
804
+ dataset:
805
+ name: Private Eye
806
+ type: atari-privateeye
807
+ metrics:
808
+ - type: total_reward
809
+ value: 96.00 +/- 19.60
810
+ name: Total reward
811
+ - type: expert_normalized_total_reward
812
+ value: 0.95 +/- 0.26
813
+ name: Expert normalized total reward
814
+ - type: human_normalized_total_reward
815
+ value: 0.00 +/- 0.00
816
+ name: Human normalized total reward
817
+ - task:
818
+ type: reinforcement-learning
819
+ name: Reinforcement Learning
820
+ dataset:
821
+ name: Q*Bert
822
+ type: atari-qbert
823
+ metrics:
824
+ - type: total_reward
825
+ value: 1701.75 +/- 1912.56
826
+ name: Total reward
827
+ - type: expert_normalized_total_reward
828
+ value: 0.04 +/- 0.04
829
+ name: Expert normalized total reward
830
+ - type: human_normalized_total_reward
831
+ value: 0.12 +/- 0.14
832
+ name: Human normalized total reward
833
+ - task:
834
+ type: reinforcement-learning
835
+ name: Reinforcement Learning
836
+ dataset:
837
+ name: River Raid
838
+ type: atari-riverraid
839
+ metrics:
840
+ - type: total_reward
841
+ value: 2793.10 +/- 693.84
842
+ name: Total reward
843
+ - type: expert_normalized_total_reward
844
+ value: 0.11 +/- 0.05
845
+ name: Expert normalized total reward
846
+ - type: human_normalized_total_reward
847
+ value: 0.09 +/- 0.04
848
+ name: Human normalized total reward
849
+ - task:
850
+ type: reinforcement-learning
851
+ name: Reinforcement Learning
852
+ dataset:
853
+ name: Road Runner
854
+ type: atari-roadrunner
855
+ metrics:
856
+ - type: total_reward
857
+ value: 7699.00 +/- 3446.61
858
+ name: Total reward
859
+ - type: expert_normalized_total_reward
860
+ value: 0.10 +/- 0.04
861
+ name: Expert normalized total reward
862
+ - type: human_normalized_total_reward
863
+ value: 0.98 +/- 0.44
864
+ name: Human normalized total reward
865
+ - task:
866
+ type: reinforcement-learning
867
+ name: Reinforcement Learning
868
+ dataset:
869
+ name: Robotank
870
+ type: atari-robotank
871
+ metrics:
872
+ - type: total_reward
873
+ value: 16.36 +/- 5.24
874
+ name: Total reward
875
+ - type: expert_normalized_total_reward
876
+ value: 0.18 +/- 0.07
877
+ name: Expert normalized total reward
878
+ - type: human_normalized_total_reward
879
+ value: 1.46 +/- 0.54
880
+ name: Human normalized total reward
881
+ - task:
882
+ type: reinforcement-learning
883
+ name: Reinforcement Learning
884
+ dataset:
885
+ name: Seaquest
886
+ type: atari-seaquest
887
+ metrics:
888
+ - type: total_reward
889
+ value: 515.20 +/- 141.51
890
+ name: Total reward
891
+ - type: expert_normalized_total_reward
892
+ value: 0.18 +/- 0.06
893
+ name: Expert normalized total reward
894
+ - type: human_normalized_total_reward
895
+ value: 0.01 +/- 0.00
896
+ name: Human normalized total reward
897
+ - task:
898
+ type: reinforcement-learning
899
+ name: Reinforcement Learning
900
+ dataset:
901
+ name: Skiing
902
+ type: atari-skiing
903
+ metrics:
904
+ - type: total_reward
905
+ value: -29396.08 +/- 3289.80
906
+ name: Total reward
907
+ - type: expert_normalized_total_reward
908
+ value: -1.93 +/- 0.52
909
+ name: Expert normalized total reward
910
+ - type: human_normalized_total_reward
911
+ value: -0.96 +/- 0.26
912
+ name: Human normalized total reward
913
+ - task:
914
+ type: reinforcement-learning
915
+ name: Reinforcement Learning
916
+ dataset:
917
+ name: Solaris
918
+ type: atari-solaris
919
+ metrics:
920
+ - type: total_reward
921
+ value: 988.20 +/- 487.42
922
+ name: Total reward
923
+ - type: expert_normalized_total_reward
924
+ value: -2.11 +/- 4.15
925
+ name: Expert normalized total reward
926
+ - type: human_normalized_total_reward
927
+ value: -0.02 +/- 0.04
928
+ name: Human normalized total reward
929
+ - task:
930
+ type: reinforcement-learning
931
+ name: Reinforcement Learning
932
+ dataset:
933
+ name: Space Invaders
934
+ type: atari-spaceinvaders
935
+ metrics:
936
+ - type: total_reward
937
+ value: 339.50 +/- 164.05
938
+ name: Total reward
939
+ - type: expert_normalized_total_reward
940
+ value: 0.01 +/- 0.01
941
+ name: Expert normalized total reward
942
+ - type: human_normalized_total_reward
943
+ value: 0.13 +/- 0.11
944
+ name: Human normalized total reward
945
+ - task:
946
+ type: reinforcement-learning
947
+ name: Reinforcement Learning
948
+ dataset:
949
+ name: Star Gunner
950
+ type: atari-stargunner
951
+ metrics:
952
+ - type: total_reward
953
+ value: 978.00 +/- 638.37
954
+ name: Total reward
955
+ - type: expert_normalized_total_reward
956
+ value: 0.00 +/- 0.00
957
+ name: Expert normalized total reward
958
+ - type: human_normalized_total_reward
959
+ value: 0.03 +/- 0.07
960
+ name: Human normalized total reward
961
+ - task:
962
+ type: reinforcement-learning
963
+ name: Reinforcement Learning
964
+ dataset:
965
+ name: Surround
966
+ type: atari-surround
967
+ metrics:
968
+ - type: total_reward
969
+ value: -8.22 +/- 1.19
970
+ name: Total reward
971
+ - type: expert_normalized_total_reward
972
+ value: 0.09 +/- 0.06
973
+ name: Expert normalized total reward
974
+ - type: human_normalized_total_reward
975
+ value: 0.11 +/- 0.07
976
+ name: Human normalized total reward
977
+ - task:
978
+ type: reinforcement-learning
979
+ name: Reinforcement Learning
980
+ dataset:
981
+ name: Tennis
982
+ type: atari-tennis
983
+ metrics:
984
+ - type: total_reward
985
+ value: -22.38 +/- 2.22
986
+ name: Total reward
987
+ - type: expert_normalized_total_reward
988
+ value: 0.04 +/- 0.06
989
+ name: Expert normalized total reward
990
+ - type: human_normalized_total_reward
991
+ value: 0.04 +/- 0.07
992
+ name: Human normalized total reward
993
+ - task:
994
+ type: reinforcement-learning
995
+ name: Reinforcement Learning
996
+ dataset:
997
+ name: Time Pilot
998
+ type: atari-timepilot
999
+ metrics:
1000
+ - type: total_reward
1001
+ value: 9534.00 +/- 2577.76
1002
+ name: Total reward
1003
+ - type: expert_normalized_total_reward
1004
+ value: 0.09 +/- 0.04
1005
+ name: Expert normalized total reward
1006
+ - type: human_normalized_total_reward
1007
+ value: 3.59 +/- 1.55
1008
+ name: Human normalized total reward
1009
+ - task:
1010
+ type: reinforcement-learning
1011
+ name: Reinforcement Learning
1012
+ dataset:
1013
+ name: Tutankham
1014
+ type: atari-tutankham
1015
+ metrics:
1016
+ - type: total_reward
1017
+ value: 40.20 +/- 14.51
1018
+ name: Total reward
1019
+ - type: expert_normalized_total_reward
1020
+ value: 0.10 +/- 0.05
1021
+ name: Expert normalized total reward
1022
+ - type: human_normalized_total_reward
1023
+ value: 0.18 +/- 0.09
1024
+ name: Human normalized total reward
1025
+ - task:
1026
+ type: reinforcement-learning
1027
+ name: Reinforcement Learning
1028
+ dataset:
1029
+ name: Up and Down
1030
+ type: atari-upndown
1031
+ metrics:
1032
+ - type: total_reward
1033
+ value: 6072.00 +/- 2283.30
1034
+ name: Total reward
1035
+ - type: expert_normalized_total_reward
1036
+ value: 0.01 +/- 0.01
1037
+ name: Expert normalized total reward
1038
+ - type: human_normalized_total_reward
1039
+ value: 0.50 +/- 0.20
1040
+ name: Human normalized total reward
1041
+ - task:
1042
+ type: reinforcement-learning
1043
+ name: Reinforcement Learning
1044
+ dataset:
1045
+ name: Venture
1046
+ type: atari-venture
1047
+ metrics:
1048
+ - type: total_reward
1049
+ value: 0.00 +/- 0.00
1050
+ name: Total reward
1051
+ - type: expert_normalized_total_reward
1052
+ value: 1.00 +/- 0.00
1053
+ name: Expert normalized total reward
1054
+ - type: human_normalized_total_reward
1055
+ value: 0.00 +/- 0.00
1056
+ name: Human normalized total reward
1057
+ - task:
1058
+ type: reinforcement-learning
1059
+ name: Reinforcement Learning
1060
+ dataset:
1061
+ name: Video Pinball
1062
+ type: atari-videopinball
1063
+ metrics:
1064
+ - type: total_reward
1065
+ value: 7943.01 +/- 8351.21
1066
+ name: Total reward
1067
+ - type: expert_normalized_total_reward
1068
+ value: 0.02 +/- 0.02
1069
+ name: Expert normalized total reward
1070
+ - type: human_normalized_total_reward
1071
+ value: 0.45 +/- 0.47
1072
+ name: Human normalized total reward
1073
+ - task:
1074
+ type: reinforcement-learning
1075
+ name: Reinforcement Learning
1076
+ dataset:
1077
+ name: Wizard of Wor
1078
+ type: atari-wizardofwor
1079
+ metrics:
1080
+ - type: total_reward
1081
+ value: 1306.00 +/- 1139.81
1082
+ name: Total reward
1083
+ - type: expert_normalized_total_reward
1084
+ value: 0.02 +/- 0.02
1085
+ name: Expert normalized total reward
1086
+ - type: human_normalized_total_reward
1087
+ value: 0.18 +/- 0.27
1088
+ name: Human normalized total reward
1089
+ - task:
1090
+ type: reinforcement-learning
1091
+ name: Reinforcement Learning
1092
+ dataset:
1093
+ name: Yars Revenge
1094
+ type: atari-yarsrevenge
1095
+ metrics:
1096
+ - type: total_reward
1097
+ value: 8597.41 +/- 4291.81
1098
+ name: Total reward
1099
+ - type: expert_normalized_total_reward
1100
+ value: 0.02 +/- 0.02
1101
+ name: Expert normalized total reward
1102
+ - type: human_normalized_total_reward
1103
+ value: 0.11 +/- 0.08
1104
+ name: Human normalized total reward
1105
+ - task:
1106
+ type: reinforcement-learning
1107
+ name: Reinforcement Learning
1108
+ dataset:
1109
+ name: Zaxxon
1110
+ type: atari-zaxxon
1111
+ metrics:
1112
+ - type: total_reward
1113
+ value: 896.00 +/- 1172.68
1114
+ name: Total reward
1115
+ - type: expert_normalized_total_reward
1116
+ value: 0.01 +/- 0.02
1117
+ name: Expert normalized total reward
1118
+ - type: human_normalized_total_reward
1119
+ value: 0.09 +/- 0.13
1120
+ name: Human normalized total reward
1121
+ - task:
1122
+ type: reinforcement-learning
1123
+ name: Reinforcement Learning
1124
+ dataset:
1125
+ name: Action Obj Door
1126
+ type: babyai-action-obj-door
1127
+ metrics:
1128
+ - type: total_reward
1129
+ value: 0.95 +/- 0.13
1130
+ name: Total reward
1131
+ - type: expert_normalized_total_reward
1132
+ value: 0.94 +/- 0.22
1133
+ name: Expert normalized total reward
1134
+ - task:
1135
+ type: reinforcement-learning
1136
+ name: Reinforcement Learning
1137
+ dataset:
1138
+ name: Blocked Unlock Pickup
1139
+ type: babyai-blocked-unlock-pickup
1140
+ metrics:
1141
+ - type: total_reward
1142
+ value: 0.95 +/- 0.01
1143
+ name: Total reward
1144
+ - type: expert_normalized_total_reward
1145
+ value: 1.00 +/- 0.01
1146
+ name: Expert normalized total reward
1147
+ - task:
1148
+ type: reinforcement-learning
1149
+ name: Reinforcement Learning
1150
+ dataset:
1151
+ name: Boss Level No Unlock
1152
+ type: babyai-boss-level-no-unlock
1153
+ metrics:
1154
+ - type: total_reward
1155
+ value: 0.44 +/- 0.45
1156
+ name: Total reward
1157
+ - type: expert_normalized_total_reward
1158
+ value: 0.43 +/- 0.51
1159
+ name: Expert normalized total reward
1160
+ - task:
1161
+ type: reinforcement-learning
1162
+ name: Reinforcement Learning
1163
+ dataset:
1164
+ name: Boss Level
1165
+ type: babyai-boss-level
1166
+ metrics:
1167
+ - type: total_reward
1168
+ value: 0.48 +/- 0.45
1169
+ name: Total reward
1170
+ - type: expert_normalized_total_reward
1171
+ value: 0.48 +/- 0.51
1172
+ name: Expert normalized total reward
1173
+ - task:
1174
+ type: reinforcement-learning
1175
+ name: Reinforcement Learning
1176
+ dataset:
1177
+ name: Find Obj S5
1178
+ type: babyai-find-obj-s5
1179
+ metrics:
1180
+ - type: total_reward
1181
+ value: 0.95 +/- 0.03
1182
+ name: Total reward
1183
+ - type: expert_normalized_total_reward
1184
+ value: 1.00 +/- 0.04
1185
+ name: Expert normalized total reward
1186
+ - task:
1187
+ type: reinforcement-learning
1188
+ name: Reinforcement Learning
1189
+ dataset:
1190
+ name: Go To Door
1191
+ type: babyai-go-to-door
1192
+ metrics:
1193
+ - type: total_reward
1194
+ value: 0.99 +/- 0.01
1195
+ name: Total reward
1196
+ - type: expert_normalized_total_reward
1197
+ value: 1.00 +/- 0.01
1198
+ name: Expert normalized total reward
1199
+ - task:
1200
+ type: reinforcement-learning
1201
+ name: Reinforcement Learning
1202
+ dataset:
1203
+ name: Go To Imp Unlock
1204
+ type: babyai-go-to-imp-unlock
1205
+ metrics:
1206
+ - type: total_reward
1207
+ value: 0.50 +/- 0.44
1208
+ name: Total reward
1209
+ - type: expert_normalized_total_reward
1210
+ value: 0.56 +/- 0.59
1211
+ name: Expert normalized total reward
1212
+ - task:
1213
+ type: reinforcement-learning
1214
+ name: Reinforcement Learning
1215
+ dataset:
1216
+ name: Go To Local
1217
+ type: babyai-go-to-local
1218
+ metrics:
1219
+ - type: total_reward
1220
+ value: 0.88 +/- 0.14
1221
+ name: Total reward
1222
+ - type: expert_normalized_total_reward
1223
+ value: 0.94 +/- 0.18
1224
+ name: Expert normalized total reward
1225
+ - task:
1226
+ type: reinforcement-learning
1227
+ name: Reinforcement Learning
1228
+ dataset:
1229
+ name: Go To Obj Door
1230
+ type: babyai-go-to-obj-door
1231
+ metrics:
1232
+ - type: total_reward
1233
+ value: 0.98 +/- 0.04
1234
+ name: Total reward
1235
+ - type: expert_normalized_total_reward
1236
+ value: 0.97 +/- 0.08
1237
+ name: Expert normalized total reward
1238
+ - task:
1239
+ type: reinforcement-learning
1240
+ name: Reinforcement Learning
1241
+ dataset:
1242
+ name: Go To Obj
1243
+ type: babyai-go-to-obj
1244
+ metrics:
1245
+ - type: total_reward
1246
+ value: 0.93 +/- 0.04
1247
+ name: Total reward
1248
+ - type: expert_normalized_total_reward
1249
+ value: 0.99 +/- 0.05
1250
+ name: Expert normalized total reward
1251
+ - task:
1252
+ type: reinforcement-learning
1253
+ name: Reinforcement Learning
1254
+ dataset:
1255
+ name: Go To Red Ball Grey
1256
+ type: babyai-go-to-red-ball-grey
1257
+ metrics:
1258
+ - type: total_reward
1259
+ value: 0.91 +/- 0.06
1260
+ name: Total reward
1261
+ - type: expert_normalized_total_reward
1262
+ value: 0.99 +/- 0.08
1263
+ name: Expert normalized total reward
1264
+ - task:
1265
+ type: reinforcement-learning
1266
+ name: Reinforcement Learning
1267
+ dataset:
1268
+ name: Go To Red Ball No Dists
1269
+ type: babyai-go-to-red-ball-no-dists
1270
+ metrics:
1271
+ - type: total_reward
1272
+ value: 0.93 +/- 0.03
1273
+ name: Total reward
1274
+ - type: expert_normalized_total_reward
1275
+ value: 1.00 +/- 0.04
1276
+ name: Expert normalized total reward
1277
+ - task:
1278
+ type: reinforcement-learning
1279
+ name: Reinforcement Learning
1280
+ dataset:
1281
+ name: Go To Red Ball
1282
+ type: babyai-go-to-red-ball
1283
+ metrics:
1284
+ - type: total_reward
1285
+ value: 0.91 +/- 0.08
1286
+ name: Total reward
1287
+ - type: expert_normalized_total_reward
1288
+ value: 0.98 +/- 0.11
1289
+ name: Expert normalized total reward
1290
+ - task:
1291
+ type: reinforcement-learning
1292
+ name: Reinforcement Learning
1293
+ dataset:
1294
+ name: Go To Red Blue Ball
1295
+ type: babyai-go-to-red-blue-ball
1296
+ metrics:
1297
+ - type: total_reward
1298
+ value: 0.88 +/- 0.11
1299
+ name: Total reward
1300
+ - type: expert_normalized_total_reward
1301
+ value: 0.96 +/- 0.13
1302
+ name: Expert normalized total reward
1303
+ - task:
1304
+ type: reinforcement-learning
1305
+ name: Reinforcement Learning
1306
+ dataset:
1307
+ name: Go To Seq
1308
+ type: babyai-go-to-seq
1309
+ metrics:
1310
+ - type: total_reward
1311
+ value: 0.73 +/- 0.34
1312
+ name: Total reward
1313
+ - type: expert_normalized_total_reward
1314
+ value: 0.75 +/- 0.40
1315
+ name: Expert normalized total reward
1316
+ - task:
1317
+ type: reinforcement-learning
1318
+ name: Reinforcement Learning
1319
+ dataset:
1320
+ name: Go To
1321
+ type: babyai-go-to
1322
+ metrics:
1323
+ - type: total_reward
1324
+ value: 0.80 +/- 0.27
1325
+ name: Total reward
1326
+ - type: expert_normalized_total_reward
1327
+ value: 0.85 +/- 0.35
1328
+ name: Expert normalized total reward
1329
+ - task:
1330
+ type: reinforcement-learning
1331
+ name: Reinforcement Learning
1332
+ dataset:
1333
+ name: Key Corridor
1334
+ type: babyai-key-corridor
1335
+ metrics:
1336
+ - type: total_reward
1337
+ value: 0.88 +/- 0.10
1338
+ name: Total reward
1339
+ - type: expert_normalized_total_reward
1340
+ value: 0.97 +/- 0.11
1341
+ name: Expert normalized total reward
1342
+ - task:
1343
+ type: reinforcement-learning
1344
+ name: Reinforcement Learning
1345
+ dataset:
1346
+ name: Mini Boss Level
1347
+ type: babyai-mini-boss-level
1348
+ metrics:
1349
+ - type: total_reward
1350
+ value: 0.69 +/- 0.35
1351
+ name: Total reward
1352
+ - type: expert_normalized_total_reward
1353
+ value: 0.76 +/- 0.43
1354
+ name: Expert normalized total reward
1355
+ - task:
1356
+ type: reinforcement-learning
1357
+ name: Reinforcement Learning
1358
+ dataset:
1359
+ name: Move Two Across S8N9
1360
+ type: babyai-move-two-across-s8n9
1361
+ metrics:
1362
+ - type: total_reward
1363
+ value: 0.03 +/- 0.15
1364
+ name: Total reward
1365
+ - type: expert_normalized_total_reward
1366
+ value: 0.03 +/- 0.16
1367
+ name: Expert normalized total reward
1368
+ - task:
1369
+ type: reinforcement-learning
1370
+ name: Reinforcement Learning
1371
+ dataset:
1372
+ name: One Room S8
1373
+ type: babyai-one-room-s8
1374
+ metrics:
1375
+ - type: total_reward
1376
+ value: 0.92 +/- 0.03
1377
+ name: Total reward
1378
+ - type: expert_normalized_total_reward
1379
+ value: 1.00 +/- 0.04
1380
+ name: Expert normalized total reward
1381
+ - task:
1382
+ type: reinforcement-learning
1383
+ name: Reinforcement Learning
1384
+ dataset:
1385
+ name: Open Door
1386
+ type: babyai-open-door
1387
+ metrics:
1388
+ - type: total_reward
1389
+ value: 0.99 +/- 0.00
1390
+ name: Total reward
1391
+ - type: expert_normalized_total_reward
1392
+ value: 1.00 +/- 0.01
1393
+ name: Expert normalized total reward
1394
+ - task:
1395
+ type: reinforcement-learning
1396
+ name: Reinforcement Learning
1397
+ dataset:
1398
+ name: Open Doors Order N4
1399
+ type: babyai-open-doors-order-n4
1400
+ metrics:
1401
+ - type: total_reward
1402
+ value: 0.96 +/- 0.11
1403
+ name: Total reward
1404
+ - type: expert_normalized_total_reward
1405
+ value: 0.97 +/- 0.13
1406
+ name: Expert normalized total reward
1407
+ - task:
1408
+ type: reinforcement-learning
1409
+ name: Reinforcement Learning
1410
+ dataset:
1411
+ name: Open Red Door
1412
+ type: babyai-open-red-door
1413
+ metrics:
1414
+ - type: total_reward
1415
+ value: 0.92 +/- 0.02
1416
+ name: Total reward
1417
+ - type: expert_normalized_total_reward
1418
+ value: 1.00 +/- 0.03
1419
+ name: Expert normalized total reward
1420
+ - task:
1421
+ type: reinforcement-learning
1422
+ name: Reinforcement Learning
1423
+ dataset:
1424
+ name: Open Two Doors
1425
+ type: babyai-open-two-doors
1426
+ metrics:
1427
+ - type: total_reward
1428
+ value: 0.98 +/- 0.00
1429
+ name: Total reward
1430
+ - type: expert_normalized_total_reward
1431
+ value: 1.00 +/- 0.00
1432
+ name: Expert normalized total reward
1433
+ - task:
1434
+ type: reinforcement-learning
1435
+ name: Reinforcement Learning
1436
+ dataset:
1437
+ name: Open
1438
+ type: babyai-open
1439
+ metrics:
1440
+ - type: total_reward
1441
+ value: 0.93 +/- 0.11
1442
+ name: Total reward
1443
+ - type: expert_normalized_total_reward
1444
+ value: 0.97 +/- 0.13
1445
+ name: Expert normalized total reward
1446
+ - task:
1447
+ type: reinforcement-learning
1448
+ name: Reinforcement Learning
1449
+ dataset:
1450
+ name: Pickup Above
1451
+ type: babyai-pickup-above
1452
+ metrics:
1453
+ - type: total_reward
1454
+ value: 0.92 +/- 0.06
1455
+ name: Total reward
1456
+ - type: expert_normalized_total_reward
1457
+ value: 1.01 +/- 0.07
1458
+ name: Expert normalized total reward
1459
+ - task:
1460
+ type: reinforcement-learning
1461
+ name: Reinforcement Learning
1462
+ dataset:
1463
+ name: Pickup Dist
1464
+ type: babyai-pickup-dist
1465
+ metrics:
1466
+ - type: total_reward
1467
+ value: 0.88 +/- 0.13
1468
+ name: Total reward
1469
+ - type: expert_normalized_total_reward
1470
+ value: 1.03 +/- 0.18
1471
+ name: Expert normalized total reward
1472
+ - task:
1473
+ type: reinforcement-learning
1474
+ name: Reinforcement Learning
1475
+ dataset:
1476
+ name: Pickup Loc
1477
+ type: babyai-pickup-loc
1478
+ metrics:
1479
+ - type: total_reward
1480
+ value: 0.84 +/- 0.20
1481
+ name: Total reward
1482
+ - type: expert_normalized_total_reward
1483
+ value: 0.91 +/- 0.24
1484
+ name: Expert normalized total reward
1485
+ - task:
1486
+ type: reinforcement-learning
1487
+ name: Reinforcement Learning
1488
+ dataset:
1489
+ name: Pickup
1490
+ type: babyai-pickup
1491
+ metrics:
1492
+ - type: total_reward
1493
+ value: 0.72 +/- 0.34
1494
+ name: Total reward
1495
+ - type: expert_normalized_total_reward
1496
+ value: 0.77 +/- 0.40
1497
+ name: Expert normalized total reward
1498
+ - task:
1499
+ type: reinforcement-learning
1500
+ name: Reinforcement Learning
1501
+ dataset:
1502
+ name: Put Next Local
1503
+ type: babyai-put-next-local
1504
+ metrics:
1505
+ - type: total_reward
1506
+ value: 0.60 +/- 0.36
1507
+ name: Total reward
1508
+ - type: expert_normalized_total_reward
1509
+ value: 0.65 +/- 0.39
1510
+ name: Expert normalized total reward
1511
+ - task:
1512
+ type: reinforcement-learning
1513
+ name: Reinforcement Learning
1514
+ dataset:
1515
+ name: Put Next S7N4
1516
+ type: babyai-put-next
1517
+ metrics:
1518
+ - type: total_reward
1519
+ value: 0.82 +/- 0.26
1520
+ name: Total reward
1521
+ - type: expert_normalized_total_reward
1522
+ value: 0.86 +/- 0.27
1523
+ name: Expert normalized total reward
1524
+ - task:
1525
+ type: reinforcement-learning
1526
+ name: Reinforcement Learning
1527
+ dataset:
1528
+ name: Synth Loc
1529
+ type: babyai-synth-loc
1530
+ metrics:
1531
+ - type: total_reward
1532
+ value: 0.82 +/- 0.31
1533
+ name: Total reward
1534
+ - type: expert_normalized_total_reward
1535
+ value: 0.85 +/- 0.38
1536
+ name: Expert normalized total reward
1537
+ - task:
1538
+ type: reinforcement-learning
1539
+ name: Reinforcement Learning
1540
+ dataset:
1541
+ name: Synth Seq
1542
+ type: babyai-synth-seq
1543
+ metrics:
1544
+ - type: total_reward
1545
+ value: 0.57 +/- 0.44
1546
+ name: Total reward
1547
+ - type: expert_normalized_total_reward
1548
+ value: 0.57 +/- 0.50
1549
+ name: Expert normalized total reward
1550
+ - task:
1551
+ type: reinforcement-learning
1552
+ name: Reinforcement Learning
1553
+ dataset:
1554
+ name: Synth
1555
+ type: babyai-synth
1556
+ metrics:
1557
+ - type: total_reward
1558
+ value: 0.68 +/- 0.39
1559
+ name: Total reward
1560
+ - type: expert_normalized_total_reward
1561
+ value: 0.69 +/- 0.47
1562
+ name: Expert normalized total reward
1563
+ - task:
1564
+ type: reinforcement-learning
1565
+ name: Reinforcement Learning
1566
+ dataset:
1567
+ name: Unblock Pickup
1568
+ type: babyai-unblock-pickup
1569
+ metrics:
1570
+ - type: total_reward
1571
+ value: 0.76 +/- 0.33
1572
+ name: Total reward
1573
+ - type: expert_normalized_total_reward
1574
+ value: 0.82 +/- 0.39
1575
+ name: Expert normalized total reward
1576
+ - task:
1577
+ type: reinforcement-learning
1578
+ name: Reinforcement Learning
1579
+ dataset:
1580
+ name: Unlock Local
1581
+ type: babyai-unlock-local
1582
+ metrics:
1583
+ - type: total_reward
1584
+ value: 0.98 +/- 0.01
1585
+ name: Total reward
1586
+ - type: expert_normalized_total_reward
1587
+ value: 1.00 +/- 0.01
1588
+ name: Expert normalized total reward
1589
+ - task:
1590
+ type: reinforcement-learning
1591
+ name: Reinforcement Learning
1592
+ dataset:
1593
+ name: Unlock Pickup
1594
+ type: babyai-unlock-pickup
1595
+ metrics:
1596
+ - type: total_reward
1597
+ value: 0.76 +/- 0.03
1598
+ name: Total reward
1599
+ - type: expert_normalized_total_reward
1600
+ value: 1.01 +/- 0.04
1601
+ name: Expert normalized total reward
1602
+ - task:
1603
+ type: reinforcement-learning
1604
+ name: Reinforcement Learning
1605
+ dataset:
1606
+ name: Unlock To Unlock
1607
+ type: babyai-unlock-to-unlock
1608
+ metrics:
1609
+ - type: total_reward
1610
+ value: 0.86 +/- 0.29
1611
+ name: Total reward
1612
+ - type: expert_normalized_total_reward
1613
+ value: 0.89 +/- 0.30
1614
+ name: Expert normalized total reward
1615
+ - task:
1616
+ type: reinforcement-learning
1617
+ name: Reinforcement Learning
1618
+ dataset:
1619
+ name: Unlock
1620
+ type: babyai-unlock
1621
+ metrics:
1622
+ - type: total_reward
1623
+ value: 0.55 +/- 0.42
1624
+ name: Total reward
1625
+ - type: expert_normalized_total_reward
1626
+ value: 0.63 +/- 0.50
1627
+ name: Expert normalized total reward
1628
+ - task:
1629
+ type: reinforcement-learning
1630
+ name: Reinforcement Learning
1631
+ dataset:
1632
+ name: Assembly
1633
+ type: metaworld-assembly
1634
+ metrics:
1635
+ - type: total_reward
1636
+ value: 238.32 +/- 32.98
1637
+ name: Total reward
1638
+ - type: expert_normalized_total_reward
1639
+ value: 0.96 +/- 0.16
1640
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Basketball
+ type: metaworld-basketball
+ metrics:
+ - type: total_reward
+ value: 1.59 +/- 0.43
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: -0.00 +/- 0.00
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: BinPicking
+ type: metaworld-bin-picking
+ metrics:
+ - type: total_reward
+ value: 374.18 +/- 168.23
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.88 +/- 0.40
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Box Close
+ type: metaworld-box-close
+ metrics:
+ - type: total_reward
+ value: 510.10 +/- 117.47
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.99 +/- 0.27
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Button Press Topdown Wall
+ type: metaworld-button-press-topdown-wall
+ metrics:
+ - type: total_reward
+ value: 260.07 +/- 67.75
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.49 +/- 0.14
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Button Press Topdown
+ type: metaworld-button-press-topdown
+ metrics:
+ - type: total_reward
+ value: 265.16 +/- 77.93
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.51 +/- 0.17
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Button Press Wall
+ type: metaworld-button-press-wall
+ metrics:
+ - type: total_reward
+ value: 621.75 +/- 137.13
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.92 +/- 0.21
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Button Press
+ type: metaworld-button-press
+ metrics:
+ - type: total_reward
+ value: 556.75 +/- 198.85
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.86 +/- 0.33
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Coffee Button
+ type: metaworld-coffee-button
+ metrics:
+ - type: total_reward
+ value: 250.50 +/- 266.92
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.31 +/- 0.38
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Coffee Pull
+ type: metaworld-coffee-pull
+ metrics:
+ - type: total_reward
+ value: 55.13 +/- 96.96
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.20 +/- 0.38
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Coffee Push
+ type: metaworld-coffee-push
+ metrics:
+ - type: total_reward
+ value: 269.17 +/- 237.82
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.54 +/- 0.48
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Dial Turn
+ type: metaworld-dial-turn
+ metrics:
+ - type: total_reward
+ value: 738.22 +/- 168.43
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.93 +/- 0.22
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Disassemble
+ type: metaworld-disassemble
+ metrics:
+ - type: total_reward
+ value: 39.14 +/- 11.85
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: -0.47 +/- 4.70
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Door Close
+ type: metaworld-door-close
+ metrics:
+ - type: total_reward
+ value: 528.17 +/- 29.90
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.00 +/- 0.06
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Door Lock
+ type: metaworld-door-lock
+ metrics:
+ - type: total_reward
+ value: 676.51 +/- 192.68
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.81 +/- 0.28
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Door Open
+ type: metaworld-door-open
+ metrics:
+ - type: total_reward
+ value: 572.76 +/- 57.53
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.98 +/- 0.11
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Door Unlock
+ type: metaworld-door-unlock
+ metrics:
+ - type: total_reward
+ value: 654.94 +/- 260.64
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.79 +/- 0.37
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Drawer Close
+ type: metaworld-drawer-close
+ metrics:
+ - type: total_reward
+ value: 663.02 +/- 214.51
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.73 +/- 0.29
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Drawer Open
+ type: metaworld-drawer-open
+ metrics:
+ - type: total_reward
+ value: 489.07 +/- 21.28
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.99 +/- 0.06
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Faucet Close
+ type: metaworld-faucet-close
+ metrics:
+ - type: total_reward
+ value: 361.32 +/- 72.28
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.22 +/- 0.14
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Faucet Open
+ type: metaworld-faucet-open
+ metrics:
+ - type: total_reward
+ value: 637.86 +/- 134.50
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.85 +/- 0.29
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Hammer
+ type: metaworld-hammer
+ metrics:
+ - type: total_reward
+ value: 691.72 +/- 25.25
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.00 +/- 0.04
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Hand Insert
+ type: metaworld-hand-insert
+ metrics:
+ - type: total_reward
+ value: 719.57 +/- 99.26
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.97 +/- 0.13
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Handle Press Side
+ type: metaworld-handle-press-side
+ metrics:
+ - type: total_reward
+ value: 84.25 +/- 113.34
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.03 +/- 0.14
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Handle Press
+ type: metaworld-handle-press
+ metrics:
+ - type: total_reward
+ value: 731.94 +/- 261.90
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.84 +/- 0.34
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Handle Pull Side
+ type: metaworld-handle-pull-side
+ metrics:
+ - type: total_reward
+ value: 233.11 +/- 199.49
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.60 +/- 0.52
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Handle Pull
+ type: metaworld-handle-pull
+ metrics:
+ - type: total_reward
+ value: 501.29 +/- 209.45
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.74 +/- 0.32
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Lever Pull
+ type: metaworld-lever-pull
+ metrics:
+ - type: total_reward
+ value: 250.18 +/- 228.59
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.34 +/- 0.41
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Peg Insert Side
+ type: metaworld-peg-insert-side
+ metrics:
+ - type: total_reward
+ value: 288.02 +/- 157.87
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.91 +/- 0.50
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Peg Unplug Side
+ type: metaworld-peg-unplug-side
+ metrics:
+ - type: total_reward
+ value: 68.48 +/- 125.34
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.14 +/- 0.28
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Pick Out Of Hole
+ type: metaworld-pick-out-of-hole
+ metrics:
+ - type: total_reward
+ value: 2.08 +/- 0.05
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.00 +/- 0.00
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Pick Place Wall
+ type: metaworld-pick-place-wall
+ metrics:
+ - type: total_reward
+ value: 6.87 +/- 44.99
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.02 +/- 0.10
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Pick Place
+ type: metaworld-pick-place
+ metrics:
+ - type: total_reward
+ value: 264.18 +/- 195.69
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.63 +/- 0.47
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Plate Slide Back Side
+ type: metaworld-plate-slide-back-side
+ metrics:
+ - type: total_reward
+ value: 697.54 +/- 137.79
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.95 +/- 0.20
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Plate Slide Back
+ type: metaworld-plate-slide-back
+ metrics:
+ - type: total_reward
+ value: 196.80 +/- 1.73
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.24 +/- 0.00
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Plate Slide Side
+ type: metaworld-plate-slide-side
+ metrics:
+ - type: total_reward
+ value: 122.61 +/- 24.52
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.16 +/- 0.04
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Plate Slide
+ type: metaworld-plate-slide
+ metrics:
+ - type: total_reward
+ value: 497.42 +/- 168.74
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.93 +/- 0.37
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Push Back
+ type: metaworld-push-back
+ metrics:
+ - type: total_reward
+ value: 91.41 +/- 115.05
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.08 +/- 1.37
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Push Wall
+ type: metaworld-push-wall
+ metrics:
+ - type: total_reward
+ value: 116.49 +/- 208.05
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.15 +/- 0.28
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Push
+ type: metaworld-push
+ metrics:
+ - type: total_reward
+ value: 604.25 +/- 261.90
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.80 +/- 0.35
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Reach Wall
+ type: metaworld-reach-wall
+ metrics:
+ - type: total_reward
+ value: 634.57 +/- 231.40
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.81 +/- 0.38
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Reach
+ type: metaworld-reach
+ metrics:
+ - type: total_reward
+ value: 325.27 +/- 159.21
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.33 +/- 0.30
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Shelf Place
+ type: metaworld-shelf-place
+ metrics:
+ - type: total_reward
+ value: 124.60 +/- 112.83
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.52 +/- 0.47
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Soccer
+ type: metaworld-soccer
+ metrics:
+ - type: total_reward
+ value: 364.50 +/- 175.45
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.97 +/- 0.47
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Stick Pull
+ type: metaworld-stick-pull
+ metrics:
+ - type: total_reward
+ value: 398.64 +/- 205.60
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.76 +/- 0.39
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Stick Push
+ type: metaworld-stick-push
+ metrics:
+ - type: total_reward
+ value: 158.29 +/- 264.59
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.25 +/- 0.42
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Sweep Into
+ type: metaworld-sweep-into
+ metrics:
+ - type: total_reward
+ value: 775.30 +/- 119.00
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.97 +/- 0.15
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Sweep
+ type: metaworld-sweep
+ metrics:
+ - type: total_reward
+ value: 15.64 +/- 9.29
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.01 +/- 0.02
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Window Close
+ type: metaworld-window-close
+ metrics:
+ - type: total_reward
+ value: 423.33 +/- 203.92
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.69 +/- 0.38
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Window Open
+ type: metaworld-window-open
+ metrics:
+ - type: total_reward
+ value: 593.10 +/- 54.83
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.00 +/- 0.10
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Ant
+ type: mujoco-ant
+ metrics:
+ - type: total_reward
+ value: 5268.02 +/- 1495.39
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.90 +/- 0.25
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Inverted Double Pendulum
+ type: mujoco-doublependulum
+ metrics:
+ - type: total_reward
+ value: 4750.14 +/- 931.20
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.51 +/- 0.10
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Half Cheetah
+ type: mujoco-halfcheetah
+ metrics:
+ - type: total_reward
+ value: 6659.69 +/- 409.71
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.90 +/- 0.05
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Hopper
+ type: mujoco-hopper
+ metrics:
+ - type: total_reward
+ value: 1835.93 +/- 532.21
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.99 +/- 0.29
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Humanoid
+ type: mujoco-humanoid
+ metrics:
+ - type: total_reward
+ value: 697.44 +/- 108.06
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.09 +/- 0.02
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Inverted Pendulum
+ type: mujoco-pendulum
+ metrics:
+ - type: total_reward
+ value: 116.34 +/- 20.19
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.23 +/- 0.04
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Pusher
+ type: mujoco-pusher
+ metrics:
+ - type: total_reward
+ value: -26.33 +/- 6.32
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.99 +/- 0.05
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Reacher
+ type: mujoco-reacher
+ metrics:
+ - type: total_reward
+ value: -6.06 +/- 2.64
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.99 +/- 0.07
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Humanoid Standup
+ type: mujoco-standup
+ metrics:
+ - type: total_reward
+ value: 118125.15 +/- 24880.28
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 0.35 +/- 0.10
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Swimmer
+ type: mujoco-swimmer
+ metrics:
+ - type: total_reward
+ value: 93.26 +/- 3.78
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.01 +/- 0.04
+ name: Expert normalized total reward
+ - task:
+ type: reinforcement-learning
+ name: Reinforcement Learning
+ dataset:
+ name: Walker 2d
+ type: mujoco-walker
+ metrics:
+ - type: total_reward
+ value: 4662.43 +/- 762.67
+ name: Total reward
+ - type: expert_normalized_total_reward
+ value: 1.01 +/- 0.16
+ name: Expert normalized total reward
+ ---
+
+ # Model Card for Jat
+
+ This is a multi-modal and multi-task model.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** The JAT Team
+ - **License:** Apache 2.0
+
+ ### Model Sources
+
+ - **Repository:** <https://github.com/huggingface/jat>
+ - **Paper:** Coming soon
+ - **Demo:** Coming soon
+
+ ## Training
+
+ The model was trained on the following tasks:
+
+ - Alien
+ - Amidar
+ - Assault
+ - Asterix
+ - Asteroids
+ - Atlantis
+ - Bank Heist
+ - Battle Zone
+ - Beam Rider
+ - Berzerk
+ - Bowling
+ - Boxing
+ - Breakout
+ - Centipede
+ - Chopper Command
+ - Crazy Climber
+ - Defender
+ - Demon Attack
+ - Double Dunk
+ - Enduro
+ - Fishing Derby
+ - Freeway
+ - Frostbite
+ - Gopher
+ - Gravitar
+ - H.E.R.O.
+ - Ice Hockey
+ - James Bond
+ - Kangaroo
+ - Krull
+ - Kung-Fu Master
+ - Montezuma's Revenge
+ - Ms. Pacman
+ - Name This Game
+ - Phoenix
+ - PitFall
+ - Pong
+ - Private Eye
+ - Q*Bert
+ - River Raid
+ - Road Runner
+ - Robotank
+ - Seaquest
+ - Skiing
+ - Solaris
+ - Space Invaders
+ - Star Gunner
+ - Surround
+ - Tennis
+ - Time Pilot
+ - Tutankham
+ - Up and Down
+ - Venture
+ - Video Pinball
+ - Wizard of Wor
+ - Yars Revenge
+ - Zaxxon
+ - Action Obj Door
+ - Blocked Unlock Pickup
+ - Boss Level No Unlock
+ - Boss Level
+ - Find Obj S5
+ - Go To Door
+ - Go To Imp Unlock
+ - Go To Local
+ - Go To Obj Door
+ - Go To Obj
+ - Go To Red Ball Grey
+ - Go To Red Ball No Dists
+ - Go To Red Ball
+ - Go To Red Blue Ball
+ - Go To Seq
+ - Go To
+ - Key Corridor
+ - Mini Boss Level
+ - Move Two Across S8N9
+ - One Room S8
+ - Open Door
+ - Open Doors Order N4
+ - Open Red Door
+ - Open Two Doors
+ - Open
+ - Pickup Above
+ - Pickup Dist
+ - Pickup Loc
+ - Pickup
+ - Put Next Local
+ - Put Next S7N4
+ - Synth Loc
+ - Synth Seq
+ - Synth
+ - Unblock Pickup
+ - Unlock Local
+ - Unlock Pickup
+ - Unlock To Unlock
+ - Unlock
+ - Assembly
+ - Basketball
+ - BinPicking
+ - Box Close
+ - Button Press Topdown Wall
+ - Button Press Topdown
+ - Button Press Wall
+ - Button Press
+ - Coffee Button
+ - Coffee Pull
+ - Coffee Push
+ - Dial Turn
+ - Disassemble
+ - Door Close
+ - Door Lock
+ - Door Open
+ - Door Unlock
+ - Drawer Close
+ - Drawer Open
+ - Faucet Close
+ - Faucet Open
+ - Hammer
+ - Hand Insert
+ - Handle Press Side
+ - Handle Press
+ - Handle Pull Side
+ - Handle Pull
+ - Lever Pull
+ - Peg Insert Side
+ - Peg Unplug Side
+ - Pick Out Of Hole
+ - Pick Place Wall
+ - Pick Place
+ - Plate Slide Back Side
+ - Plate Slide Back
+ - Plate Slide Side
+ - Plate Slide
+ - Push Back
+ - Push Wall
+ - Push
+ - Reach Wall
+ - Reach
+ - Shelf Place
+ - Soccer
+ - Stick Pull
+ - Stick Push
+ - Sweep Into
+ - Sweep
+ - Window Close
+ - Window Open
+ - Ant
+ - Inverted Double Pendulum
+ - Half Cheetah
+ - Hopper
+ - Humanoid
+ - Inverted Pendulum
+ - Pusher
+ - Reacher
+ - Humanoid Standup
+ - Swimmer
+ - Walker 2d
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ model = AutoModelForCausalLM.from_pretrained("jat-project/jat")
+ ```
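+
+ A fuller loading sketch is given below. It assumes, without confirmation from this card, that the `jat-project/jat` repository ships custom modeling and processing code, so that `trust_remote_code=True` and an `AutoProcessor` are needed; if the checkpoint is natively supported by `transformers`, the shorter snippet above is enough.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoProcessor
+
+ # Assumption: the repo provides custom code, hence trust_remote_code=True.
+ model = AutoModelForCausalLM.from_pretrained("jat-project/jat", trust_remote_code=True)
+
+ # Assumption: the repo also provides a processor for the multi-modal inputs.
+ processor = AutoProcessor.from_pretrained("jat-project/jat", trust_remote_code=True)
+ ```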
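+
+ The per-task evaluation scores recorded in the `model-index` metadata at the top of this card (total reward and expert-normalized total reward) can also be read programmatically. A minimal sketch, assuming the metadata block stays attached to this repository:
+
+ ```python
+ from huggingface_hub import ModelCard
+
+ # Load this card and iterate over the eval results parsed from the model-index metadata.
+ card = ModelCard.load("jat-project/jat")
+ for result in card.data.eval_results:
+     print(result.dataset_name, result.metric_name, result.metric_value)
+ ```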