MoDE_CALVIN_ABC_1 / calvin_eval.log
[2024-12-16 09:48:13,763][mode.evaluation.multistep_sequences][INFO] - Start generating evaluation sequences.
[2024-12-16 09:48:25,484][mode.evaluation.multistep_sequences][INFO] - Done generating evaluation sequences.
0%| | 0/1000 [00:00<?, ?it/s]
1/5 : 96.6% | 2/5 : 90.3% | 3/5 : 82.8% | 4/5 : 72.2% | 5/5 : 63.5% | Average: 4.1 |: 100%|██████████████████████████████████████████████| 1000/1000 [2:24:10<00:00, 8.65s/it]
Results for Epoch 0:
Average successful sequence length: 4.054
Success rates for i instructions in a row:
1: 96.6%
2: 90.3%
3: 82.8%
4: 72.2%
5: 63.5%
rotate_blue_block_right: 73 / 77 | SR: 94.8%
move_slider_right: 279 / 279 | SR: 100.0%
lift_red_block_slider: 103 / 124 | SR: 83.1%
place_in_slider: 262 / 349 | SR: 75.1%
turn_off_lightbulb: 137 / 144 | SR: 95.1%
turn_off_led: 165 / 166 | SR: 99.4%
push_into_drawer: 95 / 121 | SR: 78.5%
lift_blue_block_drawer: 20 / 20 | SR: 100.0%
lift_pink_block_slider: 127 / 138 | SR: 92.0%
open_drawer: 336 / 338 | SR: 99.4%
rotate_red_block_right: 71 / 73 | SR: 97.3%
lift_red_block_table: 185 / 186 | SR: 99.5%
lift_pink_block_table: 164 / 174 | SR: 94.3%
turn_on_lightbulb: 175 / 179 | SR: 97.8%
rotate_blue_block_left: 65 / 65 | SR: 100.0%
push_blue_block_left: 61 / 66 | SR: 92.4%
close_drawer: 202 / 202 | SR: 100.0%
turn_on_led: 168 / 171 | SR: 98.2%
push_pink_block_right: 32 / 61 | SR: 52.5%
push_red_block_right: 54 / 70 | SR: 77.1%
move_slider_left: 239 / 242 | SR: 98.8%
push_red_block_left: 72 / 77 | SR: 93.5%
lift_blue_block_table: 184 / 185 | SR: 99.5%
place_in_drawer: 176 / 176 | SR: 100.0%
rotate_red_block_left: 59 / 60 | SR: 98.3%
stack_block: 126 / 197 | SR: 64.0%
push_pink_block_left: 72 / 72 | SR: 100.0%
lift_blue_block_slider: 103 / 129 | SR: 79.8%
lift_pink_block_drawer: 11 / 13 | SR: 84.6%
rotate_pink_block_right: 63 / 67 | SR: 94.0%
unstack_block: 54 / 56 | SR: 96.4%
rotate_pink_block_left: 53 / 55 | SR: 96.4%
push_blue_block_right: 49 / 68 | SR: 72.1%
lift_red_block_drawer: 19 / 19 | SR: 100.0%
Best model: epoch 0 with average sequences length of 4.054
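
The figures in this log can be cross-checked with a few lines of Python. The sketch below is not part of the evaluation code; the function names are hypothetical, and it assumes the average successful sequence length equals the sum of the five "i instructions in a row" success rates (the expected length of a chain is the sum of the probabilities of reaching each step), which matches the 4.054 reported here. It also parses a per-task line and recomputes its success rate from the raw counts.

```python
import re

# Chained success rates copied from the "Success rates for i instructions in a row" lines.
chain_success = {1: 0.966, 2: 0.903, 3: 0.828, 4: 0.722, 5: 0.635}


def average_sequence_length(rates):
    """Sum of P(at least i tasks completed) for i = 1..5.

    Assumption: the log's "Average successful sequence length" is computed this way.
    """
    return sum(rates.values())


# Matches per-task lines such as "stack_block: 126 / 197 | SR: 64.0%".
TASK_LINE = re.compile(r"^(\w+): (\d+) / (\d+) \| SR: ([\d.]+)%$")


def parse_task_line(line):
    """Return (task, successes, attempts, reported_rate) or None if the line does not match."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    name, successes, attempts, rate = match.groups()
    return name, int(successes), int(attempts), float(rate)


def recomputed_rate(successes, attempts):
    """Success rate in percent, rounded to one decimal like the log output."""
    return round(100.0 * successes / attempts, 1)


if __name__ == "__main__":
    print(round(average_sequence_length(chain_success), 3))
    task = parse_task_line("push_pink_block_right: 32 / 61 | SR: 52.5%")
    print(task, recomputed_rate(task[1], task[2]))
```

Running the same check over every per-task line is a quick way to spot transcription errors when copying these numbers into a table.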