language_model.model.layers.0 4
language_model.model.layers.1 4
language_model.model.layers.2 4
language_model.model.layers.3 4
language_model.model.layers.4 4
language_model.model.layers.5 4
language_model.model.layers.6 4
language_model.model.layers.7 4
language_model.model.layers.8 4
language_model.model.layers.9 4
language_model.model.layers.10 4
language_model.model.layers.11 4
language_model.model.layers.12 4
language_model.model.layers.13 4
language_model.model.layers.14 4
language_model.model.layers.15 4
language_model.model.layers.16 4
language_model.model.layers.17 4
language_model.model.layers.18 4
language_model.model.layers.19 4
language_model.model.layers.20 4
language_model.model.layers.21 4
language_model.model.layers.22 4
language_model.model.layers.23 4
vision_model.encoder.layers.0 0
vision_model.encoder.layers.1 0
vision_model.encoder.layers.2 0
vision_model.encoder.layers.3 0
vision_model.encoder.layers.4 0
vision_model.encoder.layers.5 0
vision_model.encoder.layers.6 0
vision_model.encoder.layers.7 0
vision_model.encoder.layers.8 0
vision_model.encoder.layers.9 0
vision_model.encoder.layers.10 0
vision_model.encoder.layers.11 0
vision_model.encoder.layers.12 0
vision_model.encoder.layers.13 0
vision_model.encoder.layers.14 0
vision_model.encoder.layers.15 0
vision_model.encoder.layers.16 0
vision_model.encoder.layers.17 0
vision_model.encoder.layers.18 0
vision_model.encoder.layers.19 0
vision_model.encoder.layers.20 0
vision_model.encoder.layers.21 0
vision_model.encoder.layers.22 0
vision_model.encoder.layers.23 0
vision_model.embeddings 0
mlp1 0
language_model.model.tok_embeddings 4
language_model.model.norm 4
language_model.output 4
language_model.model.embed_tokens 4
language_model.lm_head 4
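The module-to-GPU listing above assigns devices as follows. The vision encoder, `vision_model.embeddings` and `mlp1` go on GPU 0, and every language-model module goes on GPU 4. The Rank lines further down show each of the four ranks pairing GPU r with GPU r + 4. The Python sketch below reproduces the listing under those two readings. It assumes the printed map belongs to rank 0 (or uses rank-local indices). `rank_devices` and `build_device_map` are hypothetical helpers, not functions from the evaluation script. A plain dict like this is the kind of value Hugging Face `from_pretrained(device_map=...)` accepts.

```python
from typing import Dict, Tuple

# Both stacks list layers 0-23 in the log, so 24 layers each (taken from the listing).
NUM_LAYERS = 24

# Language-model modules other than the decoder layers that the listing places on the LM GPU.
LM_EXTRA_MODULES = (
    "language_model.model.tok_embeddings",
    "language_model.model.norm",
    "language_model.output",
    "language_model.model.embed_tokens",
    "language_model.lm_head",
)


def rank_devices(rank: int, num_ranks: int = 4) -> Tuple[int, int]:
    """Hypothetical helper: rank r uses GPUs r and r + num_ranks (e.g. rank 3 -> 3 and 7)."""
    return rank, rank + num_ranks


def build_device_map(vision_gpu: int, lm_gpu: int, num_layers: int = NUM_LAYERS) -> Dict[str, int]:
    """Hypothetical helper: vision tower and mlp1 on one GPU, the language model on the other."""
    device_map: Dict[str, int] = {}
    for i in range(num_layers):
        device_map[f"language_model.model.layers.{i}"] = lm_gpu
    for i in range(num_layers):
        device_map[f"vision_model.encoder.layers.{i}"] = vision_gpu
    device_map["vision_model.embeddings"] = vision_gpu
    device_map["mlp1"] = vision_gpu
    for name in LM_EXTRA_MODULES:
        device_map[name] = lm_gpu
    return device_map


if __name__ == "__main__":
    # Rank 0 -> (0, 4) reproduces the 55 "module gpu" lines at the top of the log.
    for name, gpu in build_device_map(*rank_devices(0)).items():
        print(name, gpu)
```

Putting the whole vision tower on one card and the whole language model on the other keeps the only cross-GPU hop at the vision-to-language boundary.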
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [3] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionPrediction, devices: {device(type='cuda', index=3), device(type='cuda', index=7)}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [0] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionPrediction, devices: {device(type='cuda', index=0), device(type='cuda', index=4)}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [2] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionPrediction, devices: {device(type='cuda', index=2), device(type='cuda', index=6)}
Initialization Finished
Predicting ActionPrediction Using internvl
Proceeding 36-length images samples | Num: 5
Initialization Finished
Predicting ActionPrediction Using internvl
Proceeding 36-length images samples | Num: 5
Initialization Finished
Predicting ActionPrediction Using internvl
Proceeding 36-length images samples | Num: 5
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank [1] Begin to eval model work_dirs/share_internvl/InternVL2-2B on task ActionPrediction, devices: {device(type='cuda', index=1), device(type='cuda', index=5)}
Initialization Finished
Predicting ActionPrediction Using internvl
Proceeding 36-length images samples | Num: 5
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.17s/it] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.31s/it] | |
| Proceeding 28-length images samples | Num: 9 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.15s/it] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.29s/it] | |
| Proceeding 28-length images samples | Num: 9 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.18s/it] 100%|ββββββββββ| 1/1 [00:06<00:00, 6.31s/it] | |
| Proceeding 28-length images samples | Num: 9 | |
| Proceeding 28-length images samples | Num: 9 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.09s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.25s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.42s/it] | |
| Proceeding 29-length images samples | Num: 13 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.25s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.60s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.75s/it] | |
| Proceeding 29-length images samples | Num: 13 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.30s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.66s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.79s/it] | |
| Proceeding 29-length images samples | Num: 13 | |
| Proceeding 29-length images samples | Num: 13 | |
| 0%| | 0/3 [00:00<?, ?it/s] 33%|ββββ | 1/3 [00:01<00:02, 1.02s/it] 67%|βββββββ | 2/3 [00:02<00:01, 1.46s/it] 100%|ββββββββββ| 3/3 [00:03<00:00, 1.17s/it] 100%|ββββββββββ| 3/3 [00:03<00:00, 1.24s/it] | |
| Proceeding 32-length images samples | Num: 15 | |
| 0%| | 0/3 [00:00<?, ?it/s] 33%|ββββ | 1/3 [00:02<00:04, 2.05s/it] 67%|βββββββ | 2/3 [00:02<00:01, 1.32s/it] 100%|ββββββββββ| 3/3 [00:03<00:00, 1.08s/it] 100%|ββββββββββ| 3/3 [00:03<00:00, 1.24s/it] | |
| Proceeding 32-length images samples | Num: 15 | |
| 0%| | 0/3 [00:00<?, ?it/s] 33%|ββββ | 1/3 [00:02<00:04, 2.42s/it] 67%|βββββββ | 2/3 [00:03<00:01, 1.50s/it] 100%|ββββββββββ| 3/3 [00:06<00:00, 2.34s/it] 100%|ββββββββββ| 3/3 [00:06<00:00, 2.23s/it] | |
| Proceeding 32-length images samples | Num: 15 | |
| Proceeding 32-length images samples | Num: 15 | |
| 0%| | 0/3 [00:00<?, ?it/s] 33%|ββββ | 1/3 [00:03<00:06, 3.17s/it] 67%|βββββββ | 2/3 [00:05<00:02, 2.79s/it] 100%|ββββββββββ| 3/3 [00:07<00:00, 2.50s/it] 100%|ββββββββββ| 3/3 [00:07<00:00, 2.65s/it] | |
| Proceeding 31-length images samples | Num: 26 | |
| 0%| | 0/4 [00:00<?, ?it/s] 25%|βββ | 1/4 [00:03<00:10, 3.63s/it] 50%|βββββ | 2/4 [00:06<00:05, 2.99s/it] 75%|ββββββββ | 3/4 [00:06<00:01, 1.99s/it] 100%|ββββββββββ| 4/4 [00:09<00:00, 2.17s/it] 100%|ββββββββββ| 4/4 [00:09<00:00, 2.37s/it] | |
| Proceeding 31-length images samples | Num: 26 | |
| 0%| | 0/4 [00:00<?, ?it/s] 25%|βββ | 1/4 [00:02<00:08, 2.83s/it] 50%|βββββ | 2/4 [00:05<00:05, 2.80s/it] 75%|ββββββββ | 3/4 [00:06<00:01, 1.96s/it] 100%|ββββββββββ| 4/4 [00:08<00:00, 1.78s/it] 100%|ββββββββββ| 4/4 [00:08<00:00, 2.04s/it] | |
| Proceeding 31-length images samples | Num: 26 | |
| Proceeding 31-length images samples | Num: 26 | |
| 0%| | 0/6 [00:00<?, ?it/s] 17%|ββ | 1/6 [00:01<00:06, 1.20s/it] 33%|ββββ | 2/6 [00:02<00:05, 1.39s/it] 50%|βββββ | 3/6 [00:03<00:03, 1.12s/it] 67%|βββββββ | 4/6 [00:04<00:02, 1.19s/it] 83%|βββββββββ | 5/6 [00:06<00:01, 1.39s/it] 100%|ββββββββββ| 6/6 [00:09<00:00, 2.00s/it] 100%|ββββββββββ| 6/6 [00:09<00:00, 1.64s/it] | |
| Proceeding 34-length images samples | Num: 9 | |
| 0%| | 0/6 [00:00<?, ?it/s] 17%|ββ | 1/6 [00:02<00:10, 2.03s/it] 33%|ββββ | 2/6 [00:02<00:05, 1.34s/it] 50%|βββββ | 3/6 [00:05<00:05, 1.92s/it] 67%|βββββββ | 4/6 [00:06<00:03, 1.58s/it] 83%|βββββββββ | 5/6 [00:09<00:01, 1.90s/it] 100%|ββββββββββ| 6/6 [00:11<00:00, 2.20s/it] 100%|ββββββββββ| 6/6 [00:11<00:00, 1.98s/it] | |
| Proceeding 34-length images samples | Num: 9 | |
| Proceeding 37-length images samples | Num: 6 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.00s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.30s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.45s/it] | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:02<00:00, 2.33s/it] 100%|ββββββββββ| 1/1 [00:02<00:00, 2.40s/it] | |
| Proceeding 44-length images samples | Num: 2 | |
| 0%| | 0/7 [00:00<?, ?it/s] 14%|ββ | 1/7 [00:02<00:17, 2.92s/it] 29%|βββ | 2/7 [00:03<00:08, 1.63s/it] 43%|βββββ | 3/7 [00:06<00:08, 2.21s/it] 57%|ββββββ | 4/7 [00:09<00:07, 2.45s/it] 71%|ββββββββ | 5/7 [00:10<00:03, 1.84s/it] 86%|βββββββββ | 6/7 [00:13<00:02, 2.25s/it] 100%|ββββββββββ| 7/7 [00:14<00:00, 1.80s/it] 100%|ββββββββββ| 7/7 [00:14<00:00, 2.02s/it] | |
| Proceeding 34-length images samples | Num: 9 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 23-length images samples | Num: 6 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.73s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.09s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.38s/it] | |
| Proceeding 37-length images samples | Num: 6 | |
| Proceeding 34-length images samples | Num: 9 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:03<00:00, 3.56s/it] 100%|ββββββββββ| 1/1 [00:03<00:00, 3.64s/it] | |
| Proceeding 56-length images samples | Num: 1 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 17-length images samples | Num: 3 | |
| Proceeding 33-length images samples | Num: 8 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:04<00:00, 4.19s/it] 100%|ββββββββββ| 1/1 [00:04<00:00, 4.28s/it] | |
| Proceeding 44-length images samples | Num: 2 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 23-length images samples | Num: 6 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:04<00:04, 4.15s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.29s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.62s/it] | |
| Proceeding 37-length images samples | Num: 6 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.79s/it] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.89s/it] | |
| Proceeding 56-length images samples | Num: 1 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 17-length images samples | Num: 3 | |
| Proceeding 37-length images samples | Num: 6 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:01<00:01, 1.37s/it] 100%|ββββββββββ| 2/2 [00:03<00:00, 1.54s/it] 100%|ββββββββββ| 2/2 [00:03<00:00, 1.56s/it] | |
| Proceeding 24-length images samples | Num: 3 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 35-length images samples | Num: 5 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:00<00:00, 1.18it/s] 100%|ββββββββββ| 1/1 [00:00<00:00, 1.08it/s] | |
| Proceeding 33-length images samples | Num: 8 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:02<00:02, 2.49s/it] 100%|ββββββββββ| 2/2 [00:03<00:00, 1.88s/it] 100%|ββββββββββ| 2/2 [00:04<00:00, 2.02s/it] | |
| Proceeding 44-length images samples | Num: 2 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:02<00:00, 2.11s/it] 100%|ββββββββββ| 1/1 [00:02<00:00, 2.21s/it] | |
| Proceeding 30-length images samples | Num: 46 | |
| Proceeding 44-length images samples | Num: 2 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.41s/it] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.51s/it] | |
| Proceeding 23-length images samples | Num: 6 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:02<00:02, 2.06s/it] 100%|ββββββββββ| 2/2 [00:02<00:00, 1.34s/it] 100%|ββββββββββ| 2/2 [00:02<00:00, 1.50s/it] | |
| Proceeding 24-length images samples | Num: 3 | |
| Proceeding 23-length images samples | Num: 6 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.78s/it] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.87s/it] | |
| Proceeding 35-length images samples | Num: 5 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:01<00:01, 1.72s/it] 100%|ββββββββββ| 2/2 [00:02<00:00, 1.11s/it] 100%|ββββββββββ| 2/2 [00:02<00:00, 1.25s/it] | |
| Proceeding 56-length images samples | Num: 1 | |
| 0it [00:00, ?it/s] 0it [00:00, ?it/s] | |
| Proceeding 17-length images samples | Num: 3 | |
| Proceeding 56-length images samples | Num: 1 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.37s/it] 100%|ββββββββββ| 1/1 [00:01<00:00, 1.44s/it] | |
| Proceeding 30-length images samples | Num: 46 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:00<00:00, 1.40it/s] 100%|ββββββββββ| 1/1 [00:00<00:00, 1.25it/s] | |
| Proceeding 33-length images samples | Num: 8 | |
| Proceeding 17-length images samples | Num: 3 | |
| Proceeding 33-length images samples | Num: 8 | |
| 0%| | 0/2 [00:00<?, ?it/s] 50%|βββββ | 1/2 [00:03<00:03, 3.55s/it] 100%|ββββββββββ| 2/2 [00:05<00:00, 2.85s/it] 100%|ββββββββββ| 2/2 [00:06<00:00, 3.00s/it] | |
| Proceeding 24-length images samples | Num: 3 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:03<00:00, 3.26s/it] 100%|ββββββββββ| 1/1 [00:03<00:00, 3.35s/it] | |
| Proceeding 35-length images samples | Num: 5 | |
| Proceeding 24-length images samples | Num: 3 | |
| 0%| | 0/1 [00:00<?, ?it/s] 100%|ββββββββββ| 1/1 [00:04<00:00, 4.12s/it] 100%|ββββββββββ| 1/1 [00:04<00:00, 4.24s/it] | |
| Proceeding 30-length images samples | Num: 46 | |
| 0%| | 0/11 [00:00<?, ?it/s] 9%|β | 1/11 [00:01<00:11, 1.16s/it] 18%|ββ | 2/11 [00:02<00:12, 1.36s/it] 27%|βββ | 3/11 [00:03<00:08, 1.09s/it] 36%|ββββ | 4/11 [00:04<00:06, 1.01it/s] 45%|βββββ | 5/11 [00:06<00:08, 1.46s/it] 55%|ββββββ | 6/11 [00:08<00:07, 1.46s/it] 64%|βββββββ | 7/11 [00:10<00:06, 1.71s/it] 73%|ββββββββ | 8/11 [00:10<00:04, 1.40s/it] 82%|βββββββββ | 9/11 [00:13<00:03, 1.84s/it] 91%|βββββββββ | 10/11 [00:15<00:01, 1.75s/it] 100%|ββββββββββ| 11/11 [00:17<00:00, 1.98s/it] 100%|ββββββββββ| 11/11 [00:17<00:00, 1.63s/it] | |
Proceeding 21-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 41-length images samples | Num: 1
Proceeding 59-length images samples | Num: 1
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Proceeding 22-length images samples | Num: 2
Proceeding 15-length images samples | Num: 3
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Proceeding 38-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 25-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 20-length images samples | Num: 1
Proceeding 35-length images samples | Num: 5
0it [00:00, ?it/s]
Proceeding 39-length images samples | Num: 1
Proceeding 16-length images samples | Num: 4
0it [00:00, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 1.03it/s]
Proceeding 43-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 40-length images samples | Num: 2
Proceeding 27-length images samples | Num: 7
0it [00:00, ?it/s]
Proceeding 30-length images samples | Num: 46
100%|██████████| 11/11 [00:19<00:00, 1.76s/it]
Proceeding 21-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 41-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.17s/it]
Proceeding 26-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 59-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 81-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 22-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 12-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 46-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 15-length images samples | Num: 3
0it [00:00, ?it/s]
Proceeding 18-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 60-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 45-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 13-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.47s/it]
Proceeding 38-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 19-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 25-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 9-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 20-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 49-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 39-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 16-length images samples | Num: 4
0it [00:00, ?it/s]
Proceeding 54-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 7-length images samples | Num: 1
0it [00:00, ?it/s]
100%|██████████| 1/1 [00:01<00:00, 1.59s/it]
Proceeding 43-length images samples | Num: 1
Proceeding 40-length images samples | Num: 2
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Proceeding 27-length images samples | Num: 7
100%|██████████| 12/12 [00:14<00:00, 1.19s/it]
Proceeding 21-length images samples | Num: 2
100%|██████████| 2/2 [00:02<00:00, 1.36s/it]
Proceeding 26-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.06s/it]
Proceeding 41-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 81-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 59-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 12-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 22-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 46-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 18-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 60-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.02s/it]
Proceeding 15-length images samples | Num: 3
0it [00:00, ?it/s]
Proceeding 45-length images samples | Num: 2
Proceeding 21-length images samples | Num: 2
0it [00:00, ?it/s]
Proceeding 13-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 19-length images samples | Num: 1
100%|██████████| 1/1 [00:00<00:00, 1.21it/s]
Proceeding 38-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 9-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 25-length images samples | Num: 2
Proceeding 41-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 49-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 54-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 7-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.20s/it]
Proceeding 20-length images samples | Num: 1
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Proceeding 39-length images samples | Num: 1
Proceeding 59-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 16-length images samples | Num: 4
100%|██████████| 1/1 [00:00<00:00, 1.20it/s]
Proceeding 43-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 40-length images samples | Num: 2
Proceeding 22-length images samples | Num: 2
100%|██████████| 1/1 [00:01<00:00, 1.47s/it]
Proceeding 27-length images samples | Num: 7
Proceeding 15-length images samples | Num: 3
Proceeding 38-length images samples | Num: 1
100%|██████████| 2/2 [00:01<00:00, 1.10it/s]
Proceeding 26-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 81-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 12-length images samples | Num: 2
Proceeding 25-length images samples | Num: 2
100%|██████████| 1/1 [00:00<00:00, 1.52it/s]
Proceeding 46-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 18-length images samples | Num: 1
Proceeding 20-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 60-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 45-length images samples | Num: 2
Proceeding 39-length images samples | Num: 1
100%|██████████| 1/1 [00:01<00:00, 1.65s/it]
Proceeding 13-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 19-length images samples | Num: 1
Proceeding 16-length images samples | Num: 4
0it [00:00, ?it/s]
Proceeding 9-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 49-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 54-length images samples | Num: 1
Proceeding 43-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 7-length images samples | Num: 1
0it [00:00, ?it/s]
Proceeding 40-length images samples | Num: 2
Proceeding 27-length images samples | Num: 7
Proceeding 26-length images samples | Num: 1
Proceeding 81-length images samples | Num: 1
Proceeding 12-length images samples | Num: 2
Proceeding 46-length images samples | Num: 1
Proceeding 18-length images samples | Num: 1
Proceeding 60-length images samples | Num: 1
Proceeding 45-length images samples | Num: 2
Proceeding 13-length images samples | Num: 1
Proceeding 19-length images samples | Num: 1
Proceeding 9-length images samples | Num: 1
Proceeding 49-length images samples | Num: 1
Proceeding 54-length images samples | Num: 1
Proceeding 7-length images samples | Num: 1
evaluating ActionPrediction ...
Results saved to work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionPrediction/ActionPrediction_240803234615.json
python eval/milebench/evaluate.py --data-dir /mnt/inspurfs/share_data/wangweiyun/share_data/long-context-benchmark/MileBench/datasets--FreedomIntelligence--MileBench/snapshots/53c7a58051ef88bacf76541d91f03f5ba2d71e7d --dataset ActionPrediction --result-dir work_dirs/share_internvl/InternVL2-2B/eval_milebench/ActionPrediction
internvl: ActionPrediction: {'Accuracy': 0.755, 'image_quantity_level-Accuracy': {'Few': 0, 'Medium': 0.7555555555555555, 'Many': 0.7538461538461538}, 'image_quantity_level-Result': {'Few': [0, 0], 'Medium': [102, 135], 'Many': [49, 65]}}
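The summary dict above is consistent with pooled (micro-averaged) accuracy. 151 correct out of 200 samples gives 0.755. A plain mean of the Medium and Many accuracies would instead be about 0.7547. The Python sketch below reproduces that arithmetic from the `image_quantity_level-Result` counts. It assumes each pair is `[correct, total]`. The helper names are hypothetical and are not taken from `eval/milebench/evaluate.py`.

```python
from typing import Dict, List

# [correct, total] per image-quantity level, copied from 'image_quantity_level-Result' above
# (reading each pair as [correct, total] is an assumption).
LEVEL_RESULTS: Dict[str, List[int]] = {"Few": [0, 0], "Medium": [102, 135], "Many": [49, 65]}


def level_accuracy(correct: int, total: int) -> float:
    # An empty level (0/0) is reported as 0, matching 'Few': 0 in the log.
    return correct / total if total else 0.0


def pooled_accuracy(results: Dict[str, List[int]]) -> float:
    # Micro-average: sum correct and total over all levels before dividing once.
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total if total else 0.0


if __name__ == "__main__":
    print({name: level_accuracy(c, t) for name, (c, t) in LEVEL_RESULTS.items()})
    print(pooled_accuracy(LEVEL_RESULTS))
```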

