---
license: cc-by-nc-4.0
---
Pretrained weights for [NaVid](https://pku-epic.github.io/NaVid/): Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024).

The model is trained on samples collected from the training splits of [VLN-CE](https://github.com/jacobkrantz/VLN-CE) R2R and RxR.

| Evaluation Benchmark |  TL  |  NE  |  OS  |  SR  |  SPL |
|----------------------|:----:|:----:|:----:|:----:|:----:|
| VLN-CE R2R Val.      | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
| [VLN-CE R2R Test](https://eval.ai/web/challenges/challenge-page/719/leaderboard/1966)      | 11.3 | 5.39 |  52  |  45  |  39  |
| VLN-CE RxR Val.      | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |
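
The column abbreviations follow common VLN-CE usage: TL is trajectory length and NE is navigation error (both in meters), OS is oracle success rate, SR is success rate, and SPL is success weighted by path length (the last three in percent). As a rough illustration of how these quantities relate, here is a minimal sketch assuming the standard definitions and a 3 m success radius. The `Episode` fields and the `summarize` helper are hypothetical names introduced for this example; this is not the evaluation code that produced the numbers above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Per-episode quantities (hypothetical field names, all in meters)."""
    path_length: float     # distance actually travelled by the agent
    final_distance: float  # distance to the goal where the agent stopped
    min_distance: float    # closest distance to the goal along the trajectory
    shortest_path: float   # shortest-path distance from start to goal


def summarize(episodes: List[Episode], success_radius: float = 3.0) -> Dict[str, float]:
    """Average TL/NE and compute OS/SR/SPL (in percent) over a set of episodes.

    The 3 m success radius is an assumption based on common VLN-CE practice.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    n = len(episodes)

    spl_sum = 0.0
    successes = 0
    oracle_successes = 0
    for ep in episodes:
        success = ep.final_distance <= success_radius
        successes += success
        oracle_successes += ep.min_distance <= success_radius
        if success:
            # SPL term: shortest path divided by max(actual path, shortest path).
            denom = max(ep.path_length, ep.shortest_path)
            spl_sum += ep.shortest_path / denom if denom > 0 else 1.0

    return {
        "TL": sum(ep.path_length for ep in episodes) / n,
        "NE": sum(ep.final_distance for ep in episodes) / n,
        "OS": 100.0 * oracle_successes / n,
        "SR": 100.0 * successes / n,
        "SPL": 100.0 * spl_sum / n,
    }


if __name__ == "__main__":
    # Made-up episodes, purely for illustration.
    demo = [
        Episode(path_length=10.0, final_distance=2.0, min_distance=1.0, shortest_path=8.0),
        Episode(path_length=12.0, final_distance=5.0, min_distance=2.5, shortest_path=10.0),
    ]
    print(summarize(demo))
```

Because each successful episode contributes at most 1 to the SPL sum, SPL can never exceed SR, which is consistent with every row of the table.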

The inference code is available in the [NaVid-VLN-CE repository](https://github.com/jzhzhang/NaVid-VLN-CE).