license: cc-by-nc-4.0

Pretrained Weights of NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation (RSS 2024)

The model is trained on samples collected from the training splits of VLN-CE R2R and RxR.

| Evaluation Benchmark | TL | NE | OS | SR | SPL |
| --- | --- | --- | --- | --- | --- |
| VLN-CE R2R Val. | 10.7 | 5.65 | 49.2 | 41.9 | 36.5 |
| VLN-CE R2R Test | 11.3 | 5.39 | 52 | 45 | 39 |
| VLN-CE RxR Val. | 15.4 | 5.72 | 55.6 | 45.7 | 38.2 |
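For readers unfamiliar with the column abbreviations, the sketch below shows how these metrics are commonly computed in the vision-and-language navigation literature: trajectory length (TL), navigation error (NE), oracle success rate (OS), success rate (SR), and success weighted by path length (SPL). It is not taken from the NaVid evaluation code. The `Episode` record, the `summarize` helper, and the 3 m success radius are assumptions made for illustration, so the benchmark's own evaluation scripts remain the reference for the exact definitions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Assumed success radius in metres; check the benchmark's own settings.
SUCCESS_RADIUS_M = 3.0


@dataclass
class Episode:
    """Hypothetical per-episode record, all distances in metres."""
    path_length: float           # distance actually travelled by the agent
    shortest_path_length: float  # shortest start-to-goal distance
    final_distance: float        # distance to the goal where the agent stopped
    min_distance: float          # closest the agent ever got to the goal


def summarize(episodes: Sequence[Episode]) -> Dict[str, float]:
    """Average the standard navigation metrics over a set of episodes."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")

    tl = sum(e.path_length for e in episodes) / n
    ne = sum(e.final_distance for e in episodes) / n

    oracle_hits = sum(e.min_distance < SUCCESS_RADIUS_M for e in episodes)
    successes = sum(e.final_distance < SUCCESS_RADIUS_M for e in episodes)

    # SPL: each success is weighted by shortest / max(travelled, shortest),
    # so a successful but roundabout path earns less than 1.
    spl_total = 0.0
    for e in episodes:
        if e.final_distance < SUCCESS_RADIUS_M:
            denom = max(e.path_length, e.shortest_path_length)
            spl_total += e.shortest_path_length / denom if denom > 0 else 1.0

    return {
        "TL": tl,
        "NE": ne,
        "OS": 100.0 * oracle_hits / n,  # reported as a percentage
        "SR": 100.0 * successes / n,    # reported as a percentage
        "SPL": 100.0 * spl_total / n,   # reported as a percentage
    }


if __name__ == "__main__":
    demo = [
        Episode(path_length=10.0, shortest_path_length=8.0,
                final_distance=1.0, min_distance=0.5),
        Episode(path_length=12.0, shortest_path_length=9.0,
                final_distance=5.0, min_distance=2.0),
    ]
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.2f}")
```

Under these definitions SPL can never exceed SR, which matches every row of the table above.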

The related inference code can be found here.