ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models • Paper • 2311.07022 • Published Nov 13, 2023