COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training Paper • 2401.00849 • Published Jan 1, 2024 • 14
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 8
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone Paper • 2307.05463 • Published Jul 11, 2023 • 9