12 ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling · 6 authors 1
8 Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss · 10 authors 2