Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding · 3 authors
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion · 3 authors
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation · 7 authors
VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores · 5 authors
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning? · 8 authors