Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published Jan 16 • 36
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 26
Open World Object Detection in the Era of Foundation Models Paper • 2312.05745 • Published Dec 10, 2023 • 1
PROB: Probabilistic Objectness for Open World Object Detection Paper • 2212.01424 • Published Dec 2, 2022
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024 • 35