Large-scale Pre-training for Grounded Video Caption Generation Paper • 2503.10781 • Published 5 days ago • 14
TIM: A Time Interval Machine for Audio-Visual Action Recognition Paper • 2404.05559 • Published Apr 8, 2024