- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 50
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
  Paper • 2311.05332 • Published • 13
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
  Paper • 2311.03517 • Published • 14
Chaolei Tan
Chaolei
AI & ML interests
Computer Vision, Multimodal Learning, Video Understanding
Recent Activity
liked a model 15 days ago: microsoft/Phi-4-multimodal-instruct
liked a Space over 1 year ago: hysts/daily-papers
updated a collection over 1 year ago: AIGC
Organizations
None yet
Collections
10
models
None public yet
datasets
None public yet