arxiv:2411.14402
Zhe Gan
zhegan27
AI & ML interests
multimodal learning, vision and language
Recent Activity
upvoted a paper 15 days ago
STIV: Scalable Text and Image Conditioned Video Generation
authored a paper about 1 month ago
Multimodal Autoregressive Pre-training of Large Vision Encoders
Organizations
None yet
models
None public yet
datasets
None public yet