Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model • Paper • 2504.08685 • Published 13 days ago • 121
An Empirical Study of GPT-4o Image Generation Capabilities • Paper • 2504.05979 • Published 16 days ago • 61
SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published 16 days ago • 170
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems • Paper • 2504.01990 • Published 23 days ago • 256
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond • Paper • 2503.21614 • Published 28 days ago • 39
Large Language Model Agent: A Survey on Methodology, Applications and Challenges • Paper • 2503.21460 • Published 28 days ago • 76
Mind with Eyes: from Language Reasoning to Multimodal Reasoning • Paper • 2503.18071 • Published Mar 23 • 3
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published Feb 20 • 143
Large-scale Pre-training for Grounded Video Caption Generation • Paper • 2503.10781 • Published Mar 13 • 17
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models • Paper • 2503.09573 • Published Mar 12 • 71
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention • Paper • 2502.11089 • Published Feb 16 • 155