Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 6 items • Updated about 17 hours ago • 21
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published 8 days ago • 11
NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors Paper • 2504.11427 • Published 8 days ago • 17
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 13 days ago • 39
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published 12 days ago • 27
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding Paper • 2504.01943 • Published 21 days ago • 13
Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖 Article • 10 days ago • 38
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published 13 days ago • 46
Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC Article • 15 days ago • 21
PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published Dec 30, 2024 • 19
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? Paper • 2412.18495 • Published Dec 24, 2024 • 9
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 27