Key models for robotic computer vision, object grounding, 3D reconstruction, and sim-to-real transfer
- nvidia/LocateAnything-3B
  Image-Text-to-Text • 4B • 801k • 2.49k
- facebook/dinov2-large
  Image Feature Extraction • 0.3B • 976k • 113
- microsoft/Florence-2-large
  Image-Text-to-Text • 0.8B • 661k • 1.83k
- facebook/sam2-hiera-large
  Mask Generation • 0.2B • 12.6k • 140