Article Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models By isidentical • Aug 26, 2024 • 39
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7, 2024 • 20
ScreenAI: A Vision-Language Model for UI and Infographics Understanding Paper • 2402.04615 • Published Feb 7, 2024 • 40
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024 • 61
YOLO-World: Real-Time Open-Vocabulary Object Detection Paper • 2401.17270 • Published Jan 30, 2024 • 35
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data Paper • 2401.10891 • Published Jan 19, 2024 • 60
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5, 2024 • 21
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 24 days ago • 136
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated about 8 hours ago • 53
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities Paper • 2308.12966 • Published Aug 24, 2023 • 7
LLaVA-Critic Collection A general evaluator for assessing model performance • 6 items • Updated Oct 6, 2024 • 8
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3, 2024 • 83
Qwen2-Math Collection Math-specific model series based on Qwen2 • 8 items • Updated Nov 28, 2024 • 47