Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline Paper • 2411.12814 • Published 13 days ago • 20
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published 21 days ago • 44
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation Paper • 2411.04997 • Published 25 days ago • 35
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 82
OpenCulture Collection A multilingual dataset of public domain books and newspapers. • 27 items • Updated 26 days ago • 117
Transformers compatible Mamba Collection This release includes the `mamba` repositories compatible with the `transformers` library • 5 items • Updated Mar 6 • 36
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception Paper • 2401.16158 • Published Jan 29 • 18