- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 9
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 31
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections including paper arxiv:2402.04252
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
  Paper • 2404.03413 • Published • 21
- openai/clip-vit-large-patch14-336
  Zero-Shot Image Classification • Updated • 6.66M • 130
- openai/clip-vit-base-patch32
  Zero-Shot Image Classification • Updated • 16.4M • 382
- Functional Interpolation for Relative Positions Improves Long Context Transformers
  Paper • 2310.04418 • Published • 4
- SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs
  Paper • 2106.09997 • Published • 2
- Neural Machine Translation of Rare Words with Subword Units
  Paper • 1508.07909 • Published • 4
- A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
  Paper • 2403.14438 • Published • 2
- YAYI 2: Multilingual Open-Source Large Language Models
  Paper • 2312.14862 • Published • 11
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 21
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
  Paper • 2308.14089 • Published • 24
- LLaSM: Large Language and Speech Model
  Paper • 2308.15930 • Published • 29