InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27
Small Language Model Meets with Reinforced Vision Vocabulary Paper • 2401.12503 • Published Jan 23
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark Paper • 2401.11944 • Published Jan 22
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated Apr 9
Distributed Inference and Fine-tuning of Large Language Models Over The Internet Paper • 2312.08361 • Published Dec 13, 2023
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023