A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Paper • 2312.08578 • Published Dec 14, 2023 • 15
VILA: On Pre-training for Visual Language Models Paper • 2312.07533 • Published Dec 12, 2023 • 18
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference Paper • 2310.04378 • Published Oct 6, 2023 • 19
PockEngine: Sparse and Efficient Fine-tuning in a Pocket Paper • 2310.17752 • Published Oct 26, 2023 • 11