VHELM: A Holistic Evaluation of Vision Language Models Paper • 2410.07112 • Published Oct 9 • 2
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23 • 34
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning Paper • 2312.11420 • Published Dec 18, 2023 • 2
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8 • 31
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
UCSC-VLAA/ViT-L-16-HTxt-Recap-CLIP Zero-Shot Image Classification • Updated Jun 24 • 42.8k • 17
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12 • 39