Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights Paper • 2405.21070 • Published May 31
CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions Paper • 2411.16828 • Published 27 days ago • 1
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models Paper • 2402.02207 • Published Feb 3 • 2
VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning Paper • 2403.13164 • Published Mar 19 • 1
Memorized Images in Diffusion Models share a Subspace that can be Located and Deleted Paper • 2406.18566 • Published Jun 1 • 1
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published Sep 23 • 34
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published Jun 18 • 14
Parametric Classification for Generalized Category Discovery: A Baseline Study Paper • 2211.11727 • Published Nov 21, 2022 • 1
Learning Semi-supervised Gaussian Mixture Models for Generalized Category Discovery Paper • 2305.06144 • Published May 10, 2023 • 1
Improving Contrastive Learning by Visualizing Feature Transformation Paper • 2108.02982 • Published Aug 6, 2021
Self-Supervised Visual Representation Learning with Semantic Grouping Paper • 2205.15288 • Published May 30, 2022
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Paper • 2311.16101 • Published Nov 27, 2023 • 1
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning Paper • 2312.11420 • Published Dec 18, 2023 • 2
Compress & Align: Curating Image-Text Data with Human Knowledge Paper • 2312.06726 • Published Dec 11, 2023
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8 • 32
HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing Paper • 2404.09990 • Published Apr 15 • 12