Small Visual Language Models can also be Open-Ended Few-Shot Learners Paper • 2310.00500 • Published Sep 30, 2023
A critical analysis of self-supervision, or what we can learn from a single image Paper • 1904.13132 • Published Apr 30, 2019
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video Paper • 2310.08584 • Published Oct 12, 2023
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models Paper • 2102.04130 • Published Feb 8, 2021
VaLID: Variable-Length Input Diffusion for Novel View Synthesis Paper • 2312.08892 • Published Dec 14, 2023
PASS: An ImageNet replacement for self-supervised pretraining without humans Paper • 2109.13228 • Published Sep 27, 2021
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs Paper • 2402.08657 • Published Feb 13, 2024
Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations Paper • 2308.11796 • Published Aug 22, 2023
Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding Paper • 2402.16844 • Published Feb 26, 2024
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency Paper • 2408.11054 • Published Aug 20, 2024