What matters when building vision-language models? Paper • 2405.02246 • Published 23 days ago