What matters when building vision-language models? Paper • 2405.02246 • Published about 1 month ago