Can Vision-Language Models Answer Face to Face Questions in the Real-World? Paper • 2503.19356 • Published 9 days ago