EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19 • 42 • 5
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models Paper • 2407.05131 • Published Jul 6 • 22 • 3
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding Paper • 2407.12594 • Published Jul 17 • 18 • 4