MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding Paper • 2503.13964 • Published Mar 2025 • 16
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models Paper • 2312.02949 • Published Dec 5, 2023 • 15
Aligning Multimodal LLM with Human Preference: A Survey Paper • 2503.14504 • Published Mar 2025 • 21
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering Paper • 2502.10868 • Published Feb 15, 2025 • 2
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents Paper • 2502.18017 • Published Feb 2025 • 19
Scalable Vision Language Model Training via High Quality Data Curation Paper • 2501.05952 • Published Jan 10, 2025 • 1
ColQwen2 Models Collection Pre-trained checkpoints for the ColQwen2 model. • 4 items • Updated Jan 23, 2025 • 4
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2025 • 417
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3, 2025 • 17
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs Paper • 2501.13687 • Published Jan 23, 2025 • 9
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 100