CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published 20 days ago • 45
LayoutLM Collection The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 6 items • Updated 26 days ago • 18
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 43
AGUVIS: Unified Pure Vision GUI Agents Collection https://aguvis-project.github.io • 3 items • Updated Dec 20, 2024 • 5