ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting Paper • 2411.17176 • Published 25 days ago • 22
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15 • 63
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15 • 31
Vision-Language Models Can Self-Improve Reasoning via Reflection Paper • 2411.00855 • Published Oct 30 • 4
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30 • 46
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24 • 31
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models Paper • 2406.11736 • Published Jun 17 • 5
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models Paper • 2311.09278 • Published Nov 15, 2023 • 7