VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge Paper • 2504.10342 • Published 7 days ago • 10
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 12 days ago • 71
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 13 days ago • 146
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages Paper • 2503.23542 • Published 22 days ago • 10
Large Language Model Agent: A Survey on Methodology, Applications and Challenges Paper • 2503.21460 • Published 25 days ago • 76
Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content Paper • 2503.16031 • Published Mar 20 • 3
InfiniteYou-FLUX 📸 Flexible Photo Recrafting While Preserving Your Identity • Running on Zero • 888
Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait Paper • 2503.12963 • Published Mar 17 • 7
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 46
API Agents vs. GUI Agents: Divergence and Convergence Paper • 2503.11069 • Published Mar 14 • 35
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers Paper • 2502.15007 • Published Feb 20 • 174