Learning from Massive Human Videos for Universal Humanoid Pose Control Paper • 2412.14172 • Published 3 days ago • 10
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published 9 days ago • 25
Negative Token Merging: Image-based Adversarial Feature Guidance Paper • 2412.01339 • Published 20 days ago • 21
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 16 days ago • 46
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 16 days ago • 46
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training Paper • 2411.13476 • Published Nov 20 • 14
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? Paper • 2407.10956 • Published Jul 15 • 6
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7 • 111
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 46
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates Paper • 2410.07137 • Published Oct 9 • 7
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 60
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25 • 60
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 68
One Embedder, Any Task: Instruction-Finetuned Text Embeddings Paper • 2212.09741 • Published Dec 19, 2022 • 3