InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 13 days ago • 90
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents Paper • 2411.06559 • Published Nov 10 • 11
Sharingan: Extract User Action Sequence from Desktop Recordings Paper • 2411.08768 • Published Nov 13 • 10
Sharingan: Extract User Action Sequence from Desktop Recordings Paper • 2411.08768 • Published Nov 13 • 10 • 2
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published Nov 7 • 22
Survey of User Interface Design and Interaction Techniques in Generative AI Applications Paper • 2410.22370 • Published Oct 28 • 11
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks Paper • 2410.24032 • Published Oct 31 • 9 • 2
Unbounded: A Generative Infinite Game of Character Life Simulation Paper • 2410.18975 • Published Oct 24 • 35
Tracking Universal Features Through Fine-Tuning and Model Merging Paper • 2410.12391 • Published Oct 16 • 5
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks Paper • 2410.12381 • Published Oct 16 • 42
Agent S: An Open Agentic Framework that Uses Computers Like a Human Paper • 2410.08164 • Published Oct 10 • 24
Inference Scaling for Long-Context Retrieval Augmented Generation Paper • 2410.04343 • Published Oct 6 • 9
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27 • 25
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published Sep 6 • 23
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild Paper • 2409.03753 • Published Sep 5 • 18