Balancing Pipeline Parallelism with Vocabulary Parallelism Paper • 2411.05288 • Published 13 days ago • 19
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models Paper • 2411.04996 • Published 14 days ago • 48
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published 14 days ago • 47
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published 14 days ago • 67
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 14 days ago • 108
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published 22 days ago • 46
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF Paper • 2411.01798 • Published 17 days ago • 8
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models Paper • 2411.00743 • Published 20 days ago • 6
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Paper • 2411.02394 • Published 17 days ago • 17
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models Paper • 2411.00918 • Published 20 days ago • 8
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance Paper • 2411.02327 • Published 17 days ago • 11
IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI Paper • 2411.00785 • Published Oct 17 • 8
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity Paper • 2411.02335 • Published 17 days ago • 11
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published 17 days ago • 20
DynaSaur: Large Language Agents Beyond Predefined Actions Paper • 2411.01747 • Published 17 days ago • 13
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models Paper • 2411.00836 • Published 23 days ago • 15
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published 17 days ago • 24
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published 17 days ago • 32