-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 568 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 50 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 30
Collections
Discover the best community collections!
Collections including paper arxiv:2402.17177
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 32 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 46 -
Hydragen: High-Throughput LLM Inference with Shared Prefixes
Paper • 2402.05099 • Published • 17
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 91 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 103
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 21 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 9 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 31 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 19
-
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion
Paper • 2402.03162 • Published • 17 -
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions
Paper • 2402.03040 • Published • 16 -
Magic-Me: Identity-Specific Video Customized Diffusion
Paper • 2402.09368 • Published • 24 -
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper • 2402.10294 • Published • 20
-
Neural Network Diffusion
Paper • 2402.13144 • Published • 93 -
Genie: Generative Interactive Environments
Paper • 2402.15391 • Published • 67 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 40
-
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Paper • 2401.15977 • Published • 34 -
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 83 -
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Paper • 2307.04725 • Published • 63 -
Boximator: Generating Rich and Controllable Motions for Video Synthesis
Paper • 2402.01566 • Published • 26