Diffusion for World Modeling: Visual Details Matter in Atari Paper • 2405.12399 • Published 2 days ago • 16
A Careful Examination of Large Language Model Performance on Grade School Arithmetic Paper • 2405.00332 • Published 22 days ago • 24
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11 • 46
On the Scalability of Diffusion-based Text-to-Image Generation Paper • 2404.02883 • Published Apr 3 • 17
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models Paper • 2404.02258 • Published Apr 2 • 101
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 59
Getting it Right: Improving Spatial Consistency in Text-to-Image Models Paper • 2404.01197 • Published Apr 1 • 29
InstantSplat: Unbounded Sparse-view Pose-free Gaussian Splatting in 40 Seconds Paper • 2403.20309 • Published Mar 29 • 16
FlashFace: Human Image Personalization with High-fidelity Identity Preservation Paper • 2403.17008 • Published Mar 25 • 18
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 17
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis Paper • 2403.12963 • Published Mar 19 • 6
LightIt: Illumination Modeling and Control for Diffusion Models Paper • 2403.10615 • Published Mar 15 • 14
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18 • 18
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models Paper • 2403.03100 • Published Mar 5 • 32
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 40
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 566
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12 • 52
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 33
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation Paper • 2402.05054 • Published Feb 7 • 24
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs Paper • 2402.04291 • Published Feb 6 • 48
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 55 items • Updated 10 days ago • 173
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 61
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks Paper • 2402.00892 • Published Jan 31 • 9
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer Paper • 2401.16658 • Published Jan 30 • 12
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models Paper • 2401.15947 • Published Jan 29 • 46
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild Paper • 2401.13627 • Published Jan 24 • 69
Lumiere: A Space-Time Diffusion Model for Video Generation Paper • 2401.12945 • Published Jan 23 • 82
Rethinking FID: Towards a Better Evaluation Metric for Image Generation Paper • 2401.09603 • Published Nov 30, 2023 • 13
City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web Paper • 2312.16457 • Published Dec 27, 2023 • 13
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 66
StarVector: Generating Scalable Vector Graphics Code from Images Paper • 2312.11556 • Published Dec 17, 2023 • 26
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 253
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding Paper • 2312.04461 • Published Dec 7, 2023 • 48
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians Paper • 2312.03029 • Published Dec 5, 2023 • 22
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023 • 20
Cache Me if You Can: Accelerating Diffusion Models through Block Caching Paper • 2312.03209 • Published Dec 6, 2023 • 16
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation Paper • 2312.03641 • Published Dec 6, 2023 • 19
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting Paper • 2312.03461 • Published Dec 6, 2023 • 15