Submitted by akhaliq 582 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits · 10 authors 140
Submitted by akhaliq 184 EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions · 4 authors 20
Submitted by akhaliq 88 Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models · 12 authors 5
Submitted by akhaliq 23 When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method · 4 authors 3
Submitted by akhaliq 21 DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model · 6 authors 1
Submitted by akhaliq 21 OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web · 7 authors 6
Submitted by akhaliq 16 Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners · 5 authors 1
Submitted by akhaliq 10 Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in Text-to-Image Generation · 6 authors 1
Submitted by akhaliq 9 VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction · 11 authors 45