Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing Paper • 2408.09348 • Published Aug 18, 2024 • 1
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization Paper • 2405.17873 • Published May 28, 2024 • 2
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation Paper • 2406.02540 • Published Jun 4, 2024 • 2
E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling Paper • 2412.14170 • Published Dec 18, 2024
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3, 2025 • 39
PhenDiff: Revealing Invisible Phenotypes with Conditional Diffusion Models Paper • 2312.08290 • Published Dec 13, 2023 • 2
World-consistent Video Diffusion with Explicit 3D Modeling Paper • 2412.01821 • Published Dec 2, 2024 • 4
Pathways on the Image Manifold: Image Editing via Video Generation Paper • 2411.16819 • Published Nov 25, 2024 • 33
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Paper • 2404.19622 • Published Apr 30, 2024 • 2
MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans Paper • 2410.00253 • Published Sep 30, 2024
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective Paper • 2310.11451 • Published Oct 17, 2023
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 54
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 26
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model Paper • 2409.16689 • Published Sep 25, 2024 • 1
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28, 2024 • 43
Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention Paper • 2408.00760 • Published Aug 1, 2024 • 7
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28, 2024 • 23
New feature 🔥 Image models and LoRAs now have little previews 🤏 If you don't know where to start to find them, I invite you to browse cool LoRAs in the profiles of some amazing fine-tuners: @artificialguybr, @alvdansen, @DoctorDiffusion, @e-n-v-y, @KappaNeuro, @ostris
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation Paper • 2309.05455 • Published Sep 11, 2023