Art Atk

ArtAtk

AI & ML interests

Multimodal Models

Organizations

None yet

ArtAtk's activity

upvoted 4 papers 2 days ago

DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

Paper • 2503.10618 • Published 3 days ago • 15

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Paper • 2503.10437 • Published 3 days ago • 22

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

Paper • 2503.09641 • Published 4 days ago • 13

Transformers without Normalization

Paper • 2503.10622 • Published 3 days ago • 81

upvoted a paper 3 days ago

MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice

Paper • 2503.05978 • Published 8 days ago • 32

upvoted a paper 18 days ago

SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference

Paper • 2502.18137 • Published 19 days ago • 53

upvoted 2 papers 19 days ago

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published 20 days ago • 27

Audio-FLAN: A Preliminary Release

Paper • 2502.16584 • Published 21 days ago • 34

liked a Space 21 days ago

Wan2.1

Space • Wan: Open and Advanced Large-Scale Video Generative Models • 1.2k

upvoted a paper 22 days ago

Dynamic Concepts Personalization from Single Videos

Paper • 2502.14844 • Published 24 days ago • 16

upvoted a paper 28 days ago

InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

Paper • 2502.08910 • Published Feb 13 • 143

upvoted 7 papers about 1 month ago

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Paper • 2502.05179 • Published Feb 7 • 24

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6 • 25

VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models

Paper • 2502.02492 • Published Feb 4 • 62

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3 • 190

upvoted 2 papers about 2 months ago

DiffuEraser: A Diffusion Model for Video Inpainting

Paper • 2501.10018 • Published Jan 17 • 14

Temporal Preference Optimization for Long-Form Video Understanding

Paper • 2501.13919 • Published Jan 23 • 22