ArtAtk (Art Atk)

upvoted 4 papers 1 day ago

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published 5 days ago • 18

upvoted 4 papers 5 days ago

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Paper • 2406.09403 • Published 6 days ago • 17

DiTFastAttn: Attention Compression for Diffusion Transformer Models

Paper • 2406.08552 • Published 7 days ago • 19

Interpreting the Weight Space of Customized Diffusion Models

Paper • 2406.09413 • Published 6 days ago • 17

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published 6 days ago • 26

upvoted 3 papers 6 days ago

Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

Paper • 2406.05955 • Published 9 days ago • 21

PowerInfer-2: Fast Large Language Model Inference on a Smartphone

Paper • 2406.06282 • Published 9 days ago • 33

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Paper • 2406.05338 • Published 11 days ago • 38

upvoted 4 papers 7 days ago

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published 8 days ago • 10

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published 8 days ago • 49

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published 8 days ago • 29

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Paper • 2406.07472 • Published 8 days ago • 9

upvoted 3 papers 8 days ago

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Paper • 2406.05370 • Published 11 days ago • 11

Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis

Paper • 2406.06216 • Published 9 days ago • 14

Vript: A Video Is Worth Thousands of Words

Paper • 2406.06040 • Published 9 days ago • 19

upvoted a paper 9 days ago

Proofread: Fixes All Errors with One Tap

Paper • 2406.04523 • Published 12 days ago • 10

upvoted 3 papers 12 days ago

SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published 13 days ago • 20

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published 13 days ago • 36

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published 13 days ago • 26

upvoted 2 papers 16 days ago

4Diffusion: Multi-view Video Diffusion Model for 4D Generation

Paper • 2405.20674 • Published 19 days ago • 9

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published 19 days ago • 60

upvoted 2 papers 18 days ago

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Paper • 2405.18991 • Published 21 days ago • 12

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

Paper • 2405.18750 • Published 21 days ago • 20

upvoted 2 papers 21 days ago

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published 22 days ago • 21

Phased Consistency Model

Paper • 2405.18407 • Published 22 days ago • 43

upvoted 4 papers 22 days ago

Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Paper • 2405.15757 • Published 26 days ago • 14

Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning

Paper • 2405.17258 • Published 23 days ago • 12

Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels

Paper • 2405.16822 • Published 23 days ago • 11

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published 23 days ago • 75

upvoted a paper 26 days ago

Thermodynamic Natural Gradient Descent

Paper • 2405.13817 • Published 28 days ago • 13

upvoted a paper 27 days ago

Your Transformer is Secretly Linear

Paper • 2405.12250 • Published about 1 month ago • 141

upvoted 3 papers 29 days ago

Imp: Highly Capable Large Multimodal Models for Mobile Devices

Paper • 2405.12107 • Published 30 days ago • 23

Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

Paper • 2405.11252 • Published May 18 • 11

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published about 1 month ago • 53

upvoted a paper about 1 month ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 104

upvoted 17 papers about 2 months ago

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2 • 49

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

Paper • 2404.18212 • Published Apr 28 • 25

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Paper • 2404.19752 • Published Apr 30 • 20

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 44

Better & Faster Large Language Models via Multi-token Prediction

Paper • 2404.19737 • Published Apr 30 • 64

InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation

Paper • 2404.19427 • Published Apr 30 • 69

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Paper • 2404.15449 • Published Apr 23 • 11

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Paper • 2404.16022 • Published Apr 24 • 16

NeRF-XL: Scaling NeRFs with Multiple GPUs

Paper • 2404.16221 • Published Apr 24 • 11

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 56

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25 • 16

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Paper • 2404.14700 • Published Apr 23 • 29

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

Paper • 2404.13208 • Published Apr 19 • 38

Music Consistency Models

Paper • 2404.13358 • Published Apr 20 • 12

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 239

upvoted 5 papers 2 months ago

JetMoE: Reaching Llama2 Performance with 0.1M Dollars

Paper • 2404.07413 • Published Apr 11 • 32

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 10

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 28

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 80

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 97
