Collections
Discover the best community collections!
Collections including paper arxiv:2401.02957
-
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper • 2312.04557 • Published • 12 -
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Paper • 2312.04410 • Published • 14 -
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Paper • 2312.04461 • Published • 49 -
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Paper • 2401.02955 • Published • 16
-
Learning Vision from Models Rivals Learning Vision from Data
Paper • 2312.17742 • Published • 12 -
Unsupervised Universal Image Segmentation
Paper • 2312.17243 • Published • 18 -
Perspectives on the State and Future of Deep Learning -- 2023
Paper • 2312.09323 • Published • 5 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 10
-
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • Published • 39 -
Interfacing Foundation Models' Embeddings
Paper • 2312.07532 • Published • 10 -
Point Transformer V3: Simpler, Faster, Stronger
Paper • 2312.10035 • Published • 17 -
TheBloke/quantum-v0.01-GPTQ
Text Generation • Updated • 2 • 2
-
Exponentially Faster Language Modelling
Paper • 2311.10770 • Published • 117 -
stabilityai/stable-video-diffusion-img2vid-xt
Image-to-Video • Updated • 127k • 2.27k -
LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Paper • 2311.13384 • Published • 48 -
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Paper • 2311.12454 • Published • 27
-
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Paper • 2311.06243 • Published • 17 -
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Paper • 2311.05908 • Published • 11 -
PolyMaX: General Dense Prediction with Mask Transformer
Paper • 2311.05770 • Published • 6 -
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models
Paper • 2311.07575 • Published • 10
-
Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images
Paper • 2308.16582 • Published • 10 -
DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation
Paper • 2310.13119 • Published • 10 -
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Paper • 2310.16818 • Published • 27 -
Text-to-3D with classifier score distillation
Paper • 2310.19415 • Published • 4