Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 15 days ago • 197
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 111
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 34
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 43
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 32
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks Paper • 2310.19909 • Published Oct 30, 2023 • 20
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation Paper • 2310.19512 • Published Oct 30, 2023 • 15
MM-VID: Advancing Video Understanding with GPT-4V(ision) Paper • 2310.19773 • Published Oct 30, 2023 • 19
CodeFusion: A Pre-trained Diffusion Model for Code Generation Paper • 2310.17680 • Published Oct 26, 2023 • 69
Wonder3D: Single Image to 3D using Cross-Domain Diffusion Paper • 2310.15008 • Published Oct 23, 2023 • 21
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 40
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling Paper • 2311.00430 • Published Nov 1, 2023 • 56
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation Paper • 2310.16656 • Published Oct 25, 2023 • 39
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior Paper • 2310.16818 • Published Oct 25, 2023 • 30
DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics Paper • 2310.13268 • Published Oct 20, 2023 • 17
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Paper • 2310.09199 • Published Oct 13, 2023 • 24
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning Paper • 2310.09478 • Published Oct 14, 2023 • 19
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 77
Aligning Text-to-Image Diffusion Models with Reward Backpropagation Paper • 2310.03739 • Published Oct 5, 2023 • 21
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis Paper • 2310.00426 • Published Sep 30, 2023 • 61
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation Paper • 2309.16653 • Published Sep 28, 2023 • 45
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation Paper • 2309.15818 • Published Sep 27, 2023 • 18
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023 • 32
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 86
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023 • 42
Multimodal Foundation Models: From Specialists to General-Purpose Assistants Paper • 2309.10020 • Published Sep 18, 2023 • 40
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 58
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models Paper • 2309.05793 • Published Sep 11, 2023 • 50
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation Paper • 2309.06380 • Published Sep 12, 2023 • 32
ProPainter: Improving Propagation and Transformer for Video Inpainting Paper • 2309.03897 • Published Sep 7, 2023 • 26
Textbooks Are All You Need II: phi-1.5 technical report Paper • 2309.05463 • Published Sep 11, 2023 • 86
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning Paper • 2309.02591 • Published Sep 5, 2023 • 14
Nougat: Neural Optical Understanding for Academic Documents Paper • 2308.13418 • Published Aug 25, 2023 • 35
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution Paper • 2307.06304 • Published Jul 12, 2023 • 26
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing Paper • 2308.07926 • Published Aug 15, 2023 • 27
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 167
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining Paper • 2308.05734 • Published Aug 10, 2023 • 36
Platypus: Quick, Cheap, and Powerful Refinement of LLMs Paper • 2308.07317 • Published Aug 14, 2023 • 23
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 31
ConceptLab: Creative Generation using Diffusion Prior Constraints Paper • 2308.02669 • Published Aug 3, 2023 • 23
Diffusion Sampling with Momentum for Mitigating Divergence Artifacts Paper • 2307.11118 • Published Jul 20, 2023 • 7
Optimized Network Architectures for Large Language Model Training with Billions of Parameters Paper • 2307.12169 • Published Jul 22, 2023 • 9