Collections

Collections including paper arxiv:2407.15595

Papers - Flow Matching

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17 • 89
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9 • 42
Flow Matching for Generative Modeling

Paper • 2210.02747 • Published Oct 6, 2022 • 1
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 11

Perception and abstraction. Each modality is tokenized and embedded into vectors for the model to comprehend.

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 39
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30 • 116
Octo-planner: On-device Language Model for Planner-Action Agents

Paper • 2406.18082 • Published Jun 26 • 47
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 42

VideoGameBunny: Towards vision assistants for video games

Paper • 2407.15295 • Published Jul 21 • 21
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21 • 9
Discrete Flow Matching

Paper • 2407.15595 • Published Jul 22 • 12

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Paper • 2402.07383 • Published Feb 12 • 13
Matcha-TTS: A fast TTS architecture with conditional flow matching

Paper • 2309.03199 • Published Sep 6, 2023 • 11
Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Paper • 2402.01912 • Published Feb 2 • 11
Fast Timing-Conditioned Latent Audio Diffusion

Paper • 2402.04825 • Published Feb 7 • 7

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17 • 9
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18 • 16
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19 • 60
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 73

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Paper • 2311.10709 • Published Nov 17, 2023 • 24
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published May 21 • 22
FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53
stabilityai/stable-diffusion-3-medium

Text-to-Image • Updated Aug 12 • 29.1k • 4.64k
