# MelodyFlow: High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
AudioCraft provides the code and models for MelodyFlow, [High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching][arxiv].
MelodyFlow is a text-guided music generation and editing model capable of generating high-quality stereo samples conditioned on text descriptions.
It is a Flow Matching Diffusion Transformer trained over a 48 kHz stereo (resp. 32 kHz mono) quantizer-free EnCodec tokenizer sampled at 25 Hz (resp. 20 Hz).
Unlike prior work on Flow Matching for music generation such as [MusicFlow: Cascaded Flow Matching for Text Guided Music Generation](https://openreview.net/forum?id=kOczKjmYum),
MelodyFlow doesn't require model cascading, which makes it very convenient for music editing.
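To make the rates quoted above concrete, here is a small back-of-the-envelope sketch. It assumes the 25 Hz and 20 Hz figures are latent frame rates, and it uses a 30-second clip purely as an example duration. The helper `latent_layout` is hypothetical and not part of AudioCraft.

```python
# Rough latent sizes implied by the sample rates and frame rates quoted above.
# Assumption: "sampled at 25 Hz (resp. 20 Hz)" refers to latent frames per second.
# `latent_layout` is an illustrative helper, not an AudioCraft function.

def latent_layout(sample_rate_hz: int, frame_rate_hz: int, duration_s: float):
    """Return (audio samples per latent frame, number of latent frames)."""
    samples_per_frame = sample_rate_hz // frame_rate_hz
    num_frames = int(duration_s * frame_rate_hz)
    return samples_per_frame, num_frames

stereo_48k = latent_layout(48_000, 25, 30.0)  # (1920, 750)
mono_32k = latent_layout(32_000, 20, 30.0)    # (1600, 600)
print(stereo_48k, mono_32k)
```

Under these assumptions, each latent frame of the 48 kHz variant covers 1920 audio samples per channel, and a 30-second clip spans 750 frames.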
Check out our [sample page][melodyflow_samples] or test the available demo!
We use 16K hours of licensed music to train MelodyFlow. Specifically, we rely on an internal dataset
of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
## Model Card
See [the model card](../model_cards/MELODFYFLOW_MODEL_CARD.md).
## Installation
Please follow the AudioCraft installation instructions from the [README](../README.md).
AudioCraft requires a GPU with at least 16 GB of memory for running inference with the medium-sized models (~1.5B parameters).
## Usage
We currently offer two ways to interact with MelodyFlow:
1. You can use the gradio demo locally by running [`python -m demos.melodyflow_app --share`](../demos/melodyflow_app.py).
2. You can play with MelodyFlow by running the Jupyter notebook at [`demos/melodyflow_demo.ipynb`](../demos/melodyflow_demo.ipynb) locally (also works on CPU).
## API
We provide a simple API and one pre-trained model:
- `facebook/melodyflow-t24-30secs`: 1B model, text to music, generates 30-second samples - [🤗 Hub](https://huggingface.co/facebook/melodyflow-t24-30secs)
Below is a quick example of using the API.
```python
from audiocraft.models import MelodyFlow
from audiocraft.data.audio import audio_write

model = MelodyFlow.get_pretrained('facebook/melodyflow-t24-30secs')
descriptions = ['disco beat', 'energetic EDM', 'funky groove']
wav = model.generate(descriptions)  # generates 3 samples.

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 dB LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
```
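The example above names each output file by its index. As a minimal sketch, the stem could instead be derived from the prompt so that files are easier to tell apart. The helper `output_stem` below is hypothetical (it is not part of AudioCraft) and uses only the standard library.

```python
import re

# Hypothetical helper (not an AudioCraft API): build a filesystem-friendly
# stem from the sample index and its text description.
def output_stem(index: int, description: str, max_len: int = 40) -> str:
    # Lowercase, then collapse every run of non-alphanumerics into one underscore.
    slug = re.sub(r'[^a-z0-9]+', '_', description.lower()).strip('_')
    # Truncate long prompts and avoid a trailing underscore.
    return f'{index:02d}_{slug[:max_len]}'.rstrip('_')

descriptions = ['disco beat', 'energetic EDM', 'funky groove']
stems = [output_stem(i, d) for i, d in enumerate(descriptions)]
print(stems)  # ['00_disco_beat', '01_energetic_edm', '02_funky_groove']
```

Each stem could then be passed as the first argument of `audio_write` in place of `f'{idx}'`.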
## Training
Coming later...
## Citation
```
@misc{lan2024high,
title={High fidelity text-guided music generation and editing via single-stage flow matching},
author={Le Lan, Gael and Shi, Bowen and Ni, Zhaoheng and Srinivasan, Sidd and Kumar, Anurag and Ellis, Brian and Kant, David and Nagaraja, Varun and Chang, Ernie and Hsu, Wei-Ning and others},
year={2024},
eprint={2407.03648},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
```
## License
See license information in the [model card](../model_cards/MELODFYFLOW_MODEL_CARD.md).
[arxiv]: https://arxiv.org/pdf/2407.03648
[melodyflow_samples]: https://melodyflow.github.io/