Surn committed on
Commit fef074d · 1 Parent(s): 2c70760

6 New Models

large melody
stereo-*

Fix Melody default

Allow Multiple Melody models to be interactively set

README.md CHANGED
@@ -1,209 +1,215 @@
1
- ---
2
- title: UnlimitedMusicGen
3
- emoji: 🎼
4
- colorFrom: white
5
- colorTo: red
6
- sdk: gradio
7
- sdk_version: 3.38.0
8
- app_file: app.py
9
- pinned: false
10
- license: creativeml-openrail-m
11
- tags:
12
- - musicgen
13
- - unlimited
14
- ---
15
-
16
- [arxiv]: https://arxiv.org/abs/2306.05284
17
- [musicgen_samples]: https://ai.honu.io/papers/musicgen/
18
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
19
-
20
- # UnlimitedMusicGen
21
- This is my modification of the Audiocraft project to enable unlimited Audio generation. I have added a few features to the original project to enable this. I have also added a few features to the gradio interface to make it easier to use.
22
-
23
- # Audiocraft
24
- ![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
25
- ![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
26
- ![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)
27
-
28
- Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.
29
-
30
- ## MusicGen
31
-
32
- Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
33
- Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
34
- all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
35
- them in parallel, thus having only 50 auto-regressive steps per second of audio.
36
- Check out our [sample page][musicgen_samples] or test the available demo!
37
-
38
- <a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing">
39
- <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
40
- </a>
41
- <a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
42
- <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in Hugging Face"/>
43
- </a>
44
- <br>
45
-
46
- We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
47
-
48
- ## Installation
49
- Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:
50
-
51
- ```shell
52
- # Best to make sure you have torch installed first, in particular before installing xformers.
53
- # Don't run this if you already have PyTorch installed.
54
- pip install 'torch>=2.0'
55
- # Then proceed to one of the following
56
- pip install -U audiocraft # stable release
57
- pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
58
- pip install -e . # or if you cloned the repo locally
59
- ```
60
-
61
- ## Usage
62
- We offer a number of ways to interact with MusicGen:
63
- 1. A demo is available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
64
- 2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing).
65
- 3. You can use the gradio demo locally by running `python app.py`.
66
- 4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU).
67
- 5. Check out [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab) which is regularly
68
- updated with contributions from @camenduru and the community.
69
- 6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below.
70
-
71
- ### More info about Top-k, Top-p, Temperature and Classifier Free Guidance from ChatGPT
72
-
73
-
74
-
75
- Top-k: Top-k is a parameter used in text generation models, including music generation models. It determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens based on their predicted probabilities, and then selects the top-k tokens from the ranked list. The model then samples from this reduced set of tokens to determine the next token in the generated sequence. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.
76
-
77
- Top-p (or nucleus sampling): Top-p, also known as nucleus sampling or probabilistic sampling, is another method used for token selection during text generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of the ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (usually denoted as p). The model then samples from this set to choose the next token. This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for a varying number of tokens to be considered based on their probabilities.
78
-
79
- Temperature: Temperature is a parameter that controls the randomness of the generated output. It is applied during the sampling process, where a higher temperature value results in more random and diverse outputs, while a lower temperature value leads to more deterministic and focused outputs. In the context of music generation, a higher temperature can introduce more variability and creativity into the generated music, but it may also lead to less coherent or structured compositions. On the other hand, a lower temperature can produce more repetitive and predictable music.
80
-
81
- Classifier-Free Guidance: Classifier-Free Guidance is a technique for strengthening how closely the generated music follows the conditioning (the text prompt and, for melody models, the melody) without training a separate classifier. During training, the conditioning is randomly dropped so that a single model learns both conditional and unconditional prediction. At each generation step the model is evaluated with and without the conditioning, and the two sets of logits are combined so that the result is pushed away from the unconditional prediction and toward the conditional one. The guidance coefficient sets the strength: a higher value follows the prompt more closely at the cost of diversity.
82
-
83
- These parameters, such as top-k, top-p, temperature, and classifier-free guidance, provide different ways to influence the output of a music generation model and strike a balance between creativity, diversity, coherence, and control. The specific values for these parameters can be tuned based on the desired outcome and user preferences.
84
-
85
- ## API
86
-
87
- We provide a simple API and 4 pre-trained models. The pre-trained models are:
88
- - `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small)
89
- - `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium)
90
- - `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody)
91
- - `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large)
92
-
93
- We observe the best trade-off between quality and compute with the `medium` or `melody` model.
94
- In order to use MusicGen locally **you must have a GPU**. We recommend 16GB of memory, but smaller
95
- GPUs will be able to generate short sequences, or longer sequences with the `small` model.
96
-
97
- **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using a newer version of `torchaudio`.
98
- You can install it with:
99
- ```
100
- apt-get install ffmpeg
101
- ```
102
-
103
- See below for a quick example of using the API.
104
-
105
- ```python
106
- import torchaudio
107
- from audiocraft.models import MusicGen
108
- from audiocraft.data.audio import audio_write
109
-
110
- model = MusicGen.get_pretrained('melody')
111
- model.set_generation_params(duration=8) # generate 8 seconds.
112
- wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
113
- descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
114
- wav = model.generate(descriptions) # generates 3 samples.
115
-
116
- melody, sr = torchaudio.load('./assets/bach.mp3')
117
- # generates using the melody from the given audio and the provided descriptions.
118
- wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)
119
-
120
- for idx, one_wav in enumerate(wav):
121
- # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
122
- audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
123
- ```
124
- ## 🤗 Transformers Usage
125
-
126
- MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies
127
- and additional packages. Steps to get started:
128
-
129
- 1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
130
-
131
- ```
132
- pip install git+https://github.com/huggingface/transformers.git
133
- ```
134
-
135
- 2. Run the following Python code to generate text-conditional audio samples:
136
-
137
- ```py
138
- from transformers import AutoProcessor, MusicgenForConditionalGeneration
139
-
140
-
141
- processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
142
- model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
143
-
144
- inputs = processor(
145
- text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
146
- padding=True,
147
- return_tensors="pt",
148
- )
149
-
150
- audio_values = model.generate(**inputs, max_new_tokens=256)
151
- ```
152
-
153
- 3. Listen to the audio samples either in an ipynb notebook:
154
-
155
- ```py
156
- from IPython.display import Audio
157
-
158
- sampling_rate = model.config.audio_encoder.sampling_rate
159
- Audio(audio_values[0].numpy(), rate=sampling_rate)
160
- ```
161
-
162
- Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
163
-
164
- ```py
165
- import scipy
166
-
167
- sampling_rate = model.config.audio_encoder.sampling_rate
168
- scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
169
- ```
170
-
171
- For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the
172
- [MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on
173
- [Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb).
174
-
175
-
176
- ## Model Card
177
-
178
- See [the model card page](./MODEL_CARD.md).
179
-
180
- ## FAQ
181
-
182
- #### Will the training code be released?
183
-
184
- Yes. We will soon release the training code for MusicGen and EnCodec.
185
-
186
-
187
- #### I need help on Windows
188
-
189
- @FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4)
190
-
191
- #### I need help for running the demo on Colab
192
-
193
- Check [@camenduru tutorial on Youtube](https://www.youtube.com/watch?v=EGfxuTy9Eeo).
194
-
195
- ## Citation
196
- ```
197
- @article{copet2023simple,
198
- title={Simple and Controllable Music Generation},
199
- author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
200
- year={2023},
201
- journal={arXiv preprint arXiv:2306.05284},
202
- }
203
- ```
204
-
205
- ## License
206
- * The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
207
- * The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
208
- [arxiv]: https://arxiv.org/abs/2306.05284
 
 
 
 
 
 
209
  [musicgen_samples]: https://ai.honu.io/papers/musicgen/
 
1
+ ---
2
+ title: UnlimitedMusicGen
3
+ emoji: 🎼
4
+ colorFrom: gray
5
+ colorTo: red
6
+ sdk: gradio
7
+ sdk_version: 3.38.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: creativeml-openrail-m
11
+ tags:
12
+ - musicgen
13
+ - unlimited
14
+ ---
15
+
16
+ [arxiv]: https://arxiv.org/abs/2306.05284
17
+ [musicgen_samples]: https://ai.honu.io/papers/musicgen/
18
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
19
+
20
+ # UnlimitedMusicGen
21
+ This is my modification of the Audiocraft project to enable unlimited Audio generation. I have added a few features to the original project to enable this. I have also added a few features to the gradio interface to make it easier to use.
22
+
23
+ # Audiocraft
24
+ ![docs badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_docs/badge.svg)
25
+ ![linter badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_linter/badge.svg)
26
+ ![tests badge](https://github.com/facebookresearch/audiocraft/workflows/audiocraft_tests/badge.svg)
27
+
28
+ Audiocraft is a PyTorch library for deep learning research on audio generation. At the moment, it contains the code for MusicGen, a state-of-the-art controllable text-to-music model.
29
+
30
+ ## MusicGen
31
+
32
+ Audiocraft provides the code and models for MusicGen, [a simple and controllable model for music generation][arxiv]. MusicGen is a single stage auto-regressive
33
+ Transformer model trained over a 32kHz <a href="https://github.com/facebookresearch/encodec">EnCodec tokenizer</a> with 4 codebooks sampled at 50 Hz. Unlike existing methods like [MusicLM](https://arxiv.org/abs/2301.11325), MusicGen doesn't require a self-supervised semantic representation, and it generates
34
+ all 4 codebooks in one pass. By introducing a small delay between the codebooks, we show we can predict
35
+ them in parallel, thus having only 50 auto-regressive steps per second of audio.
36
+ Check out our [sample page][musicgen_samples] or test the available demo!
37
+
38
+ <a target="_blank" href="https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing">
39
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
40
+ </a>
41
+ <a target="_blank" href="https://huggingface.co/spaces/facebook/MusicGen">
42
+ <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in Hugging Face"/>
43
+ </a>
44
+ <br>
45
+
46
+ We use 20K hours of licensed music to train MusicGen. Specifically, we rely on an internal dataset of 10K high-quality music tracks, and on the ShutterStock and Pond5 music data.
47
+
48
+ ## Installation
49
+ Audiocraft requires Python 3.9, PyTorch 2.0.0, and a GPU with at least 16 GB of memory (for the medium-sized model). To install Audiocraft, you can run the following:
50
+
51
+ ```shell
52
+ # Best to make sure you have torch installed first, in particular before installing xformers.
53
+ # Don't run this if you already have PyTorch installed.
54
+ pip install 'torch>=2.0'
55
+ # Then proceed to one of the following
56
+ pip install -U audiocraft # stable release
57
+ pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
58
+ pip install -e . # or if you cloned the repo locally
59
+ ```
60
+
61
+ ## Usage
62
+ We offer a number of ways to interact with MusicGen:
63
+ 1. A demo is available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
64
+ 2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing).
65
+ 3. You can use the gradio demo locally by running `python app.py`.
66
+ 4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU).
67
+ 5. Check out [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab) which is regularly
68
+ updated with contributions from @camenduru and the community.
69
+ 6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below.
70
+
71
+ ### More info about Top-k, Top-p, Temperature and Classifier Free Guidance from ChatGPT
72
+
73
+
74
+
75
+ Top-k: Top-k is a parameter used in text generation models, including music generation models. It determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens based on their predicted probabilities, and then selects the top-k tokens from the ranked list. The model then samples from this reduced set of tokens to determine the next token in the generated sequence. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.
76
+
77
+ Top-p (or nucleus sampling): Top-p, also known as nucleus sampling or probabilistic sampling, is another method used for token selection during text generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of the ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (usually denoted as p). The model then samples from this set to choose the next token. This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for a varying number of tokens to be considered based on their probabilities.
78
+
79
+ Temperature: Temperature is a parameter that controls the randomness of the generated output. It is applied during the sampling process, where a higher temperature value results in more random and diverse outputs, while a lower temperature value leads to more deterministic and focused outputs. In the context of music generation, a higher temperature can introduce more variability and creativity into the generated music, but it may also lead to less coherent or structured compositions. On the other hand, a lower temperature can produce more repetitive and predictable music.
80
+
81
+ Classifier-Free Guidance: Classifier-Free Guidance is a technique for strengthening how closely the generated music follows the conditioning (the text prompt and, for melody models, the melody) without training a separate classifier. During training, the conditioning is randomly dropped so that a single model learns both conditional and unconditional prediction. At each generation step the model is evaluated with and without the conditioning, and the two sets of logits are combined so that the result is pushed away from the unconditional prediction and toward the conditional one. The guidance coefficient sets the strength: a higher value follows the prompt more closely at the cost of diversity.
82
+
83
+ These parameters, such as top-k, top-p, temperature, and classifier-free guidance, provide different ways to influence the output of a music generation model and strike a balance between creativity, diversity, coherence, and control. The specific values for these parameters can be tuned based on the desired outcome and user preferences.
84
+
85
+ ## API
86
+
87
+ We provide a simple API and 10 pre-trained models. The pre-trained models are:
88
+ - `small`: 300M model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-small)
89
+ - `medium`: 1.5B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-medium)
90
+ - `melody`: 1.5B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody)
91
+ - `large`: 3.3B model, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-large)
92
+ - `melody-large`: 3.3B model, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-melody-large)
93
+ - `stereo-small`: 300M model, stereo, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-small)
94
+ - `stereo-medium`: 1.5B model, stereo, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-medium)
95
+ - `stereo-melody`: 1.5B model, stereo, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-melody)
96
+ - `stereo-large`: 3.3B model, stereo, text to music only - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-large)
97
+ - `stereo-melody-large`: 3.3B model, stereo, text to music and text+melody to music - [🤗 Hub](https://huggingface.co/facebook/musicgen-stereo-melody-large)
98
+
99
+ We observe the best trade-off between quality and compute with the `medium` or `melody` model.
100
+ In order to use MusicGen locally **you must have a GPU**. We recommend 16GB of memory, but smaller
101
+ GPUs will be able to generate short sequences, or longer sequences with the `small` model.
102
+
103
+ **Note**: Please make sure to have [ffmpeg](https://ffmpeg.org/download.html) installed when using a newer version of `torchaudio`.
104
+ You can install it with:
105
+ ```
106
+ apt-get install ffmpeg
107
+ ```
108
+
109
+ See below for a quick example of using the API.
110
+
111
+ ```python
112
+ import torchaudio
113
+ from audiocraft.models import MusicGen
114
+ from audiocraft.data.audio import audio_write
115
+
116
+ model = MusicGen.get_pretrained('melody')
117
+ model.set_generation_params(duration=8) # generate 8 seconds.
118
+ wav = model.generate_unconditional(4) # generates 4 unconditional audio samples
119
+ descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
120
+ wav = model.generate(descriptions) # generates 3 samples.
121
+
122
+ melody, sr = torchaudio.load('./assets/bach.mp3')
123
+ # generates using the melody from the given audio and the provided descriptions.
124
+ wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)
125
+
126
+ for idx, one_wav in enumerate(wav):
127
+ # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
128
+ audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
129
+ ```
130
+ ## 🤗 Transformers Usage
131
+
132
+ MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies
133
+ and additional packages. Steps to get started:
134
+
135
+ 1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
136
+
137
+ ```
138
+ pip install git+https://github.com/huggingface/transformers.git
139
+ ```
140
+
141
+ 2. Run the following Python code to generate text-conditional audio samples:
142
+
143
+ ```py
144
+ from transformers import AutoProcessor, MusicgenForConditionalGeneration
145
+
146
+
147
+ processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
148
+ model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
149
+
150
+ inputs = processor(
151
+ text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
152
+ padding=True,
153
+ return_tensors="pt",
154
+ )
155
+
156
+ audio_values = model.generate(**inputs, max_new_tokens=256)
157
+ ```
158
+
159
+ 3. Listen to the audio samples either in an ipynb notebook:
160
+
161
+ ```py
162
+ from IPython.display import Audio
163
+
164
+ sampling_rate = model.config.audio_encoder.sampling_rate
165
+ Audio(audio_values[0].numpy(), rate=sampling_rate)
166
+ ```
167
+
168
+ Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
169
+
170
+ ```py
171
+ import scipy
172
+
173
+ sampling_rate = model.config.audio_encoder.sampling_rate
174
+ scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
175
+ ```
176
+
177
+ For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the
178
+ [MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on
179
+ [Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb).
180
+
181
+
182
+ ## Model Card
183
+
184
+ See [the model card page](./MODEL_CARD.md).
185
+
186
+ ## FAQ
187
+
188
+ #### Will the training code be released?
189
+
190
+ Yes. We will soon release the training code for MusicGen and EnCodec.
191
+
192
+
193
+ #### I need help on Windows
194
+
195
+ @FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4)
196
+
197
+ #### I need help for running the demo on Colab
198
+
199
+ Check [@camenduru tutorial on Youtube](https://www.youtube.com/watch?v=EGfxuTy9Eeo).
200
+
201
+ ## Citation
202
+ ```
203
+ @article{copet2023simple,
204
+ title={Simple and Controllable Music Generation},
205
+ author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
206
+ year={2023},
207
+ journal={arXiv preprint arXiv:2306.05284},
208
+ }
209
+ ```
210
+
211
+ ## License
212
+ * The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
213
+ * The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
214
+ [arxiv]: https://arxiv.org/abs/2306.05284
215
  [musicgen_samples]: https://ai.honu.io/papers/musicgen/
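
The README section on Top-k, Top-p, Temperature and Classifier Free Guidance describes these controls only in prose. The following is a minimal, standard-library sketch of how they are commonly applied to one step of token sampling. It is not Audiocraft's implementation: every function name here (`cfg_combine`, `softmax`, `filter_top_k`, `filter_top_p`, `sample`) is hypothetical, the logits are made-up numbers, and the guidance rule shown (unconditional logits plus a coefficient times the conditional/unconditional difference) is the usual formulation, assumed here for illustration.

```python
import math
import random


def cfg_combine(cond_logits, uncond_logits, cfg_coef):
    """Classifier-free guidance: move from unconditional toward conditional logits."""
    return [u + cfg_coef * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def filter_top_p(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(candidates, rng):
    """Draw one token index from a {token: probability} mapping."""
    tokens = list(candidates)
    weights = [candidates[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Illustrative values only: four candidate tokens.
guided = cfg_combine([2.0, 1.0, 0.1, -1.0], [1.0, 1.0, 0.5, 0.0], cfg_coef=3.0)
probs = softmax(guided, temperature=1.0)
next_token = sample(filter_top_k(probs, k=2), random.Random(0))
```

In this sketch top-k and top-p are independent filters; how the app combines the `topk` and `topp` arguments passed to `predict` is not shown in this diff.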
app.py CHANGED
@@ -167,7 +167,7 @@ def load_melody_filepath(melody_filepath, title):
167
  #$Union[str, os.PathLike]
168
  symbols = ['_', '.', '-']
169
  if (melody_filepath is None) or (melody_filepath == ""):
170
- return title, gr.update(maximum=0, value=0) , gr.update(value="melody", interactive=True)
171
 
172
  if (title is None) or ("MusicGen" in title) or (title == ""):
173
  melody_name, melody_extension = get_filename_from_filepath(melody_filepath)
@@ -187,7 +187,7 @@ def load_melody_filepath(melody_filepath, title):
187
  print(f"Melody length: {len(melody_data)}, Melody segments: {total_melodys}\n")
188
  MAX_PROMPT_INDEX = total_melodys
189
 
190
- return gr.Textbox.update(value=melody_name), gr.update(maximum=MAX_PROMPT_INDEX, value=0), gr.update(value="melody", interactive=False)
191
 
192
  def predict(model, text, melody_filepath, duration, dimension, topk, topp, temperature, cfg_coef, background, title, settings_font, settings_font_color, seed, overlap=1, prompt_index = 0, include_title = True, include_settings = True, harmony_only = False):
193
  global MODEL, INTERRUPTED, INTERRUPTING, MOVE_TO_CPU
@@ -358,7 +358,7 @@ def ui(**kwargs):
358
 
359
  Disclaimer: This won't run on CPU only. Clone this App and run on GPU instance!
360
 
361
- Todo: Working on improved transitions between 30 second segments, improve Interrupt.
362
  """
363
  )
364
  if IS_SHARED_SPACE and not torch.cuda.is_available():
@@ -375,7 +375,7 @@ def ui(**kwargs):
375
  text = gr.Text(label="Describe your music", interactive=True, value="4/4 100bpm 320kbps 48khz, Industrial/Electronic Soundtrack, Dark, Intense, Sci-Fi")
376
  with gr.Column():
377
  duration = gr.Slider(minimum=1, maximum=720, value=10, label="Duration (s)", interactive=True)
378
- model = gr.Radio(["melody", "medium", "small", "large"], label="AI Model", value="melody", interactive=True)
379
  with gr.Row():
380
  submit = gr.Button("Generate", elem_id="btn-generate")
381
  # Adapted from https://github.com/rkfg/audiocraft/blob/long/app.py, MIT license.
 
167
  #$Union[str, os.PathLike]
168
  symbols = ['_', '.', '-']
169
  if (melody_filepath is None) or (melody_filepath == ""):
170
+ return title, gr.update(maximum=0, value=0) , gr.update(value="melody-large", interactive=True)
171
 
172
  if (title is None) or ("MusicGen" in title) or (title == ""):
173
  melody_name, melody_extension = get_filename_from_filepath(melody_filepath)
 
187
  print(f"Melody length: {len(melody_data)}, Melody segments: {total_melodys}\n")
188
  MAX_PROMPT_INDEX = total_melodys
189
 
190
+ return gr.Textbox.update(value=melody_name), gr.update(maximum=MAX_PROMPT_INDEX, value=0), gr.update(value="melody-large", interactive=True)
191
 
192
  def predict(model, text, melody_filepath, duration, dimension, topk, topp, temperature, cfg_coef, background, title, settings_font, settings_font_color, seed, overlap=1, prompt_index = 0, include_title = True, include_settings = True, harmony_only = False):
193
  global MODEL, INTERRUPTED, INTERRUPTING, MOVE_TO_CPU
 
358
 
359
  Disclaimer: This won't run on CPU only. Clone this App and run on GPU instance!
360
 
361
+ Todo: Working on improved Interrupt and new Models.
362
  """
363
  )
364
  if IS_SHARED_SPACE and not torch.cuda.is_available():
 
375
  text = gr.Text(label="Describe your music", interactive=True, value="4/4 100bpm 320kbps 48khz, Industrial/Electronic Soundtrack, Dark, Intense, Sci-Fi")
376
  with gr.Column():
377
  duration = gr.Slider(minimum=1, maximum=720, value=10, label="Duration (s)", interactive=True)
378
+ model = gr.Radio(["melody", "medium", "small", "large", "melody-large", "stereo-melody", "stereo-medium", "stereo-small", "stereo-large", "stereo-melody-large"], label="AI Model", value="melody-large", interactive=True)
379
  with gr.Row():
380
  submit = gr.Button("Generate", elem_id="btn-generate")
381
  # Adapted from https://github.com/rkfg/audiocraft/blob/long/app.py, MIT license.
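
With this change, `load_melody_filepath` always resets the model radio to `"melody-large"` but leaves it interactive, so the user can switch to another melody model afterwards. A possible refinement, sketched below under assumptions, would keep a melody-capable selection the user has already made. `pick_model`, `MELODY_MODELS` and `DEFAULT_MELODY_MODEL` are hypothetical names that do not appear in the commit, and the sketch is plain Python with no Gradio dependency.

```python
# Hypothetical helper, not part of this commit: names taken from the new radio choices.
MELODY_MODELS = ("melody", "melody-large", "stereo-melody", "stereo-melody-large")
DEFAULT_MELODY_MODEL = "melody-large"


def pick_model(current_model, melody_filepath):
    """Return the radio value to show after a melody file is loaded or cleared."""
    has_melody = bool(melody_filepath)
    if has_melody and current_model not in MELODY_MODELS:
        # A melody was supplied but the selected model is text-only: switch.
        return DEFAULT_MELODY_MODEL
    # Otherwise keep the user's choice, falling back to the default if unset.
    return current_model or DEFAULT_MELODY_MODEL
```

The returned value could then be passed as `gr.update(value=..., interactive=True)` in place of the hard-coded string.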
audiocraft/models/loaders.py CHANGED
@@ -35,6 +35,12 @@ HF_MODEL_CHECKPOINTS_MAP = {
35
  "medium": "facebook/musicgen-medium",
36
  "large": "facebook/musicgen-large",
37
  "melody": "facebook/musicgen-melody",
 
 
 
 
 
 
38
  }
39
 
40
 
 
35
  "medium": "facebook/musicgen-medium",
36
  "large": "facebook/musicgen-large",
37
  "melody": "facebook/musicgen-melody",
38
+ "melody-large": "facebook/musicgen-melody-large",
39
+ "stereo-small": "facebook/musicgen-stereo-small",
40
+ "stereo-medium": "facebook/musicgen-stereo-medium",
41
+ "stereo-large": "facebook/musicgen-stereo-large",
42
+ "stereo-melody": "facebook/musicgen-stereo-melody",
43
+ "stereo-melody-large": "facebook/musicgen-stereo-melody-large",
44
  }
45
 
46
 
audiocraft/models/musicgen.py CHANGED
@@ -68,12 +68,18 @@ class MusicGen:
68
  return self.compression_model.channels
69
 
70
  @staticmethod
71
- def get_pretrained(name: str = 'melody', device=None):
72
- """Return pretrained model, we provide four models:
73
  - small (300M), text to music, # see: https://huggingface.co/facebook/musicgen-small
74
  - medium (1.5B), text to music, # see: https://huggingface.co/facebook/musicgen-medium
75
  - melody (1.5B) text to music and text+melody to music, # see: https://huggingface.co/facebook/musicgen-melody
76
  - large (3.3B), text to music, # see: https://huggingface.co/facebook/musicgen-large
 
 
 
 
 
 
77
  """
78
 
79
  if device is None:
 
68
  return self.compression_model.channels
69
 
70
  @staticmethod
71
+ def get_pretrained(name: str = 'melody-large', device=None):
72
+ """Return pretrained model, we provide ten models:
73
  - small (300M), text to music, # see: https://huggingface.co/facebook/musicgen-small
74
  - medium (1.5B), text to music, # see: https://huggingface.co/facebook/musicgen-medium
75
  - melody (1.5B) text to music and text+melody to music, # see: https://huggingface.co/facebook/musicgen-melody
76
  - large (3.3B), text to music, # see: https://huggingface.co/facebook/musicgen-large
77
+ - melody-large (3.3B), text to music, and text+melody to music # see: https://huggingface.co/facebook/musicgen-melody-large
78
+ - stereo-small (300M), text to music, # see: https://huggingface.co/facebook/musicgen-stereo-small
79
+ - stereo-medium (1.5B), text to music, # see: https://huggingface.co/facebook/musicgen-stereo-medium
80
+ - stereo-melody (1.5B) text to music and text+melody to music, # see: https://huggingface.co/facebook/musicgen-stereo-melody
81
+ - stereo-large (3.3B), text to music, # see: https://huggingface.co/facebook/musicgen-stereo-large
82
+ - stereo-melody-large (3.3B), text to music, and text+melody to music # see: https://huggingface.co/facebook/musicgen-stereo-melody-large
83
  """
84
 
85
  if device is None:
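
The six names added to `HF_MODEL_CHECKPOINTS_MAP` in `loaders.py` are the same strings offered by the radio in `app.py` and listed in the `get_pretrained` docstring. The sketch below shows one way such a name could be validated and resolved before loading. It is an illustration only: `resolve_checkpoint` and `describe` are hypothetical helpers, and the `"small"` entry is assumed from the README link because that line of the map falls outside the visible hunk.

```python
# Copy of the mapping as shown in the loaders.py hunk.
HF_MODEL_CHECKPOINTS_MAP = {
    "small": "facebook/musicgen-small",  # assumed: not visible in the hunk
    "medium": "facebook/musicgen-medium",
    "large": "facebook/musicgen-large",
    "melody": "facebook/musicgen-melody",
    "melody-large": "facebook/musicgen-melody-large",
    "stereo-small": "facebook/musicgen-stereo-small",
    "stereo-medium": "facebook/musicgen-stereo-medium",
    "stereo-large": "facebook/musicgen-stereo-large",
    "stereo-melody": "facebook/musicgen-stereo-melody",
    "stereo-melody-large": "facebook/musicgen-stereo-melody-large",
}


def resolve_checkpoint(name):
    """Map a short model name to its Hub repository, failing loudly on typos."""
    try:
        return HF_MODEL_CHECKPOINTS_MAP[name]
    except KeyError:
        known = ", ".join(sorted(HF_MODEL_CHECKPOINTS_MAP))
        raise ValueError(f"Unknown model '{name}'. Expected one of: {known}") from None


def describe(name):
    """Derive the model's capabilities from the naming scheme used above."""
    return {"stereo": name.startswith("stereo-"), "melody": "melody" in name}
```

A check like this would catch a display-style name such as `melody large` (with a space) before any download is attempted.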