Surn committed on
Commit 790c9e0 · 1 Parent(s): 8817130

Minor Updates for main project consistency


update colab

requirements update

reverse HF gradio version

update gradio req

Files changed (5)
  1. README.md +63 -21
  2. app.py +27 -2
  3. audiocraft/data/audio_utils.py +29 -1
  4. pre-requirements.txt +2 -0
  5. requirements.txt +1 -0
README.md CHANGED
@@ -4,7 +4,7 @@ emoji: 🎼
 colorFrom: white
 colorTo: red
 sdk: gradio
-sdk_version: 3.34.0
+sdk_version: 3.33.1
 app_file: app.py
 pinned: false
 license: creativeml-openrail-m
@@ -60,26 +60,13 @@ pip install -e . # or if you cloned the repo locally
 
 ## Usage
 We offer a number of way to interact with MusicGen:
-1. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally, or use the provided [colab notebook](https://colab.research.google.com/drive/1fxGqfg96RBUvGxZ1XXN07s3DthrKUl4-?usp=sharing).
-2. You can use the gradio demo locally by running `python app.py`.
-3. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
-4. Finally, you can run the [Gradio demo with a Colab GPU](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing),
-as adapted from [@camenduru Colab](https://github.com/camenduru/MusicGen-colab).
-
-### More info about Top-k, Top-p, Temperature and Classifier Free Guidance from ChatGPT
-
-
-Top-k: Top-k is a parameter used in text generation models, including music generation models. It determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens based on their predicted probabilities, and then selects the top-k tokens from the ranked list. The model then samples from this reduced set of tokens to determine the next token in the generated sequence. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.
-
-Top-p (or nucleus sampling): Top-p, also known as nucleus sampling or probabilistic sampling, is another method used for token selection during text generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of the ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (usually denoted as p). The model then samples from this set to choose the next token. This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for a varying number of tokens to be considered based on their probabilities.
-
-Temperature: Temperature is a parameter that controls the randomness of the generated output. It is applied during the sampling process, where a higher temperature value results in more random and diverse outputs, while a lower temperature value leads to more deterministic and focused outputs. In the context of music generation, a higher temperature can introduce more variability and creativity into the generated music, but it may also lead to less coherent or structured compositions. On the other hand, a lower temperature can produce more repetitive and predictable music.
-
-Classifier-Free Guidance: Classifier-Free Guidance refers to a technique used in some music generation models where a separate classifier network is trained to provide guidance or control over the generated music. This classifier is trained on labeled data to recognize specific musical characteristics or styles. During the generation process, the output of the generator model is evaluated by the classifier, and the generator is encouraged to produce music that aligns with the desired characteristics or style. This approach allows for more fine-grained control over the generated music, enabling users to specify certain attributes they want the model to capture.
-
-These parameters, such as top-k, top-p, temperature, and classifier-free guidance, provide different ways to influence the output of a music generation model and strike a balance between creativity, diversity, coherence, and control. The specific values for these parameters can be tuned based on the desired outcome and user preferences.
-
-
+1. A demo is also available on the [`facebook/MusicGen` HuggingFace Space](https://huggingface.co/spaces/facebook/MusicGen) (huge thanks to all the HF team for their support).
+2. You can run the Gradio demo in Colab: [colab notebook](https://colab.research.google.com/drive/1-Xe9NCdIs2sCUbiSmwHXozK6AAhMm7_i?usp=sharing).
+3. You can use the gradio demo locally by running `python app.py`.
+4. You can play with MusicGen by running the jupyter notebook at [`demo.ipynb`](./demo.ipynb) locally (if you have a GPU).
+5. Checkout [@camenduru Colab page](https://github.com/camenduru/MusicGen-colab) which is regularly
+updated with contributions from @camenduru and the community.
+6. Finally, MusicGen is available in 🤗 Transformers from v4.31.0 onwards, see section [🤗 Transformers Usage](#-transformers-usage) below.
 
 ## API
 
@@ -120,6 +107,56 @@ for idx, one_wav in enumerate(wav):
     # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
     audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
 ```
+## 🤗 Transformers Usage
+
+MusicGen is available in the 🤗 Transformers library from version 4.31.0 onwards, requiring minimal dependencies
+and additional packages. Steps to get started:
+
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main:
+
+```
+pip install git+https://github.com/huggingface/transformers.git
+```
+
+2. Run the following Python code to generate text-conditional audio samples:
+
+```py
+from transformers import AutoProcessor, MusicgenForConditionalGeneration
+
+
+processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
+model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
+
+inputs = processor(
+    text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
+    padding=True,
+    return_tensors="pt",
+)
+
+audio_values = model.generate(**inputs, max_new_tokens=256)
+```
+
+3. Listen to the audio samples either in an ipynb notebook:
+
+```py
+from IPython.display import Audio
+
+sampling_rate = model.config.audio_encoder.sampling_rate
+Audio(audio_values[0].numpy(), rate=sampling_rate)
+```
+
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+
+```py
+import scipy
+
+sampling_rate = model.config.audio_encoder.sampling_rate
+scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())
+```
+
+For more details on using the MusicGen model for inference using the 🤗 Transformers library, refer to the
+[MusicGen docs](https://huggingface.co/docs/transformers/main/en/model_doc/musicgen) or the hands-on
+[Google Colab](https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/MusicGen.ipynb).
 
 
 ## Model Card
@@ -137,6 +174,9 @@ Yes. We will soon release the training code for MusicGen and EnCodec.
 
 @FurkanGozukara made a complete tutorial for [Audiocraft/MusicGen on Windows](https://youtu.be/v-YpvPkhdO4)
 
+#### I need help for running the demo on Colab
+
+Check [@camenduru tutorial on Youtube](https://www.youtube.com/watch?v=EGfxuTy9Eeo).
 
 ## Citation
 ```
@@ -151,3 +191,5 @@ Yes. We will soon release the training code for MusicGen and EnCodec.
 ## License
 * The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
 * The weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
+[arxiv]: https://arxiv.org/abs/2306.05284
+[musicgen_samples]: https://ai.honu.io/papers/musicgen/
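This commit drops the README's ChatGPT-written notes on top-k, top-p, temperature and classifier-free guidance. One point worth stating precisely: classifier-free guidance, as usually formulated, does not use a separate classifier network; the same model is run with and without the text conditioning and the two sets of logits are blended. The pure-Python sketch below shows one common way the four controls combine at a single decoding step. All function names are hypothetical, and both the blend `uncond + (cond - uncond) * cfg_coef` and the rule that top-p takes precedence over top-k when both are set are assumptions of this sketch, not a transcription of audiocraft's code.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax: temperature < 1 sharpens, > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (greedy argmax is the usual zero-temperature case)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def apply_cfg(cond: List[float], uncond: List[float], cfg_coef: float) -> List[float]:
    """Classifier-free guidance: extrapolate from unconditional towards conditional logits."""
    return [u + (c - u) * cfg_coef for c, u in zip(cond, uncond)]


def _keep_only(probs: List[float], keep: set) -> List[float]:
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]         # renormalize the surviving tokens


def top_k_filter(probs: List[float], k: int) -> List[float]:
    """Keep only the k most probable tokens."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _keep_only(probs, set(ranked[:k]))


def top_p_filter(probs: List[float], p: float) -> List[float]:
    """Keep the smallest set of most probable tokens whose cumulative probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _keep_only(probs, keep)


def sample_next_token(cond: List[float], uncond: List[float], cfg_coef: float = 3.0,
                      temperature: float = 1.0, top_k: int = 0, top_p: float = 0.0,
                      rng: Optional[random.Random] = None) -> int:
    """One decoding step: CFG, then temperature softmax, then top-p or top-k, then sample."""
    rng = rng or random.Random()
    probs = softmax(apply_cfg(cond, uncond, cfg_coef), temperature)
    if top_p > 0.0:              # assumption: top-p wins when both are set
        probs = top_p_filter(probs, top_p)
    elif top_k > 0:
        probs = top_k_filter(probs, top_k)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With `cfg_coef=1.0` the blend reduces to the conditional logits alone; values above 1 push generation further towards the text prompt. Lower `top_k`, lower `top_p` and lower `temperature` all shrink the set of tokens that can realistically be drawn.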
app.py CHANGED
@@ -11,11 +11,13 @@ import argparse
 import torch
 import gradio as gr
 import os
+from pathlib import Path
 import time
+import typing as tp
 import warnings
 from audiocraft.models import MusicGen
 from audiocraft.data.audio import audio_write
-from audiocraft.data.audio_utils import apply_fade, apply_tafade
+from audiocraft.data.audio_utils import apply_fade, apply_tafade, apply_splice_effect
 from audiocraft.utils.extend import generate_music_segments, add_settings_to_image, INTERRUPTING
 import numpy as np
 import random
@@ -38,6 +40,28 @@ def interrupt():
     global INTERRUPTING
     INTERRUPTING = True
 
+class FileCleaner:
+    def __init__(self, file_lifetime: float = 3600):
+        self.file_lifetime = file_lifetime
+        self.files = []
+
+    def add(self, path: tp.Union[str, Path]):
+        self._cleanup()
+        self.files.append((time.time(), Path(path)))
+
+    def _cleanup(self):
+        now = time.time()
+        for time_added, path in list(self.files):
+            if now - time_added > self.file_lifetime:
+                if path.exists():
+                    path.unlink()
+                self.files.pop(0)
+            else:
+                break
+
+
+#file_cleaner = FileCleaner()
+
 def toggle_audio_src(choice):
     if choice == "mic":
         return gr.update(source="microphone", value=None, label="Microphone")
@@ -235,8 +259,9 @@ def predict(model, text, melody_filepath, duration, dimension, topk, topp, tempe
 overlapping_output_fadein = output_segments[i][:, :, :overlap_samples]
 #overlapping_output_fadein = apply_fade(overlapping_output_fadein,sample_rate=MODEL.sample_rate,duration=overlap,out=False,start=False, curve_start=0.0, current_device=MODEL.device)
 overlapping_output_fadein = apply_tafade(overlapping_output_fadein,sample_rate=MODEL.sample_rate,duration=overlap,out=False,start=False, shape="linear")
-
+
 overlapping_output = torch.cat([overlapping_output_fadeout[:, :, :-(overlap_samples // 2)], overlapping_output_fadein],dim=2)
+###overlapping_output, overlap_sample_rate = apply_splice_effect(overlapping_output_fadeout, MODEL.sample_rate, overlapping_output_fadein, MODEL.sample_rate, overlap)
 print(f" overlap size Fade:{overlapping_output.size()}\n output: {output.size()}\n segment: {output_segments[i].size()}")
 ##overlapping_output = torch.cat([output[:, :, -overlap_samples:], output_segments[i][:, :, :overlap_samples]], dim=1) #stack tracks
 ##print(f" overlap size stack:{overlapping_output.size()}\n output: {output.size()}\n segment: {output_segments[i].size()}")
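The `FileCleaner` added to app.py deletes tracked files once they are older than `file_lifetime` seconds, but only lazily: cleanup runs inside `add()`, so nothing is removed until the next file is registered. The commit leaves the instance commented out (`#file_cleaner = FileCleaner()`). The sketch below copies the class from the diff unchanged so the block runs on its own, then exercises it with a very short lifetime in a temporary directory; `demo()` and the file names are hypothetical.

```python
import tempfile
import time
import typing as tp
from pathlib import Path


class FileCleaner:
    """Same logic as the FileCleaner added to app.py in this commit."""

    def __init__(self, file_lifetime: float = 3600):
        self.file_lifetime = file_lifetime  # seconds a tracked file may live
        self.files = []                     # (time_added, path), oldest first

    def add(self, path: tp.Union[str, Path]):
        self._cleanup()                     # purge expired files before tracking a new one
        self.files.append((time.time(), Path(path)))

    def _cleanup(self):
        now = time.time()
        for time_added, path in list(self.files):
            if now - time_added > self.file_lifetime:
                if path.exists():
                    path.unlink()
                self.files.pop(0)           # valid because entries are in insertion order
            else:
                break                       # every later entry is newer, so stop here


def demo(lifetime: float = 0.05) -> tp.Tuple[bool, bool]:
    """Hypothetical demo: returns (old file still exists, new file exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old.wav"
        new = Path(tmp) / "new.wav"
        old.write_bytes(b"")
        cleaner = FileCleaner(file_lifetime=lifetime)
        cleaner.add(old)
        time.sleep(lifetime * 2)            # let old.wav expire
        new.write_bytes(b"")
        cleaner.add(new)                    # this call deletes old.wav
        return old.exists(), new.exists()
```

`demo()` should report that the old file is gone and the new one remains. For long-running servers with many files, `list.pop(0)` is O(n); a `collections.deque` with `popleft()` would make each removal O(1).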
audiocraft/data/audio_utils.py CHANGED
@@ -262,4 +262,32 @@ def apply_fade(audio: torch.Tensor, sample_rate, duration=3.0, out=True, start=T
 
     wav = normalize_loudness(audio_faded,sample_rate, loudness_headroom_db=18, loudness_compressor=True)
     _clip_wav(wav, log_clipping=False, stem_name=stem_name)
-    return wav
+    return wav
+
+def apply_splice_effect(waveform1, sample_rate1, waveform2, sample_rate2, overlap):
+    # Convert sample rates to integers
+    sample_rate1 = int(sample_rate1)
+    sample_rate2 = int(sample_rate2)
+
+    # Convert tensors to mono-channel if needed
+    if waveform1.ndim > 2:
+        waveform1 = waveform1.mean(dim=1)
+    if waveform2.ndim > 2:
+        waveform2 = waveform2.mean(dim=1)
+
+    ## Convert tensors to numpy arrays
+    #waveform1_np = waveform1.numpy()
+    #waveform2_np = waveform2.numpy()
+
+    # Apply splice effect using torchaudio.sox_effects.apply_effects_tensor
+    effects = [
+        ["splice", f"-q {waveform1},{overlap}"],
+    ]
+    output_waveform, output_sample_rate = torchaudio.sox_effects.apply_effects_tensor(
+        torch.cat([waveform1.unsqueeze(0), waveform2.unsqueeze(0)], dim=2),
+        sample_rate1,
+        effects
+    )
+
+    return output_waveform.squeeze(0), output_sample_rate
+
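`apply_splice_effect` is only referenced from a commented-out line in app.py, and as written it appears unlikely to work: the f-string interpolates the tensor `waveform1` itself into the sox argument, where sox's `splice` effect expects a splice position (a time), and `apply_effects_tensor` expects a 2-D `[channels, frames]` tensor rather than the 3-D result of the `unsqueeze(0)`/`cat` step. For comparison, here is a minimal pure-Python sketch of the underlying idea, a crossfaded splice of two mono clips over `overlap` samples in which the faded regions are summed sample by sample (app.py instead fades both regions with `apply_tafade` and joins them with `torch.cat`). The function name and the two fade shapes are illustrative assumptions, not audiocraft's API.

```python
import math
from typing import List


def crossfade_splice(a: List[float], b: List[float], overlap: int, shape: str = "linear") -> List[float]:
    """Join mono clips a and b, mixing the last `overlap` samples of a with the first `overlap` of b."""
    if not 0 <= overlap <= min(len(a), len(b)):
        raise ValueError("overlap must be between 0 and the length of the shorter clip")
    head = a[: len(a) - overlap]      # part of a that plays unchanged (a[:-0] would be empty)
    tail = b[overlap:]                # part of b that plays unchanged
    mixed = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)   # fade position, strictly between 0 and 1
        if shape == "linear":
            gain_out, gain_in = 1.0 - t, t          # gains always sum to 1
        elif shape == "equal_power":
            gain_out = math.cos(t * math.pi / 2)    # gain_out**2 + gain_in**2 == 1
            gain_in = math.sin(t * math.pi / 2)
        else:
            raise ValueError(f"unknown fade shape: {shape!r}")
        mixed.append(a[len(a) - overlap + i] * gain_out + b[i] * gain_in)
    return head + mixed + tail
```

The result has `len(a) + len(b) - overlap` samples. A linear (equal-gain) fade keeps the level constant when the two clips are nearly identical in the overlap, while an equal-power fade keeps perceived loudness steadier for unrelated material but can boost correlated material by up to about 3 dB.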
pre-requirements.txt ADDED
@@ -0,0 +1,2 @@
+pip>=23.2
+gradio_client==0.2.7
requirements.txt CHANGED
@@ -11,6 +11,7 @@ sentencepiece
 spacy==3.5.2
 torch>=2.0.0
 torchaudio>=2.0.0
+soundfile
 huggingface_hub
 tqdm
 transformers
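The README's new Transformers section saves audio with `scipy.io.wavfile.write`, and this commit adds `soundfile` to requirements.txt. To show what such a write involves without either dependency, here is a minimal standard-library sketch that stores a mono float waveform in [-1.0, 1.0] as 16-bit PCM. The function name and the clip-then-scale-by-32767 convention are assumptions of this sketch. A MusicGen output tensor would first need converting to a flat list of floats, for example `audio_values[0, 0].tolist()` in the README's Transformers example.

```python
import struct
import wave
from typing import Sequence


def write_wav_pcm16(path: str, samples: Sequence[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit little-endian PCM."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))                    # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                            # mono
        wf.setsampwidth(2)                            # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
```

Packing one sample at a time is slow for long clips; in practice `soundfile` or `scipy` writes whole arrays at once, which is presumably why the project depends on them.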