thelou1s mjaramillo committed on
Commit 7834510
0 Parent(s):

Duplicate from mjaramillo/SpiceIcaroTP

Co-authored-by: Miguel Jaramillo <mjaramillo@users.noreply.huggingface.co>

Files changed (7)
  1. .gitattributes +27 -0
  2. .github/workflows/main.yml +19 -0
  3. LICENSE +21 -0
  4. README.md +14 -0
  5. TP3.ipynb +0 -0
  6. app_deploy.py +506 -0
  7. requirements.txt +10 -0
.gitattributes ADDED
@@ -0,0 +1,27 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.github/workflows/main.yml ADDED
@@ -0,0 +1,19 @@
+ name: Sync to Hugging Face hub
+ on:
+   push:
+     branches: [main]
+
+   # to run this workflow manually from the Actions tab
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v2
+         with:
+           fetch-depth: 0
+       - name: Push to hub
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: git push --force https://mjaramillo:$HF_TOKEN@huggingface.co/spaces/mjaramillo/SpiceIcaroTP
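
The push step above only works once a Hugging Face access token with write permission is stored as the HF_TOKEN secret in the GitHub repository's Actions secrets. One possible local sanity check before saving the token (not part of this commit; it assumes the huggingface_hub client is installed, which is not in this repo's requirements.txt):

    # Hypothetical local check, not part of this commit.
    # Assumes: pip install huggingface_hub
    from huggingface_hub import HfApi

    token = "hf_xxx"  # placeholder: a write-scoped token from https://huggingface.co/settings/tokens
    print(HfApi().whoami(token=token))  # raises an error if the token is not valid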
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2022 Miguel Jaramillo
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,14 @@
+ ---
+ title: SpiceIcaroTP
+ emoji: 🌍
+ colorFrom: pink
+ colorTo: gray
+ sdk: gradio
+ sdk_version: 2.9.4
+ app_file: app_deploy.py
+ pinned: false
+ license: mit
+ duplicated_from: mjaramillo/SpiceIcaroTP
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces#reference
TP3.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
app_deploy.py ADDED
@@ -0,0 +1,506 @@
+ # -*- coding: utf-8 -*-
+ """tp3__1_-1.ipynb
+
+ Automatically generated by Colaboratory.
+
+ Original file is located at
+     https://colab.research.google.com/drive/1_Sjx5G1BW689ggZJAJ4P7kCZndOobNCp
+ """
+
+ # Install Gradio
+ #!pip install gradio -q
+
+ # Install timidity
+ #!sudo apt-get install -q -y timidity libsndfile1
+
+ # All the imports to deal with sound data
+ #!pip install pydub numba==0.48 librosa music21
+
+ # Import Libraries
+
+ import gradio as gr
+ import time
+
+ import tensorflow as tf
+ import tensorflow_hub as hub
+
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import librosa
+ from librosa import display as librosadisplay
+
+ import logging
+ import math
+ import statistics
+ import sys
+
+ from IPython.display import Audio, Javascript
+ from scipy.io import wavfile
+
+ from base64 import b64decode
+
+ import music21
+ from pydub import AudioSegment
+
+ logger = logging.getLogger()
+ logger.setLevel(logging.ERROR)
+
+ #print("tensorflow: %s" % tf.__version__)
+ #print("librosa: %s" % librosa.__version__)
+
+ # The audio input file
+ # Now the hardest part: Record your singing! :)
+
+ # We provide two methods to obtain an audio file:
+
+ # 1. Record audio directly in Gradio
+ # 2. Use a file saved on Google Drive
+
+ # Use a file saved on Google Drive
+ #INPUT_SOURCE = 'https://storage.googleapis.com/download.tensorflow.org/data/c-scale-metronome.wav'
+
+ #!wget --no-check-certificate 'https://storage.googleapis.com/download.tensorflow.org/data/c-scale-metronome.wav' -O c-scale.wav
+
+ #uploaded_file_name = 'c-scale.wav'
+
+ #uploaded_file_name
+
+ # Function that converts the user-created audio to the format that the model
+ # expects: a sampling rate of 16 kHz and only one channel (mono).
+
+ EXPECTED_SAMPLE_RATE = 16000
+
+ # Functions #
+ def convert_audio_for_model(user_file, output_file='converted_audio_file.wav'):
+   audio = AudioSegment.from_file(user_file)
+   audio = audio.set_frame_rate(EXPECTED_SAMPLE_RATE).set_channels(1)
+   audio.export(output_file, format="wav")
+   return output_file
+
+ MAX_ABS_INT16 = 32768.0
+
+ def plot_stft(x, sample_rate, show_black_and_white=False):
+   x_stft = np.abs(librosa.stft(x, n_fft=2048))
+   fig, ax = plt.subplots()
+   fig.set_size_inches(20, 10)
+   x_stft_db = librosa.amplitude_to_db(x_stft, ref=np.max)
+
+   if show_black_and_white:
+     librosadisplay.specshow(data=x_stft_db,
+                             y_axis='log',
+                             sr=sample_rate,
+                             cmap='gray_r')
+   else:
+     librosadisplay.specshow(data=x_stft_db,
+                             y_axis='log',
+                             sr=sample_rate)
+
+   plt.colorbar(format='%+2.0f dB')
+
+   return fig
+
+ # Loading audio samples from the wav file:
+ #sample_rate, audio_samples = wavfile.read(converted_audio_file, 'rb')
+
+ #fig = plot_stft(audio_samples / MAX_ABS_INT16, sample_rate=EXPECTED_SAMPLE_RATE)
+
+ # Executing the Model
+ # Loading the SPICE model is easy:
+ model = hub.load("https://tfhub.dev/google/spice/2")
+
+ def plot_pitch_conf(pitch_outputs, confidence_outputs):
+   fig, ax = plt.subplots()
+   fig.set_size_inches(20, 10)
+   plt.plot(pitch_outputs, label='pitch')
+   plt.plot(confidence_outputs, label='confidence')
+   plt.legend(loc="lower right")
+   return fig
+
+ def plot_pitch_conf_notes(confident_pitch_outputs_x, confident_pitch_outputs_y):
+   fig, ax = plt.subplots()
+   fig.set_size_inches(20, 10)
+   ax.set_ylim([0, 1])
+   plt.scatter(confident_pitch_outputs_x, confident_pitch_outputs_y)
+   plt.scatter(confident_pitch_outputs_x, confident_pitch_outputs_y, c="r")
+   return fig
+
+ def output2hz(pitch_output):
+   # Constants taken from https://tfhub.dev/google/spice/2
+   PT_OFFSET = 25.58
+   PT_SLOPE = 63.07
+   FMIN = 10.0
+   BINS_PER_OCTAVE = 12.0
+   cqt_bin = pitch_output * PT_SLOPE + PT_OFFSET
+   return FMIN * 2.0 ** (1.0 * cqt_bin / BINS_PER_OCTAVE)
+
+ def espectro_notas(audio_samples, EXPECTED_SAMPLE_RATE, confident_pitch_outputs_x, confident_pitch_values_hz):
+   fig, ax = plt.subplots()
+   plot_stft(audio_samples / MAX_ABS_INT16,
+             sample_rate=EXPECTED_SAMPLE_RATE, show_black_and_white=True)
+   # Note: conveniently, since the plot is in log scale, the pitch outputs
+   # also get converted to the log scale automatically by matplotlib.
+   plt.scatter(confident_pitch_outputs_x, confident_pitch_values_hz, c="r")
+   return fig
+
+ def hz2offset(freq):
+   # This measures the quantization error for a single note.
+   if freq == 0:  # Rests always have zero error.
+     return None
+   # Quantized note.
+   h = round(12 * math.log2(freq / C0))
+   return 12 * math.log2(freq / C0) - h
+
+ def quantize_predictions(group, ideal_offset):
+   # Group values are either 0, or a pitch in Hz.
+   non_zero_values = [v for v in group if v != 0]
+   zero_values_count = len(group) - len(non_zero_values)
+
+   # Create a rest if 80% is silent, otherwise create a note.
+   if zero_values_count > 0.8 * len(group):
+     # Interpret as a rest. Count each dropped note as an error, weighted a bit
+     # worse than a badly sung note (which would 'cost' 0.5).
+     return 0.51 * len(non_zero_values), "Rest"
+   else:
+     # Interpret as note, estimating as mean of non-rest predictions.
+     h = round(
+         statistics.mean([
+             12 * math.log2(freq / C0) - ideal_offset for freq in non_zero_values
+         ]))
+     octave = h // 12
+     n = h % 12
+     note = note_names[n] + str(octave)
+     # Quantization error is the total difference from the quantized note.
+     error = sum([
+         abs(12 * math.log2(freq / C0) - ideal_offset - h)
+         for freq in non_zero_values
+     ])
+     return error, note
+
+ def get_quantization_and_error(pitch_outputs_and_rests, predictions_per_eighth,
+                                prediction_start_offset, ideal_offset):
+   # Apply the start offset - we can just add the offset as rests.
+   pitch_outputs_and_rests = [0] * prediction_start_offset + \
+                             pitch_outputs_and_rests
+   # Collect the predictions for each note (or rest).
+   groups = [
+       pitch_outputs_and_rests[i:i + predictions_per_eighth]
+       for i in range(0, len(pitch_outputs_and_rests), predictions_per_eighth)
+   ]
+
+   quantization_error = 0
+
+   notes_and_rests = []
+   for group in groups:
+     error, note_or_rest = quantize_predictions(group, ideal_offset)
+     quantization_error += error
+     notes_and_rests.append(note_or_rest)
+
+   return quantization_error, notes_and_rests
+
+ def main(audio):
+
+   # Preparing the audio data
+   # Now that we have the audio, let's convert it to the expected format and
+   # then listen to it!
+   # The SPICE model needs as input an audio file at a sampling rate of 16kHz
+   # and with only one channel (mono).
+   # To help you with this part, we created a function (`convert_audio_for_model`)
+   # to convert any wav file you have to the model's expected format:
+
+
+   # Converting to the expected format for the model
+   # In all the input methods above, the uploaded file name is stored in
+   # the variable uploaded_file_name.
+   converted_audio_file = convert_audio_for_model(audio)
+
+   # Loading audio samples from the wav file:
+   sample_rate, audio_samples = wavfile.read(converted_audio_file, 'rb')
+
+   audio_samples = audio_samples / float(MAX_ABS_INT16)
+
+
+   # We now feed the audio to the SPICE tf.hub model to obtain pitch and uncertainty outputs as tensors.
+   model_output = model.signatures["serving_default"](tf.constant(audio_samples, tf.float32))
+
+   pitch_outputs = model_output["pitch"]
+   uncertainty_outputs = model_output["uncertainty"]
+
+   # 'Uncertainty' basically means the inverse of confidence.
+   confidence_outputs = 1.0 - uncertainty_outputs
+
+
+   confidence_outputs = list(confidence_outputs)
+   pitch_outputs = [float(x) for x in pitch_outputs]
+
+   indices = range(len(pitch_outputs))
+   confident_pitch_outputs = [(i, p)
+       for i, p, c in zip(indices, pitch_outputs, confidence_outputs) if c >= 0.9]
+   confident_pitch_outputs_x, confident_pitch_outputs_y = zip(*confident_pitch_outputs)
+
+   confident_pitch_values_hz = [output2hz(p) for p in confident_pitch_outputs_y]
+
+
+   # Plot waveform
+   fig1 = plt.figure()
+   plt.plot(audio_samples)
+
+   # Plot spectrogram
+   fig2 = plot_stft(audio_samples / MAX_ABS_INT16, sample_rate=EXPECTED_SAMPLE_RATE)
+
+   # Plot pitch & confidence
+   fig3 = plot_pitch_conf(pitch_outputs, confidence_outputs)
+
+
+   # Plot pitch & confidence notes
+   fig4 = plot_pitch_conf_notes(confident_pitch_outputs_x, confident_pitch_outputs_y)
+
+   # Plot spectrogram + notes
+   fig5 = espectro_notas(audio_samples, EXPECTED_SAMPLE_RATE, confident_pitch_outputs_x, confident_pitch_values_hz)
+
+
+   # ############################################################################
+   # Converting to musical notes ################################################
+
+   # Now that we have the pitch values, let's convert them to notes!
+   # This part is challenging by itself. We have to take into account two
+   # things:
+   # 1. the rests (when there's no singing)
+   # 2. the size of each note (offsets)
+
+   # ----------------------------------------------------------------------------
+   ### 1: Adding zeros to the output to indicate when there's no singing
+
+   pitch_outputs_and_rests = [
+       output2hz(p) if c >= 0.9 else 0
+       for i, p, c in zip(indices, pitch_outputs, confidence_outputs)
+   ]
+
+   # ----------------------------------------------------------------------------
+   ### 2: Adding note offsets
+   # When a person sings freely, the melody may have an offset to the absolute
+   # pitch values that notes can represent.
+   # Hence, to convert predictions to notes, one needs to correct for this
+   # possible offset.
+   # This is what the following code computes.
+
+   A4 = 440
+   C0 = A4 * pow(2, -4.75)
+   note_names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
+
+   def hz2offset(freq):
+     # This measures the quantization error for a single note.
+     if freq == 0:  # Rests always have zero error.
+       return None
+     # Quantized note.
+     h = round(12 * math.log2(freq / C0))
+     return 12 * math.log2(freq / C0) - h
+
+
+   # The ideal offset is the mean quantization error for all the notes
+   # (excluding rests):
+   offsets = [hz2offset(p) for p in pitch_outputs_and_rests if p != 0]
+   #print("offsets: ", offsets)
+   off = offsets
+
+   ideal_offset = statistics.mean(offsets)
+   #print("ideal offset: ", ideal_offset)
+   ideal_off = ideal_offset
+
+   # We can now use some heuristics to try and estimate the most likely sequence
+   # of notes that were sung.
+   # The ideal offset computed above is one ingredient - but we also need to know
+   # the speed (how many predictions make, say, an eighth?), and the time offset
+   # to start quantizing. To keep it simple, we'll just try different speeds and
+   # time offsets and measure the quantization error, using in the end the values
+   # that minimize this error.
+
+   def quantize_predictions(group, ideal_offset):
+     # Group values are either 0, or a pitch in Hz.
+     non_zero_values = [v for v in group if v != 0]
+     zero_values_count = len(group) - len(non_zero_values)
+
+     # Create a rest if 80% is silent, otherwise create a note.
+     if zero_values_count > 0.8 * len(group):
+       # Interpret as a rest. Count each dropped note as an error, weighted a bit
+       # worse than a badly sung note (which would 'cost' 0.5).
+       return 0.51 * len(non_zero_values), "Rest"
+     else:
+       # Interpret as note, estimating as mean of non-rest predictions.
+       h = round(
+           statistics.mean([
+               12 * math.log2(freq / C0) - ideal_offset for freq in non_zero_values
+           ]))
+       octave = h // 12
+       n = h % 12
+       note = note_names[n] + str(octave)
+       # Quantization error is the total difference from the quantized note.
+       error = sum([
+           abs(12 * math.log2(freq / C0) - ideal_offset - h)
+           for freq in non_zero_values
+       ])
+       return error, note
+
+
+   def get_quantization_and_error(pitch_outputs_and_rests, predictions_per_eighth,
+                                  prediction_start_offset, ideal_offset):
+     # Apply the start offset - we can just add the offset as rests.
+     pitch_outputs_and_rests = [0] * prediction_start_offset + \
+                               pitch_outputs_and_rests
+     # Collect the predictions for each note (or rest).
+     groups = [
+         pitch_outputs_and_rests[i:i + predictions_per_eighth]
+         for i in range(0, len(pitch_outputs_and_rests), predictions_per_eighth)
+     ]
+
+     quantization_error = 0
+
+     notes_and_rests = []
+     for group in groups:
+       error, note_or_rest = quantize_predictions(group, ideal_offset)
+       quantization_error += error
+       notes_and_rests.append(note_or_rest)
+
+     return quantization_error, notes_and_rests
+
+
+   best_error = float("inf")
+   best_notes_and_rests = None
+   best_predictions_per_note = None
+
+   for predictions_per_note in range(20, 65, 1):
+     for prediction_start_offset in range(predictions_per_note):
+
+       error, notes_and_rests = get_quantization_and_error(
+           pitch_outputs_and_rests, predictions_per_note,
+           prediction_start_offset, ideal_offset)
+
+       if error < best_error:
+         best_error = error
+         best_notes_and_rests = notes_and_rests
+         best_predictions_per_note = predictions_per_note
+
+   # At this point, best_notes_and_rests contains the best quantization.
+   # Since we don't need to have rests at the beginning, let's remove these:
+   #while best_notes_and_rests[0] == 'Rest':
+   #  best_notes_and_rests = best_notes_and_rests[1:]
+   # Also remove silence at the end.
+   #while best_notes_and_rests[-1] == 'Rest':
+   #  best_notes_and_rests = best_notes_and_rests[:-1]
+
+   # ____________________________________________________________________________
+   # Now let's write the quantized notes as a sheet music score!
+   # To do it we will use two libraries: [music21](http://web.mit.edu/music21/) and
+   # [Open Sheet Music Display](https://github.com/opensheetmusicdisplay/opensheetmusicdisplay)
+   # **Note:** for simplicity, we assume here that all notes have the same duration
+   # (a half note).
+
+   # Creating the sheet music score.
+   sc = music21.stream.Score()
+   # Adjust the speed to match the actual singing.
+   bpm = 60 * 60 / best_predictions_per_note
+   #print('bpm: ', bpm)
+   a = music21.tempo.MetronomeMark(number=bpm)
+   sc.insert(0, a)
+
+   for snote in best_notes_and_rests:
+     d = 'half'
+     if snote == 'Rest':
+       sc.append(music21.note.Rest(type=d))
+     else:
+       sc.append(music21.note.Note(snote, type=d))
+
+
+   # @title [Run this] Helper function to use Open Sheet Music Display (JS code)
+   # to show a music score
+   from IPython.core.display import HTML, Javascript
+   from IPython.display import display
+   import json, random
+
+   def showScore(score):
+     xml = open(score.write('musicxml')).read()
+     showMusicXML(xml)
+
+   def showMusicXML(xml):
+     DIV_ID = "OSMD_div"
+     a = display(HTML('<div id="' + DIV_ID + '">loading OpenSheetMusicDisplay</div>'))
+     script = """
+     var div_id = {{DIV_ID}};
+     function loadOSMD() {
+       return new Promise(function(resolve, reject) {
+         if (window.opensheetmusicdisplay) {
+           return resolve(window.opensheetmusicdisplay)
+         }
+         // OSMD script has a 'define' call which conflicts with requirejs
+         var _define = window.define // save the define object
+         window.define = undefined // now the loaded script will ignore requirejs
+         var s = document.createElement( 'script' );
+         s.setAttribute( 'src', "https://cdn.jsdelivr.net/npm/opensheetmusicdisplay@0.7.6/build/opensheetmusicdisplay.min.js" );
+         //s.setAttribute( 'src', "/custom/opensheetmusicdisplay.js" );
+         s.onload = function() {
+           window.define = _define
+           resolve(opensheetmusicdisplay);
+         };
+         document.body.appendChild( s ); // browser will try to load the new script tag
+       })
+     }
+     loadOSMD().then((OSMD) => {
+       window.openSheetMusicDisplay = new OSMD.OpenSheetMusicDisplay(div_id, {
+         drawingParameters: "compacttight"
+       });
+       openSheetMusicDisplay
+         .load({{data}})
+         .then(
+           function() {
+             openSheetMusicDisplay.render();
+           }
+         );
+     })
+     """.replace('{{DIV_ID}}', DIV_ID).replace('{{data}}', json.dumps(xml))
+     display(Javascript(script))
+     return a
+
+   # rendering the music score
+   ###partitura = showScore(sc)
+   #print(best_notes_and_rests)
+
+
+
+   # ____________________________________________________________________________
+   # Let's convert the music notes to a MIDI file and listen to it.
+   # To create this file, we can use the stream we created before.
+
+   # Saving the recognized musical notes as a MIDI file
+   ##converted_audio_file_as_midi = converted_audio_file[:-4] + '.mid'
+   ##fp = sc.write('midi', fp=converted_audio_file_as_midi)
+
+   ##wav_from_created_midi = converted_audio_file_as_midi.replace(' ', '_') + "_midioutput.wav"
+   #print(wav_from_created_midi)
+
+   # To listen to it on Colab, we need to convert it back to wav. An easy way of
+   # doing that is using Timidity.
+
+   #!timidity $converted_audio_file_as_midi -Ow -o $wav_from_created_midi
+   return converted_audio_file, fig1, fig2, fig3, fig4, fig5, bpm, best_notes_and_rests  #, wav_from_created_midi
+   #return converted_audio_file, fig1, fig2, fig3, fig4, fig5, bpm, best_notes_and_rests, partitura, wav_from_created_midi
+
+ link = "https://www.tensorflow.org/hub/tutorials/spice?hl=es-419&authuser=2"
+
+ iface = gr.Interface(
+     fn=main,
+     title="Trabajo Práctico N°3 - Detección de tono con SPICE",
+     description="Implementación de Modelo con GitHub + Hugging Face🤗-- 🔊✅ " + "Basado en: " + link,
+     inputs=[gr.inputs.Audio(source="microphone", type="filepath", label="Ingrese Audio")],
+     outputs=[gr.outputs.Audio(label="Audio Original"),
+              gr.outputs.Plot(type="auto", label="Gráfico de Frecuencias"),
+              gr.outputs.Plot(type="auto", label="Especto"),
+              gr.outputs.Plot(type="auto", label="Pitch Confidence"),
+              gr.outputs.Plot(type="auto", label="Notas"),
+              gr.outputs.Plot(type="auto", label="Espectro+Notas"),
+              gr.outputs.Textbox(label="bpm"),
+              gr.outputs.Textbox(label="partitura")],  #,
+     #gr.outputs.Textbox(type="html",label="partitura1"),
+     #gr.outputs.Audio(label="midi")],
+     interpretation="default",
+ )
+
+ iface.launch(debug=True)
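
As a quick standalone check of the pitch-to-note math that main() above relies on, the sketch below re-uses the constants from output2hz and the quantization helpers (PT_OFFSET, PT_SLOPE, A4, C0, note_names) to turn a single SPICE pitch output into a note name without loading the model or Gradio. It is not part of this commit, and hz2note here is a single-value simplification of quantize_predictions:

    import math

    # Constants copied from app_deploy.py (originally from https://tfhub.dev/google/spice/2).
    PT_OFFSET, PT_SLOPE = 25.58, 63.07
    FMIN, BINS_PER_OCTAVE = 10.0, 12.0
    A4 = 440
    C0 = A4 * pow(2, -4.75)
    note_names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def output2hz(pitch_output):
        # Map the model's pitch output to a frequency in Hz.
        cqt_bin = pitch_output * PT_SLOPE + PT_OFFSET
        return FMIN * 2.0 ** (cqt_bin / BINS_PER_OCTAVE)

    def hz2note(freq):
        # Quantize a frequency to the nearest semitone and name it.
        h = round(12 * math.log2(freq / C0))
        return note_names[h % 12] + str(h // 12)

    # A pitch output of 0.5 maps to roughly 271 Hz, which quantizes to C#4.
    freq = output2hz(0.5)
    print(freq, hz2note(freq))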
requirements.txt ADDED
@@ -0,0 +1,10 @@
+ numpy==1.21.2
+ tensorflow==2.8.0
+ tensorflow_hub==0.12.0
+ matplotlib==3.5.1
+ statistics==1.0.3.5
+ ipython==8.3.0
+ scipy==1.8.0
+ music21==7.3.3
+ pydub
+ librosa==0.9.1
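
To confirm the pinned libraries resolved as expected inside the Space, a small hedged check (not part of this commit) mirrors the version prints that are commented out near the top of app_deploy.py:

    # Hypothetical runtime check of the pinned dependencies from requirements.txt.
    import tensorflow as tf
    import librosa
    import music21

    print("tensorflow:", tf.__version__)
    print("librosa:", librosa.__version__)
    print("music21:", music21.__version__)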