patrickvonplaten and kashif (HF staff) committed
Commit e2e5316
Parent: b4e09e8

Update README.md (#1)


- Update README.md (e98947a936bedb698999466910f645d07da42d95)
- Update README.md (1d9603cc194e6967ce114a960717bceaad4802f9)
- Update README.md (6786dbaf90c6beefb2e82683a8818e87e4a114e3)
- Update README.md (fd6d2e9fe04af4da16345187cd030cb1646d2b91)
- Update README.md (1704e81a225d7ba1c8d03b9f2c748aebfd8fb3d7)


Co-authored-by: Kashif Rasul <kashif@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -16,15 +16,21 @@ An ideal music synthesizer should be both interactive and expressive, generating

  <img src="https://storage.googleapis.com/music-synthesis-with-spectrogram-diffusion/architecture.png" alt="Architecture diagram">

+## Model
+
+As depicted above, the model takes a MIDI file as input and tokenizes it into a sequence of 5-second intervals. Each tokenized interval, together with positional encodings, is passed through the Note Encoder, and its representation is concatenated with the representation of the previous window's generated spectrogram, obtained via the Context Encoder. For the initial 5-second window this context is set to zero. The resulting context conditions the sampling of the denoised spectrogram for the current MIDI window; this spectrogram is appended to the final output and also serves as the context for the next MIDI window. The process repeats until all MIDI windows have been processed. Finally, a MelGAN decoder converts the potentially long concatenated spectrogram to audio, which is the final output of this pipeline.
+
 ## Example usage

 ```python
-from diffusers import SpectrogramDiffusionPipeline
+from diffusers import SpectrogramDiffusionPipeline, MidiProcessor

-pipe = SpectrogramDiffusionPipeline.from_pretrained("kashif/music-spectrogram-diffusion")
+pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion")
 pipe = pipe.to("cuda")
+processor = MidiProcessor()

-output = pipe("beethoven_hammerklavier_2.mid")
+# Download MIDI from: wget http://www.piano-midi.de/midis/beethoven/beethoven_hammerklavier_2.mid
+output = pipe(processor("beethoven_hammerklavier_2.mid"))

 audio = output.audios[0]
 ```
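
The "## Model" paragraph added in the diff describes a window-by-window autoregressive generation scheme. The following is a minimal, illustrative sketch of that loop, not the diffusers implementation; `note_encoder`, `context_encoder`, `denoise`, and `melgan_decode` are hypothetical stand-ins for the learned components, and the spectrogram dimensions are assumptions.

```python
# Sketch of the windowed generation loop described in the "Model" section.
# All components below are random/zero stand-ins, not the real model.
import numpy as np

SPEC_FRAMES = 256  # assumed spectrogram frames per 5-second window
SPEC_BINS = 128    # assumed mel bins

def note_encoder(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the Note Encoder (MIDI tokens + positions -> features)."""
    return np.random.randn(SPEC_FRAMES, SPEC_BINS)

def context_encoder(prev_spec: np.ndarray) -> np.ndarray:
    """Stand-in for the Context Encoder over the previous window's spectrogram."""
    return prev_spec  # identity here; a learned encoder in the real model

def denoise(cond: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion sampler conditioned on the combined context."""
    return np.random.randn(SPEC_FRAMES, SPEC_BINS)

def melgan_decode(spec: np.ndarray) -> np.ndarray:
    """Stand-in for the MelGAN spectrogram-to-audio decoder."""
    return np.zeros(spec.shape[0] * 128)  # fake hop size of 128 samples

def generate(midi_windows):
    specs = []
    prev_spec = np.zeros((SPEC_FRAMES, SPEC_BINS))  # zero context for window 0
    for tokens in midi_windows:
        # Note Encoder output concatenated with the previous window's context
        cond = np.concatenate(
            [note_encoder(tokens), context_encoder(prev_spec)], axis=-1
        )
        spec = denoise(cond)  # sample the denoised spectrogram for this window
        specs.append(spec)    # append to the final output...
        prev_spec = spec      # ...and reuse as context for the next window
    full_spec = np.concatenate(specs, axis=0)
    return melgan_decode(full_spec)  # decode the full spectrogram to audio

audio = generate(midi_windows=[np.zeros(64, dtype=np.int64)] * 3)
```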
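
To make the usage example end-to-end, one way to persist the generated audio is shown below. The 16 kHz sample rate is an assumption based on the Spectrogram Diffusion setup, not something read from the pipeline; check the model card if it differs.

```python
# `output` is the result of the pipeline call in the example above.
import numpy as np
from scipy.io import wavfile

sample_rate = 16_000  # assumed output rate; verify against the model card
wav = np.squeeze(np.asarray(output.audios[0], dtype=np.float32))
wavfile.write("beethoven_hammerklavier_2.wav", sample_rate, wav)
```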