irow committed
Commit 3113c33
Parent: 4f3da47

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -4,12 +4,16 @@ datasets:
 pipeline_tag: text-to-speech
 ---
 
-This is a basic audio diffusion model using Unet. I've uploaded the weights and training code. The sample method of the model is used to generate whatever spoken digit you want.
+This is a basic audio diffusion model using Unet. I've uploaded the weights and training code.
+The sample method of the model is used to generate whatever spoken digit you want.
+I used the awesome code provided by HuggingFace audio diffusers to generate Mel-spectrograms which were then used to train the model.
+For the model code I used the denoising-diffusion-pytorch repo found at https://github.com/lucidrains/denoising-diffusion-pytorch
 ![alt text](sample24_4_6.jpg "Title") ![alt text]( sample24_5_5.jpg
 "Title") ![alt text]( sample24_6_3.jpg
 "Title") ![alt text]( sample24_7_2.jpg
 "Title")
 
+
 The images found in the files are sample{epoch}_{sample#}_{digit}.jpg. They also have corresponding audio files.
 The audio is VERY quiet, so turn up the speakers to hear better. (Just don't forget to turn it down after!)
 
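The updated README says the model's sample method generates the spoken digits and that the model code comes from denoising-diffusion-pytorch. Samplers of this kind start from Gaussian noise and repeatedly subtract predicted noise. The pure-Python sketch below shows the textbook DDPM ancestral sampling loop on a flat list of values. Several parts are assumptions and are not taken from the author's code: the linear beta schedule, the 100 timesteps, and the hypothetical `predict_noise` callback that stands in for the trained Unet. The digit conditioning is not modelled. The library's own defaults (noise schedule, prediction objective, optional DDIM sampling) may also differ.

```python
import math
import random
from typing import Callable, List, Optional

# Hypothetical stand-in for the trained Unet: maps (x_t, t) to predicted noise.
NoisePredictor = Callable[[List[float], int], List[float]]


def linear_betas(timesteps: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Linear noise schedule beta_1..beta_T (an illustrative choice)."""
    if timesteps < 2:
        raise ValueError("need at least two timesteps")
    step = (end - start) / (timesteps - 1)
    return [start + i * step for i in range(timesteps)]


def ddpm_sample(predict_noise: NoisePredictor, length: int,
                timesteps: int = 100, seed: Optional[int] = None) -> List[float]:
    """Run the DDPM reverse process from pure noise x_T down to x_0."""
    rng = random.Random(seed)
    betas = linear_betas(timesteps)
    alphas = [1.0 - b for b in betas]
    # alpha_bar_t = alpha_1 * alpha_2 * ... * alpha_t
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # x_T ~ N(0, I)
    for t in reversed(range(timesteps)):
        eps = predict_noise(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Model mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x


# Usage with a dummy predictor that always predicts zero noise.
sample = ddpm_sample(lambda x, t: [0.0] * len(x), length=8, seed=0)
```

In the real model, `x` would be a Mel-spectrogram image tensor and `predict_noise` would be the Unet. The loop structure is the same.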
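The README says the model was trained on Mel-spectrograms. A Mel-spectrogram is a spectrogram whose frequency axis has been mapped onto the mel scale, which gives finer resolution at low frequencies. The helpers below illustrate that scale using the HTK formula m = 2595 log10(1 + f/700) and place band centres evenly in mel space. This is a sketch of the concept only. If the spectrograms came from librosa-based tooling, note that librosa defaults to the Slaney variant (`htk=False`), so its band positions will differ. The 8 bands and the 11025 Hz upper limit in the usage line are arbitrary.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 evenly spaced mel points; the interior ones are the band centres
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


centres = mel_band_centers(n_mels=8, f_min=0.0, f_max=11025.0)
```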
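The README's naming convention, sample{epoch}_{sample#}_{digit}.jpg, can be decoded mechanically, for example to group the uploaded samples by digit. The helper below is a hypothetical sketch. It accepts any file extension on the assumption that the matching audio files reuse the same stem. The README implies this but does not spell it out.

```python
import re
from typing import NamedTuple

# sample{epoch}_{sample#}_{digit}.<ext>; the extension is left open (assumption)
SAMPLE_NAME = re.compile(r"^sample(\d+)_(\d+)_(\d)\.\w+$")


class SampleInfo(NamedTuple):
    epoch: int
    sample_index: int
    digit: int


def parse_sample_name(filename: str) -> SampleInfo:
    """Split a name like 'sample24_4_6.jpg' into epoch, sample number and digit."""
    match = SAMPLE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a sample file name: {filename!r}")
    epoch, index, digit = (int(g) for g in match.groups())
    return SampleInfo(epoch, index, digit)


print(parse_sample_name("sample24_4_6.jpg"))
# SampleInfo(epoch=24, sample_index=4, digit=6)
```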
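The README warns that the generated audio is very quiet. Instead of turning the speakers up, you could peak-normalize the files. This standard-library-only sketch assumes the audio files are 16-bit PCM WAV; the README does not state the format. `normalize_wav`, `peak_normalize`, and the 0.9 target peak are illustrative choices, not part of the repository.

```python
import array
import sys
import wave
from typing import List, Sequence

FULL_SCALE = 32767  # largest positive 16-bit sample value


def peak_normalize(samples: Sequence[int], target_peak: float = 0.9) -> List[int]:
    """Scale 16-bit samples so the loudest one reaches target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak * FULL_SCALE / peak
    # Clamp in case rounding pushes a value past the 16-bit range
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


def normalize_wav(src_path: str, dst_path: str, target_peak: float = 0.9) -> None:
    """Write a peak-normalized copy of a 16-bit PCM WAV file (assumed format)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    louder = array.array("h", peak_normalize(samples, target_peak))
    if sys.byteorder == "big":
        louder.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(louder.tobytes())
```

For example, `normalize_wav("sample24_4_6.wav", "sample24_4_6_loud.wav")` would write a louder copy; both file names are hypothetical. The same gain is applied to every interleaved sample, so stereo balance is preserved.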