jbetker committed on
Commit
048b099
1 Parent(s): 8effe35

Update readme

Files changed (1)
  1. README.md +33 -2
README.md CHANGED
@@ -36,7 +36,32 @@ Based on [ImprovedDiffusion by openai](https://github.com/openai/improved-diffus
36
 
37
  ## How do I use this?
38
 
39
- <incoming>
40
 
41
  ## How do I train this?
42
 
@@ -44,4 +69,10 @@ Frankly - you don't. Building this model has been a labor of love for me, consum
44
  resources for the better part of 6 months. It uses a dataset I've gathered, refined and transcribed that consists of
45
  a lot of audio data which I cannot distribute because of copyright or the lack of open licenses.
46
 
47
- With that said, I'm willing to help you out if you really want to give it a shot. DM me.
36
 
37
  ## How do I use this?
38
 
39
+ Check out the colab: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR?usp=sharing
40
+
41
+ Or on a computer with a GPU (with >=16GB of VRAM):
42
+ ```shell
43
+ git clone https://github.com/neonbjb/tortoise-tts.git
44
+ cd tortoise-tts
45
+ pip install -r requirements.txt
46
+ python do_tts.py
47
+ ```
48
+
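Editor's sketch, not part of the commit: the README asks for a GPU with >=16GB of VRAM, so it may help to check that before running `do_tts.py`. The snippet below is a minimal example under a few assumptions: `nvidia-smi` is on the PATH (NVIDIA GPUs only), ">=16GB" is read as 16 GiB, and the helper names (`parse_total_mib`, `has_enough_vram`, `query_gpus`) are hypothetical and do not exist in the repo. Cards marketed as 16GB can report slightly less than 16384 MiB, so treat a failed check as a warning, not a verdict.

```python
import shutil
import subprocess

# Assumption: the README's ">=16GB" is interpreted as 16 GiB, expressed in MiB.
REQUIRED_MIB = 16 * 1024


def parse_total_mib(smi_output):
    """Return one total-memory value (MiB) per GPU line, skipping non-numeric lines."""
    values = []
    for line in smi_output.splitlines():
        text = line.strip()
        if text.isdigit():
            values.append(int(text))
    return values


def has_enough_vram(totals_mib, required_mib=REQUIRED_MIB):
    """True if at least one GPU reports the required amount of memory."""
    return any(total >= required_mib for total in totals_mib)


def query_gpus():
    """Ask nvidia-smi for total memory per GPU; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_total_mib(result.stdout)


if __name__ == "__main__":
    totals = query_gpus()
    if has_enough_vram(totals):
        print("Found a GPU with enough VRAM:", totals)
    else:
        print("No GPU reporting >= 16 GiB was found; consider the colab instead.")
```

If the check fails, the colab link above is the lower-friction route.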
49
+ ## Hand-picked TTS samples
50
+
51
+ I generated ~250 samples from 23 text prompts and 8 voices. The text prompts have never been seen by the model. The
52
+ voices were pulled from the training set.
53
+
54
+ All of the samples can be found in the results/ folder of this repo.
55
+
56
+ I handpicked a few to show what the model is capable of:
57
+ [Atkins - Road not taken](results/favorites/atkins_road_not_taken.wav)
58
+ [Dotrice - Rolling Stone interview](results/favorites/dotrice_rollingstone.wav)
59
+ [Dotrice - 'Ornaments' from tacotron test set](results/favorites/dotrice_tacotron_samp1.wav)
60
+ [Kennard - 'Acute emotional intelligence' from tacotron test set](results/favorites/kennard_tacotron_samp2.wav)
61
+ [Mol - Because I could not stop for death](results/favorites/mol_dickenson.wav)
62
+ [Mol - Obama](results/favorites/mol_obama.wav)
63
+
64
+ Prosody is remarkably good for poetry, even though the model was never trained on poetry.
65
 
66
  ## How do I train this?
67
 
 
69
  resources for the better part of 6 months. It uses a dataset I've gathered, refined and transcribed that consists of
70
  a lot of audio data which I cannot distribute because of copyright or the lack of open licenses.
71
 
72
+ With that said, I'm willing to help you out if you really want to give it a shot. DM me.
73
+
74
+ ## Looking forward
75
+
76
+ I'm not satisfied with this yet. Treat this as a "sneak peek" and check back in a couple of months. I think the concept
77
+ is sound, but there are a few hurdles to overcome to get sample quality up. I have been doing major tweaks to the
78
+ diffusion model and should have something new and much better soon.