jbetker committed
Commit a4cda68
1 Parent(s): f499d66

getting ready for 2.1 release

Files changed (2)
  1. README.md +20 -2
  2. tortoise/api.py +1 -1
README.md CHANGED
@@ -7,6 +7,15 @@ Tortoise is a text-to-speech program built with the following priorities:
 
 This repo contains all the code needed to run Tortoise TTS in inference mode.
 
+### New features
+
+#### v2.1; 2022/5/2
+- Added ability to produce totally random voices.
+- Added ability to download voice conditioning latent via a script, and then use a user-provided conditioning latent.
+- Added ability to use your own pretrained models.
+- Refactored directory structures.
+- Performance improvements & bug fixes.
+
 ## What's in a name?
 
 I'm naming my speech-related repos after Mojave desert flora and fauna. Tortoise is a bit tongue in cheek: this model
@@ -38,7 +47,7 @@ pip install -r requirements.txt
 
 This script allows you to speak a single phrase with one or more voices.
 ```shell
-python do_tts.py --text "I'm going to speak this" --voice dotrice --preset fast
+python do_tts.py --text "I'm going to speak this" --voice random --preset fast
 ```
 
 ### read.py
@@ -46,7 +55,7 @@ python do_tts.py --text "I'm going to speak this" --voice dotrice --preset fast
 This script provides tools for reading large amounts of text.
 
 ```shell
-python read.py --textfile <your text to be read> --voice dotrice
+python read.py --textfile <your text to be read> --voice random
 ```
 
 This will break up the textfile into sentences, and then convert them to speech one at a time. It will output a series
@@ -72,6 +81,15 @@ Tortoise was specifically trained to be a multi-speaker model. It accomplishes t
 
 These reference clips are recordings of a speaker that you provide to guide speech generation. These clips are used to determine many properties of the output, such as the pitch and tone of the voice, speaking speed, and even speaking defects like a lisp or stuttering. The reference clip is also used to determine non-voice related aspects of the audio output like volume, background noise, recording quality and reverb.
 
+### Random voice
+
+I've included a feature which randomly generates a voice. These voices don't actually exist and will be random every time you run
+it. The results are quite fascinating and I recommend you play around with it!
+
+You can use the random voice by passing in 'random' as the voice name. Tortoise will take care of the rest.
+
+For the those in the ML space: this is created by projecting a random vector onto the voice conditioning latent space.
+
 ### Provided voices
 
 This repo comes with several pre-packaged voices. You will be familiar with many of them. :)
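
Reviewer note on the "Random voice" text added above: the README says a random voice is created by projecting a random vector onto the voice conditioning latent space. The sketch below is only a rough, self-contained illustration of that idea and is not Tortoise's code. The names `sample_random_vector`, `project_to_latent`, `random_voice_latent`, and `TOY_PROJECTION`, as well as the tiny dimensions, are hypothetical stand-ins for whatever mapping the model actually uses.

```python
import random

# Hypothetical toy projection (3 outputs, 2 inputs). It stands in for the
# mapping into the conditioning latent space; the real mapping is not shown
# in this commit.
TOY_PROJECTION = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]


def sample_random_vector(dim, rng):
    """Draw a dim-length vector of independent standard-normal values."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project_to_latent(vector, projection):
    """Apply a linear map: each output is one projection row dotted with the input."""
    return [sum(w * x for w, x in zip(row, vector)) for row in projection]


def random_voice_latent(projection, seed=None):
    """Sample a random input vector and project it into the (toy) latent space."""
    rng = random.Random(seed)
    in_dim = len(projection[0])
    return project_to_latent(sample_random_vector(in_dim, rng), projection)
```

With `seed=None` every call yields a different latent, which matches the README's statement that the voice is random on every run. Passing a fixed seed makes the toy sketch repeatable.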
tortoise/api.py CHANGED
@@ -165,7 +165,7 @@ class TextToSpeech:
     Main entry point into Tortoise.
     """
 
-    def __init__(self, autoregressive_batch_size=16, models_dir='.models', enable_redaction=True):
+    def __init__(self, autoregressive_batch_size=16, models_dir='.models', enable_redaction=False):
         """
         Constructor
         :param autoregressive_batch_size: Specifies how many samples to generate per batch. Lower this if you are seeing
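
Reviewer note on the `tortoise/api.py` change: the only edit is the default of `enable_redaction`, which flips from `True` to `False`. Callers that relied on the old default would need to pass the flag explicitly after this commit. The sketch below shows that effect with `TextToSpeechStub`, a hypothetical stand-in that copies only the constructor signature from the diff. It loads no models and is not the real `TextToSpeech` class.

```python
class TextToSpeechStub:
    """Hypothetical stand-in mirroring only the constructor signature in the diff."""

    def __init__(self, autoregressive_batch_size=16, models_dir='.models',
                 enable_redaction=False):
        # Store the arguments so the effective defaults can be inspected.
        self.autoregressive_batch_size = autoregressive_batch_size
        self.models_dir = models_dir
        self.enable_redaction = enable_redaction


# After this commit, a bare constructor call leaves redaction off.
default_tts = TextToSpeechStub()

# Code that wants the pre-2.1 behavior has to opt in explicitly.
legacy_tts = TextToSpeechStub(enable_redaction=True)
```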