ylacombe committed on
Commit 7c26b34 · verified · 1 Parent(s): d369927

Update README.md

Files changed (1)
  1. README.md +1 -30
README.md CHANGED
@@ -66,7 +66,7 @@ Using Parler-TTS is as simple as "bonjour". Simply install the library once:
  pip install git+https://github.com/huggingface/parler-tts.git
  ```
 
- ### 🎲 Random voice
+ ### Inference
 
 
  **Parler-TTS** has been trained to generate speech with features that can be controlled with a simple text prompt, for example:
@@ -94,35 +94,6 @@ audio_arr = generation.cpu().numpy().squeeze()
  sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
  ```
 
- ### 🎯 Using a specific speaker
-
- To ensure speaker consistency across generations, this checkpoint was also trained on 34 speakers, characterized by name (e.g. Jon, Lea, Gary, Jenna, Mike, Laura).
-
- To take advantage of this, simply adapt your text description to specify which speaker to use: `Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise.`
-
- ```py
- import torch
- from parler_tts import ParlerTTSForConditionalGeneration
- from transformers import AutoTokenizer
- import soundfile as sf
-
- device = "cuda:0" if torch.cuda.is_available() else "cpu"
-
- model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-tts-mini-multilingual").to(device)
- tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-multilingual")
- description_tokenizer = AutoTokenizer.from_pretrained(model.config.text_encoder._name_or_path)
-
- prompt = "Hey, how are you doing today?"
- description = "Jon's voice is monotone yet slightly fast in delivery, with a very close recording that almost has no background noise."
-
- input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
- prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
-
- generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
- audio_arr = generation.cpu().numpy().squeeze()
- sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
- ```
-
  **Tips**:
  * We've set up an [inference guide](https://github.com/huggingface/parler-tts/blob/main/INFERENCE.md) to make generation faster. Think SDPA, torch.compile, batching and streaming!
  * Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise
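
The last tip kept by this commit can be illustrated with a minimal sketch. The helper `build_description` and its `clear` parameter are hypothetical and are not part of Parler-TTS. The helper only appends one of the two quality terms quoted in the tip to a free-text voice description. The example description is also made up for illustration.

```py
def build_description(base: str, clear: bool = True) -> str:
    """Append an audio-quality term from the README tips to a description.

    Hypothetical helper for illustration; Parler-TTS itself only sees
    the final description string.
    """
    # The two terms are quoted verbatim from the tip.
    quality = "very clear audio" if clear else "very noisy audio"
    # Drop surrounding whitespace and a trailing period before appending.
    base = base.strip().rstrip(".")
    return f"{base}, with {quality}."


description = build_description(
    "A female speaker delivers a slightly expressive speech."
)
print(description)
```

The resulting string would then play the role of `description` in the README's generation snippet, where it is tokenized and passed to `model.generate`.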