ylacombe (HF staff) and reach-vb (HF staff) committed on
Commit 39b288d
1 Parent(s): 47ef28a

Update the order of code and add pipeline usage! (#22)

- Update the order of code and add pipeline usage! (578a8302e3794675594bb02d971245bb7cf5690e)


Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +48 -26
README.md CHANGED
@@ -47,44 +47,35 @@ Extensive evaluations show the superiority of the proposed SpeechT5 framework on

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- ## Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- You can use this model for speech synthesis. See the [model hub](https://huggingface.co/models?search=speecht5) to look for fine-tuned versions on a task that interests you.
-
- ## Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ## Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]

- # Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

- ## Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.


- ## How to Get Started With the Model

- Use the code below to convert text into a mono 16 kHz speech waveform.

  ```python
  # Following pip packages need to be installed:
- # !pip install git+https://github.com/huggingface/transformers sentencepiece datasets

  from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
  from datasets import load_dataset
@@ -111,6 +102,37 @@ sf.write("speech.wav", speech.numpy(), samplerate=16000)

  Refer to [this Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) for an example of how to fine-tune SpeechT5 for TTS on a different dataset or a new language.

  # Training Details

  ## Training Data
 

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ## How to Get Started With the Model

+ You can access the SpeechT5 model via the `text-to-speech` pipeline in just a couple of lines of code!

+ ```python
+ # Following pip packages need to be installed:
+ # !pip install transformers sentencepiece datasets

+ from transformers import pipeline
+ from datasets import load_dataset
+ import soundfile as sf
+ import torch

+ synthesiser = pipeline("text-to-speech", "microsoft/speecht5_tts")

+ embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
+ speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
+ # You can replace this embedding with your own as well.

+ speech = synthesiser("Hello what is happening", forward_params={"speaker_embeddings": speaker_embeddings})

+ sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])

+ ```

+ For more fine-grained control, you can use the processor and model directly to convert text into a mono 16 kHz speech waveform.

  ```python
  # Following pip packages need to be installed:
+ # !pip install transformers sentencepiece datasets

  from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
  from datasets import load_dataset
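The pipeline snippet in this commit notes that the CMU ARCTIC x-vector can be replaced with your own embedding. A minimal sketch of checking such a custom vector and adding the batch axis that `.unsqueeze(0)` provides, assuming a 512-dimensional embedding (the size of the `Matthijs/cmu-arctic-xvectors` entries and, we believe, SpeechT5's default `speaker_embedding_dim`). The helper `as_batched_embedding` is hypothetical, and it uses plain Python lists so it runs without PyTorch; wrap the result in `torch.tensor(...)` before passing it as `speaker_embeddings`.

```python
from typing import List, Sequence

# Assumption: 512 is the x-vector size in Matthijs/cmu-arctic-xvectors and,
# we believe, SpeechT5's default speaker_embedding_dim; adjust it if yours differs.
EXPECTED_DIM = 512


def as_batched_embedding(xvector: Sequence[float], dim: int = EXPECTED_DIM) -> List[List[float]]:
    """Hypothetical helper: check a speaker embedding's length and add a batch axis.

    Mirrors `torch.tensor(xvector).unsqueeze(0)`: a flat vector of length `dim`
    becomes a single-row batch of shape (1, dim).
    """
    values = [float(v) for v in xvector]
    if len(values) != dim:
        raise ValueError(f"expected a {dim}-dim speaker embedding, got {len(values)}")
    return [values]


# Usage: a dummy all-zero vector stands in for a real x-vector.
batch = as_batched_embedding([0.0] * 512)
print(len(batch), len(batch[0]))  # 1 512

# With PyTorch installed, convert before calling the pipeline (not run here):
# speaker_embeddings = torch.tensor(batch)
```

Failing early on a wrong length gives a clearer error than a shape mismatch deep inside the model.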
 

  Refer to [this Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) for an example of how to fine-tune SpeechT5 for TTS on a different dataset or a new language.

+
+ ## Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ You can use this model for speech synthesis. See the [model hub](https://huggingface.co/models?search=speecht5) to look for fine-tuned versions on a task that interests you.
+
+ ## Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ## Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ # Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ## Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
  # Training Details

  ## Training Data
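Both versions of the README save the generated mono 16 kHz waveform with `soundfile` (`sf.write`). As a hedged, stdlib-only alternative, the sketch below writes a float waveform in [-1, 1] to a 16-bit PCM WAV file with Python's `wave` module. The function name `write_mono_wav`, the clipping, and the int16 conversion are assumptions for illustration, not part of the model card; a sine tone stands in for the pipeline's `speech["audio"]`.

```python
import math
import struct
import wave
from typing import Sequence


def write_mono_wav(path: str, audio: Sequence[float], sampling_rate: int = 16000) -> float:
    """Hypothetical stdlib-only stand-in for `sf.write` on a mono float waveform.

    Clips samples to [-1.0, 1.0], converts them to 16-bit PCM and returns the
    clip duration in seconds (number of samples / sampling rate).
    """
    pcm = bytearray()
    for sample in audio:
        clipped = max(-1.0, min(1.0, float(sample)))
        pcm += struct.pack("<h", int(round(clipped * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(bytes(pcm))
    return len(audio) / sampling_rate


# Usage: one second of a 440 Hz tone standing in for speech["audio"].
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
duration = write_mono_wav("tone.wav", tone, sampling_rate=16000)
print(duration)  # 1.0
```

With the pipeline output, the call would be `write_mono_wav("speech.wav", speech["audio"], speech["sampling_rate"])`, assuming `speech["audio"]` is a 1-D float array.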