ylacombe committed on
Commit
cb90476
1 Parent(s): c0ab532

Update README.md

Files changed (1)
  1. README.md +4 -7
README.md CHANGED
@@ -24,7 +24,7 @@ This is the "large" variant of the unified model, which enables multiple tasks w
 - Text-to-text translation (T2TT)
 - Automatic speech recognition (ASR)
 
-You can perform all the above tasks from one single model - `SeamlessM4TModel`, but each task also has its own dedicated sub-model.
+You can perform all the above tasks from one single model, [`SeamlessM4TModel`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel), but each task also has its own dedicated sub-model.
 
 
 ## 🤗 Usage
@@ -42,7 +42,7 @@ You can seamlessly use this model on text or on audio, to generated either trans
 
 ### Speech
 
-You can easily generate translated speech with [`SeamlessM4TModel.generate`]. Here is an example showing how to generate speech from English to Russian.
+You can easily generate translated speech with [`SeamlessM4TModel.generate`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel.generate). Here is an example showing how to generate speech from English to Russian.
 
 ```python
 inputs = processor(text = "Hello, my dog is cute", src_lang="eng", return_tensors="pt")
@@ -57,9 +57,7 @@ You can also translate directly from a speech waveform. Here is an example from
 from datasets import load_dataset
 
 dataset = load_dataset("arabic_speech_corpus", split="test[0:1]")
-
 audio_sample = dataset["audio"][0]["array"]
-
 inputs = processor(audios = audio_sample, return_tensors="pt")
 
 audio_array = model.generate(**inputs, tgt_lang="rus")
@@ -86,7 +84,7 @@ scipy.io.wavfile.write("seamless_m4t_out.wav", rate=sampling_rate, data=audio_ar
 
 #### Tips
 
-[`SeamlessM4TModel`] is transformers top level model to generate speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
+[`SeamlessM4TModel`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) is transformers top level model to generate speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
 For example, you can replace the previous snippet with the model dedicated to the S2ST task:
 
 ```python
@@ -103,7 +101,6 @@ Similarly, you can generate translated text from text or audio files, this time
 from transformers import SeamlessM4TForSpeechToText
 model = SeamlessM4TForSpeechToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
 audio_sample = dataset["audio"][0]["array"]
-
 inputs = processor(audios = audio_sample, return_tensors="pt")
 
 output_tokens = model.generate(**inputs, tgt_lang="fra")
@@ -125,7 +122,7 @@ translated_text = processor.decode(output_tokens.tolist()[0], skip_special_token
 
 Three last tips:
 
-1. [`SeamlessM4TModel`] can generate text and/or speech. Pass `generate_speech=False` to [`SeamlessM4TModel.generate`] to only generate text. You also have the possibility to pass `return_intermediate_token_ids=True`, to get both text token ids and the generated speech.
+1. [`SeamlessM4TModel`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) can generate text and/or speech. Pass `generate_speech=False` to [`SeamlessM4TModel.generate`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel.generate) to only generate text. You also have the possibility to pass `return_intermediate_token_ids=True`, to get both text token ids and the generated speech.
 2. You have the possibility to change the speaker used for speech synthesis with the `spkr_id` argument.
 3. You can use different [generation strategies](./generation_strategies) for speech and text generation, e.g `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)` which will successively perform beam-search decoding on the text model, and multinomial sampling on the speech model.
 
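The three tips in the last hunk of the README can be sketched as a small helper that collects the keyword arguments for `SeamlessM4TModel.generate`. This is a minimal sketch, not part of the commit: `build_generate_kwargs` and its parameter names are hypothetical, and only the keys it emits (`tgt_lang`, `generate_speech`, `return_intermediate_token_ids`, `spkr_id`, `text_num_beams`, `speech_do_sample`) come from the README text. Dropping the speech-only options when only text is requested is an assumption made here for tidiness.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    tgt_lang: str,
    text_only: bool = False,
    return_text_with_speech: bool = False,
    speaker_id: Optional[int] = None,
    text_num_beams: Optional[int] = None,
    speech_do_sample: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for `model.generate`."""
    kwargs: Dict[str, Any] = {"tgt_lang": tgt_lang}

    if text_only:
        # Tip 1: ask for text only, skipping speech synthesis.
        kwargs["generate_speech"] = False
    elif return_text_with_speech:
        # Tip 1: also return the text token ids alongside the speech.
        kwargs["return_intermediate_token_ids"] = True

    # Tip 2: choose the speaker used for synthesis.
    # Assumption: this is only meaningful when speech is generated.
    if speaker_id is not None and not text_only:
        kwargs["spkr_id"] = speaker_id

    # Tip 3: separate decoding settings for the text and speech models.
    if text_num_beams is not None:
        kwargs["text_num_beams"] = text_num_beams
    if speech_do_sample is not None and not text_only:
        kwargs["speech_do_sample"] = speech_do_sample

    return kwargs


# Intended use, assuming `model` and `inputs` were created as in the README:
#   output = model.generate(**inputs, **build_generate_kwargs("fra", text_only=True))
```

Keeping the options in one dictionary makes it easy to switch between text-only and speech output without editing the `generate` call itself.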