ylacombe (HF staff) committed
Commit 1cba099 • 1 parent: cbfe2cd

Update README.md

Files changed (1):
  1. README.md (+9 -9)
README.md CHANGED
@@ -12,7 +12,7 @@ library_name: transformers
  SeamlessM4T is a collection of models designed to provide high quality translation, allowing people from different
  linguistic communities to communicate effortlessly through speech and text.
 
- This repository hosts 🤗 Hugging Face's [implementation](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t) of SeamlessM4T. You can find the original weights, as well as a guide on how to run them in the original hub repositories ([large](https://huggingface.co/facebook/seamless-m4t-large) and [medium](https://huggingface.co/facebook/seamless-m4t-medium) checkpoints).
+ This repository hosts 🤗 Hugging Face's [implementation](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t) of SeamlessM4T. You can find the original weights, as well as a guide on how to run them in the original hub repositories ([large](https://huggingface.co/facebook/seamless-m4t-large) and [medium](https://huggingface.co/facebook/seamless-m4t-medium) checkpoints).
 
  SeamlessM4T Medium covers:
  - 📥 101 languages for speech input
@@ -26,7 +26,7 @@ This is the "medium" variant of the unified model, which enables multiple tasks
  - Text-to-text translation (T2TT)
  - Automatic speech recognition (ASR)
 
- You can perform all the above tasks from one single model, [`SeamlessM4TModel`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel), but each task also has its own dedicated sub-model.
+ You can perform all the above tasks from one single model, [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel), but each task also has its own dedicated sub-model.
 
 
  ## 🤗 Usage
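
The hunk above points at the single `SeamlessM4TModel` entry point. As an editorial aside (not part of the commit diff), here is a minimal loading sketch assuming the 🤗 Transformers SeamlessM4T API: the use of `AutoProcessor` and the `src_lang`/`return_tensors` arguments are assumptions not shown in this diff, while the checkpoint name appears later in the README.

```python
# Minimal loading sketch (assumed API, not part of the commit diff).
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Prepare a text input; "eng" as a three-letter source-language code is an assumption.
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
```
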
@@ -60,7 +60,7 @@ Here is how to use the processor to process text and audio:
 
  ### Speech
 
- [`SeamlessM4TModel`] can *seamlessly* generate text or speech with few or no changes. Let's target Russian voice translation:
+ [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) can *seamlessly* generate text or speech with few or no changes. Let's target Russian voice translation:
 
  ```python
  >>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
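
The Python snippet in this hunk is cut off by the diff context. Continuing from the loading sketch above, a hedged illustration of how the surrounding code plausibly looks, generating Russian speech from both text and audio inputs; loading audio with `torchaudio`, resampling to 16 kHz, and the `processor(audios=...)` call are assumptions, not content from this commit.

```python
# Speech-generation sketch (assumed API, not part of the commit diff).
import torchaudio

# "input.wav" is a hypothetical file; the 16 kHz target rate is an assumption.
audio, orig_freq = torchaudio.load("input.wav")
audio = torchaudio.functional.resample(audio, orig_freq, 16_000)
audio_inputs = processor(audios=audio, return_tensors="pt")

# The same call works for text or speech input: only the processed inputs differ.
audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
```
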
@@ -71,7 +71,7 @@ With basically the same code, I've translated English text and Arabic speech to
 
  ### Text
 
- Similarly, you can generate translated text from audio files or from text with the same model. You only have to pass `generate_speech=False` to [`SeamlessM4TModel.generate`].
+ Similarly, you can generate translated text from audio files or from text with the same model. You only have to pass `generate_speech=False` to [`SeamlessM4TModel.generate`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel.generate).
  This time, let's translate to French.
 
  ```python
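
For completeness, a sketch of the text-only path this hunk describes: passing `generate_speech=False` and decoding the returned token ids, targeting French as the hunk says. The `processor.decode(...)` call and the indexing into the output are assumptions about the API, not content shown in this diff.

```python
# Text-generation sketch (assumed API, not part of the commit diff).
output_tokens = model.generate(**text_inputs, tgt_lang="fra", generate_speech=False)

# Assumed output layout: the first element holds the generated token ids.
translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(translated_text)
```
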
@@ -89,7 +89,7 @@ This time, let's translate to French.
 
  #### 1. Use dedicated models
 
- [`SeamlessM4TModel`] is transformers top level model to generate speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
+ [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) is the Transformers top-level model for generating speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
  For example, you can replace the audio-to-audio generation snippet with the model dedicated to the S2ST task; the rest of the code is exactly the same:
 
  ```python
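
The dedicated-model snippet referenced here is also truncated by the hunk boundary. A sketch of what the swap plausibly looks like for the S2ST task: the class name `SeamlessM4TForSpeechToSpeech` follows the naming pattern of the dedicated models listed in this README, but is itself an assumption rather than something shown in the diff.

```python
# Dedicated S2ST model sketch (assumed class name, not part of the commit diff).
from transformers import SeamlessM4TForSpeechToSpeech

model = SeamlessM4TForSpeechToSpeech.from_pretrained("facebook/hf-seamless-m4t-medium")

# The generation call stays the same as with the unified model.
audio_array = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
```
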
@@ -104,16 +104,16 @@ Or you can replace the text-to-text generation snippet with the model dedicated
  >>> model = SeamlessM4TForTextToText.from_pretrained("facebook/hf-seamless-m4t-medium")
  ```
 
- Feel free to try out [`SeamlessM4TForSpeechToText`] and [`SeamlessM4TForTextToSpeech`] as well.
+ Feel free to try out [`SeamlessM4TForSpeechToText`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TForSpeechToText) and [`SeamlessM4TForTextToSpeech`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TForTextToSpeech) as well.
 
  #### 2. Change the speaker identity
 
  You can change the speaker used for speech synthesis with the `spkr_id` argument. Some `spkr_id` values work better than others for some languages!
 
- #### 3. Change the speaker identity
+ #### 3. Change the generation strategy
 
- You can use different [generation strategies](./generation_strategies) for speech and text generation, e.g `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)` which will successively perform beam-search decoding on the text model, and multinomial sampling on the speech model.
+ You can use different [generation strategies](https://huggingface.co/docs/transformers/v4.34.1/en/generation_strategies#text-generation-strategies) for speech and text generation, e.g. `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)`, which will successively perform beam-search decoding on the text model and multinomial sampling on the speech model.
 
  #### 4. Generate speech and text at the same time
 
- Use `return_intermediate_token_ids=True` with [`SeamlessM4TModel`] to return both speech and text !
+ Use `return_intermediate_token_ids=True` with [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) to return both speech and text!
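
The three options in this hunk (speaker identity, generation strategy, joint speech-and-text output) are all arguments to `generate`, so they can be sketched in a single call on the unified `SeamlessM4TModel` loaded in the first sketch. The value `spkr_id=2` and the exact structure of the returned object are assumptions for illustration, not content from this commit.

```python
# Combined generation sketch (assumed API, not part of the commit diff).
outputs = model.generate(
    **text_inputs,
    tgt_lang="rus",
    spkr_id=2,                           # pick a different synthesis voice (arbitrary example id)
    text_num_beams=4,                    # beam search on the text sub-model
    speech_do_sample=True,               # multinomial sampling on the speech sub-model
    return_intermediate_token_ids=True,  # also return the intermediate text token ids
)

# The output is assumed to bundle both the synthesized waveform and the text token ids;
# inspect it to confirm the exact field names before relying on them.
print(type(outputs))
```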
 