facebook/w2v-bert-2.0

Feature Extraction

Transformers

Safetensors

wav2vec2-bert


reach-vb HF staff committed on Dec 19, 2023

Commit 15f40b8 • 1 Parent(s): 0852d0f

Update README.md


README.md +1 -1


@@ -105,7 +105,7 @@ We are open-sourcing our Conformer-based [W2v-BERT 2.0 speech encoder](#w2v-bert
 | Model Name        | #params | checkpoint                                                                                                                                                                                                                                                                                                                                                                 |
 | ----------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| W2v-BERT 2.0 | 600M    | [checkpoint](https://dl.fbaipublicfiles.com/seamless/models/conformer_shaw.pt)
 Scaling data size for self-supervised pre-training has been empirically proven to be a relatively cheap, yet effective way to improve speech representation quality (Zhang et al., 2023a). Following such direction, we continued to add more unlabeled speech data, increasing the amount of our pre-training data from 1M hours (Seamless Communication et al., 2023) to approximately 4.5M hours.
 Besides leveraging more pre-training data, we removed the random-projection quantizer (RPQ) (Chiu et al., 2022) and its associated loss previously incorporated in SeamlessM4T v1 (Seamless Communication et al., 2023).4 Akin to v1, the v2 w2v-BERT 2.0 comprises 24 Conformer layers (Gulati et al., 2020) with approximately 600M parameters and the same pre-training hyperparameters.

 | Model Name        | #params | checkpoint                                                                                                                                                                                                                                                                                                                                                                 |
 | ----------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| W2v-BERT 2.0 | 600M    | [checkpoint](https://huggingface.co/reach-vb/conformer-shaw/resolve/main/conformer_shaw.pt)
 Scaling data size for self-supervised pre-training has been empirically proven to be a relatively cheap, yet effective way to improve speech representation quality (Zhang et al., 2023a). Following such direction, we continued to add more unlabeled speech data, increasing the amount of our pre-training data from 1M hours (Seamless Communication et al., 2023) to approximately 4.5M hours.
 Besides leveraging more pre-training data, we removed the random-projection quantizer (RPQ) (Chiu et al., 2022) and its associated loss previously incorporated in SeamlessM4T v1 (Seamless Communication et al., 2023).4 Akin to v1, the v2 w2v-BERT 2.0 comprises 24 Conformer layers (Gulati et al., 2020) with approximately 600M parameters and the same pre-training hyperparameters.
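
The only change in this commit is the checkpoint link, which moves from a `dl.fbaipublicfiles.com` path to a Hub-style `resolve` URL. As a minimal sketch, the snippet below splits the new URL into its parts, assuming the layout `<owner>/<repo>/resolve/<revision>/<path>`; the helper name `parse_hub_resolve_url` is hypothetical and is not part of any library. It uses only the standard library and does not download anything.

```python
from urllib.parse import urlparse

# URLs copied verbatim from the removed (-) and added (+) table rows.
OLD_URL = "https://dl.fbaipublicfiles.com/seamless/models/conformer_shaw.pt"
NEW_URL = "https://huggingface.co/reach-vb/conformer-shaw/resolve/main/conformer_shaw.pt"


def parse_hub_resolve_url(url: str) -> dict:
    """Split a 'resolve'-style URL into host, repo id, revision, and file path.

    Hypothetical helper. Assumed path layout:
    <owner>/<repo>/resolve/<revision>/<path...>
    """
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "resolve":
        raise ValueError(f"not a resolve-style URL: {url}")
    return {
        "host": parts.netloc,
        "repo_id": "/".join(segments[:2]),
        "revision": segments[3],
        "filename": "/".join(segments[4:]),
    }


info = parse_hub_resolve_url(NEW_URL)
print(info)

# The file name is the same before and after the commit; only the host
# and the path leading to it change.
old_filename = OLD_URL.rsplit("/", 1)[-1]
print(old_filename == info["filename"])
```

Under these assumptions the new link points at the file `conformer_shaw.pt` on the `main` revision of the `reach-vb/conformer-shaw` repository, while the old link does not match the layout and is rejected by the helper.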