HaNguyen committed on
Commit ba79577
1 Parent(s): 27e8210

Update info

  1. README.md +16 -13
README.md CHANGED
@@ -10,7 +10,8 @@ license: "apache-2.0"
- LeBenchmark provides an ensemble of pretrained wav2vec2 models on different French dataset containing spontaneous, read and broadcasted speech. For more information on the different benchmark that can be used to evaluate the wav2vec2 models, please refer to our paper at: [Task Agnostic and Task Specific Self-Supervised Learning from Speech with LeBenchmark](https://openreview.net/pdf?id=TSvj5dmuSd)
 
 
@@ -18,9 +19,12 @@ LeBenchmark provides an ensemble of pretrained wav2vec2 models on different Fren
  We release four different models that can be found under our HuggingFace organization. Four different wav2vec2 architectures *Light*, *Base*, *Large* and *xLarge* are coupled with our small (1K), medium (3K), large (7K), and extra large (14K) corpus. In short:
 
 
  - [wav2vec2-FR-14K-xlarge](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-xlarge): xLarge wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
  - [wav2vec2-FR-14K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-large): Large wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
  - [wav2vec2-FR-14K-light](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-light): Light wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
 
  - [wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large): Large wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
  - [wav2vec2-FR-7K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-base): Base wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
  - [wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large): Large wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
@@ -29,11 +33,9 @@ We release four different models that can be found under our HuggingFace organiz
  - [wav2vec2-FR-1K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-large): Large wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
  - [wav2vec2-FR-1K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-base): Base wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
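
As a rough illustration of the naming scheme in the list above, the following Python sketch rebuilds each repository name and URL from its corpus and architecture tags and sums the per-gender hours. The `Checkpoint` class and its field names are hypothetical helpers introduced here for illustration, not part of LeBenchmark or of any library; the figures are copied from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record for one listed model (illustration only)."""

    corpus: str              # corpus tag in the repository name, e.g. "14K"
    arch: str                # architecture tag, e.g. "xlarge"
    male_kh: float           # thousands of hours, male speakers
    female_kh: float         # thousands of hours, female speakers
    unknown_kh: float = 0.0  # thousands of hours, gender unknown

    @property
    def repo_id(self) -> str:
        # Pattern taken from the links in the list above.
        return f"LeBenchmark/wav2vec2-FR-{self.corpus}-{self.arch}"

    @property
    def url(self) -> str:
        return f"https://huggingface.co/{self.repo_id}"

    @property
    def total_kh(self) -> float:
        # Round to one decimal to hide floating-point noise.
        return round(self.male_kh + self.female_kh + self.unknown_kh, 1)


# Only the models that appear in the list above.
CHECKPOINTS = [
    Checkpoint("14K", "xlarge", 5.4, 2.4, 6.8),
    Checkpoint("14K", "large", 5.4, 2.4, 6.8),
    Checkpoint("14K", "light", 5.4, 2.4, 6.8),
    Checkpoint("7K", "large", 1.8, 1.0, 4.8),
    Checkpoint("7K", "base", 1.8, 1.0, 4.8),
    Checkpoint("3K", "large", 1.8, 1.0, 0.1),
    Checkpoint("1K", "large", 0.5, 0.5),
    Checkpoint("1K", "base", 0.5, 0.5),
]

if __name__ == "__main__":
    for ckpt in CHECKPOINTS:
        print(f"{ckpt.repo_id}: {ckpt.total_kh}K hours ({ckpt.url})")
```

With these figures, the 14K breakdown sums to 14.6K hours, which the list states as 14K, while the 7K and 3K breakdowns sum to 7.6K and 2.9K hours, matching the totals given in the list.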
 
-
-
  ## Intended uses & limitations
 
- Pretrained wav2vec2 models are distributed under the apache-2.0 licence. Hence, they can be reused extensively without strict limitations. However, benchmarks and data may be linked to corpus that are not completely open-sourced.
 
  ## Fine-tune with Fairseq for ASR with CTC
 
@@ -43,11 +45,11 @@ Please note that due to the nature of CTC, speech-to-text results aren't expecte
 
  ## Integrate to SpeechBrain for ASR, Speaker, Source Separation ...
 
- Pretrained wav2vec models recently gained in popularity. At the same time [SpeechBrain toolkit](https://speechbrain.github.io) came out, proposing a new and simpler way of dealing with state-of-the-art speech & deep-learning technologies.
 
  While it currently is in beta, SpeechBrain offers two different ways of nicely integrating wav2vec2 models that were trained with Fairseq i.e our LeBenchmark models!
 
- 1. Extract wav2vec2 features on-the-fly (with a frozen wav2vec2 encoder) to be combined with any speech related architecture. Examples are: E2E ASR with CTC+Att+Language Models; Speaker Recognition or Verification, Source Separation ...
  2. *Experimental:* To fully benefit from wav2vec2, the best solution remains to fine-tune the model while you train your downstream task. This is very simply allowed within SpeechBrain as just a flag needs to be turned on. Thus, our wav2vec2 models can be fine-tuned while training your favorite ASR pipeline or Speaker Recognizer.
 
  **If interested, simply follow this [tutorial](https://colab.research.google.com/drive/17Hu1pxqhfMisjkSgmM2CnZxfqDyn2hSY?usp=sharing)**
@@ -55,11 +57,12 @@ While it currently is in beta, SpeechBrain offers two different ways of nicely i
  ## Referencing LeBenchmark
 
  ```
- @article{Evain2021LeBenchmarkAR,
- title={LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech},
- author={Sol{\`e}ne Evain and Ha Nguyen and Hang Le and Marcely Zanon Boito and Salima Mdhaffar and Sina Alisamir and Ziyi Tong and N. Tomashenko and Marco Dinarelli and Titouan Parcollet and A. Allauzen and Y. Est{\`e}ve and B. Lecouteux and F. Portet and S. Rossato and F. Ringeval and D. Schwab and L. Besacier},
- journal={ArXiv},
- year={2021},
- volume={abs/2104.11462}
 
  }
- ```
+ LeBenchmark provides an ensemble of wav2vec2 models pretrained on different French datasets containing spontaneous, read, and broadcast speech. It comes in two versions: the later one (LeBenchmark 2.0) extends the first in both the number of pretrained SSL models and the number of downstream tasks.
+ For more information on the different benchmarks that can be used to evaluate the wav2vec2 models, please refer to our paper at: [LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech](https://arxiv.org/abs/2309.05472)
 
  We release four different models that can be found under our HuggingFace organization. Four different wav2vec2 architectures *Light*, *Base*, *Large* and *xLarge* are coupled with our small (1K), medium (3K), large (7K), and extra large (14K) corpus. In short:
+
+ ## *LeBenchmark 2.0:*
  - [wav2vec2-FR-14K-xlarge](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-xlarge): xLarge wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
  - [wav2vec2-FR-14K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-large): Large wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
  - [wav2vec2-FR-14K-light](https://huggingface.co/LeBenchmark/wav2vec2-FR-14K-light): Light wav2vec2 trained on 14K hours of French speech (5.4K Males / 2.4K Females / 6.8K unknown).
+ ## *LeBenchmark:*
  - [wav2vec2-FR-7K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-large): Large wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
  - [wav2vec2-FR-7K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-7K-base): Base wav2vec2 trained on 7.6K hours of French speech (1.8K Males / 1.0K Females / 4.8K unknown).
  - [wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large): Large wav2vec2 trained on 2.9K hours of French speech (1.8K Males / 1.0K Females / 0.1K unknown).
  - [wav2vec2-FR-1K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-large): Large wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
  - [wav2vec2-FR-1K-base](https://huggingface.co/LeBenchmark/wav2vec2-FR-1K-base): Base wav2vec2 trained on 1K hours of French speech (0.5K Males / 0.5K Females).
 
 
 
  ## Intended uses & limitations
 
+ Pretrained wav2vec2 models are distributed under the Apache-2.0 license. Hence, they can be reused extensively without strict limitations. However, benchmarks and data may be linked to corpora that are not completely open-sourced.
 
  ## Fine-tune with Fairseq for ASR with CTC
 
 
  ## Integrate to SpeechBrain for ASR, Speaker, Source Separation ...
 
+ Pretrained wav2vec models have recently gained popularity. At the same time, the [SpeechBrain toolkit](https://speechbrain.github.io) came out, proposing a new and simpler way of dealing with state-of-the-art speech & deep-learning technologies.
 
  While it currently is in beta, SpeechBrain offers two different ways of nicely integrating wav2vec2 models that were trained with Fairseq i.e our LeBenchmark models!
 
+ 1. Extract wav2vec2 features on-the-fly (with a frozen wav2vec2 encoder) to be combined with any speech-related architecture. Examples are: E2E ASR with CTC+Att+Language Models; Speaker Recognition or Verification, Source Separation ...
  2. *Experimental:* To fully benefit from wav2vec2, the best solution remains to fine-tune the model while you train your downstream task. This is very simply allowed within SpeechBrain as just a flag needs to be turned on. Thus, our wav2vec2 models can be fine-tuned while training your favorite ASR pipeline or Speaker Recognizer.
 
  **If interested, simply follow this [tutorial](https://colab.research.google.com/drive/17Hu1pxqhfMisjkSgmM2CnZxfqDyn2hSY?usp=sharing)**
  ## Referencing LeBenchmark
 
  ```
+ @misc{parcollet2023lebenchmark,
+ title={LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech},
+ author={Titouan Parcollet and Ha Nguyen and Solene Evain and Marcely Zanon Boito and Adrien Pupier and Salima Mdhaffar and Hang Le and Sina Alisamir and Natalia Tomashenko and Marco Dinarelli and Shucong Zhang and Alexandre Allauzen and Maximin Coavoux and Yannick Esteve and Mickael Rouvier and Jerome Goulian and Benjamin Lecouteux and Francois Portet and Solange Rossato and Fabien Ringeval and Didier Schwab and Laurent Besacier},
+ year={2023},
+ eprint={2309.05472},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
  }
+ ```