---

language: "fr"
thumbnail:
tags:
- wav2vec2
license: "apache-2.0"
---


# LeBenchmark: wav2vec2 base model trained on 1K hours of French *female-only* speech

  
LeBenchmark provides a set of wav2vec2 models pretrained on different French datasets containing spontaneous, read, and broadcast speech.

For more information about our gender study of self-supervised learning (SSL) models, please refer to our paper: [A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems](https://arxiv.org/abs/2204.01397).

  
## Model and data descriptions

We release four gender-specific models trained on 1K hours of speech.

- [wav2vec2-FR-1K-Male-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-large/)
- [wav2vec2-FR-1K-Male-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-base/)
- [wav2vec2-FR-1K-Female-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-large/)
- [wav2vec2-FR-1K-Female-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-base/)
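
The four repositories above follow a regular naming pattern, so a checkpoint can be selected programmatically. The following is a minimal sketch, not an official loading recipe: the helper names `model_id` and `load_encoder` are hypothetical, and it assumes that each repository ships a checkpoint compatible with the `transformers` library, which this card does not confirm.

```python
# Sketch only: helper names are illustrative, not part of LeBenchmark.
ORG = "LeBenchmark"
GENDERS = ("Male", "Female")
SIZES = ("base", "large")


def model_id(gender: str, size: str) -> str:
    """Build the Hub repository ID used in the links listed above."""
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {GENDERS}, got {gender!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    return f"{ORG}/wav2vec-FR-1K-{gender}-{size}"


def load_encoder(gender: str = "Female", size: str = "base"):
    """Load the pretrained encoder.

    Assumption: the repository contains a transformers-compatible
    checkpoint. The import is done lazily so that model_id() can be
    used without transformers installed.
    """
    from transformers import Wav2Vec2Model

    return Wav2Vec2Model.from_pretrained(model_id(gender, size))


# Enumerate the four gender-specific models released with this card.
ALL_MODEL_IDS = [model_id(g, s) for g in GENDERS for s in SIZES]
```

Calling `load_encoder()` with its defaults would request the female-only base model described in this card; it downloads weights from the Hub, so it needs network access.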

## Intended uses & limitations

The pretrained wav2vec2 models are distributed under the Apache-2.0 license, so they can be reused broadly with few restrictions. However, the benchmarks and data may depend on corpora that are not fully open.

## Referencing our gender-specific models
```
@article{boito2022study,
  title={A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems},
  author={Marcely Zanon Boito and Laurent Besacier and Natalia Tomashenko and Yannick Est{\`e}ve},
  journal={arXiv preprint arXiv:2204.01397},
  year={2022}
}
```
## Referencing LeBenchmark

```
@inproceedings{evain2021task,
  title={Task agnostic and task specific self-supervised learning from speech with \textit{LeBenchmark}},
  author={Evain, Sol{\`e}ne and Nguyen, Ha and Le, Hang and Boito, Marcely Zanon and Mdhaffar, Salima and Alisamir, Sina and Tong, Ziyi and Tomashenko, Natalia and Dinarelli, Marco and Parcollet, Titouan and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year={2021}
}
```