---
language: "fr"
thumbnail:
tags:
- wav2vec2
license: "apache-2.0"
---

# LeBenchmark: wav2vec2 base model trained on 1K hours of French *female-only* speech

LeBenchmark provides an ensemble of wav2vec2 models pretrained on different French datasets containing spontaneous, read, and broadcast speech.

For more information about our study of gender in self-supervised learning (SSL) models, please refer to our paper: [A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems](https://arxiv.org/abs/2204.01397)

## Model and data descriptions

We release four gender-specific models, each trained on 1K hours of either male-only or female-only French speech:

- [wav2vec2-FR-1K-Male-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-large/)
- [wav2vec2-FR-1K-Male-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Male-base/)
- [wav2vec2-FR-1K-Female-large](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-large/)
- [wav2vec2-FR-1K-Female-base](https://huggingface.co/LeBenchmark/wav2vec-FR-1K-Female-base/)
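As a rough illustration of how one of these checkpoints might be used, the sketch below loads a model with the Hugging Face `transformers` library and extracts frame-level representations. It is a minimal sketch under stated assumptions, not an official recipe: the repository IDs are read off the links above and assumed to be current, the checkpoints are assumed to be loadable with `Wav2Vec2Model.from_pretrained`, and the input is assumed to be mono audio already resampled to 16 kHz. The helper names `lebenchmark_repo_id` and `extract_features` are hypothetical.

```python
# Hypothetical helpers for loading a LeBenchmark gender-specific checkpoint.
# Assumptions: repo IDs follow the links in this card, the checkpoints load with
# transformers' Wav2Vec2Model, and input audio is mono, 16 kHz, float samples.
from typing import Sequence

GENDERS = ("Male", "Female")
SIZES = ("base", "large")


def lebenchmark_repo_id(gender: str, size: str) -> str:
    """Build the (assumed) Hub repository ID for a 1K-hour gender-specific model."""
    if gender not in GENDERS:
        raise ValueError(f"gender must be one of {GENDERS}, got {gender!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    return f"LeBenchmark/wav2vec-FR-1K-{gender}-{size}"


def extract_features(samples: Sequence[float], repo_id: str):
    """Return the last hidden states, shaped [1, num_frames, hidden_size]."""
    # Imported lazily so the ID helper above works without torch/transformers.
    import torch
    from transformers import Wav2Vec2Model

    model = Wav2Vec2Model.from_pretrained(repo_id)
    model.eval()
    # Add a batch dimension: [1, num_samples]. No resampling or normalization
    # is done here; the expected preprocessing is an assumption, check the
    # pretraining configuration before relying on these features.
    inputs = torch.tensor(samples, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        outputs = model(inputs)
    return outputs.last_hidden_state
```

For example, `extract_features([0.0] * 16000, lebenchmark_repo_id("Female", "base"))` would run one second of silence through the female-only base model (this downloads the checkpoint from the Hub). Keeping the ID construction separate from the loading step makes it easy to swap between the four released models when comparing them.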

## Intended uses & limitations

Pretrained wav2vec2 models are distributed under the Apache-2.0 license. They can therefore be reused widely under the terms of that license. However, the benchmarks and data may be linked to corpora that are not fully open source.

## Referencing our gender-specific models

```
@inproceedings{boito22_interspeech,
  author={Marcely Zanon Boito and Laurent Besacier and Natalia Tomashenko and Yannick Estève},
  title={{A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={1278--1282},
  doi={10.21437/Interspeech.2022-353}
}
```

## Referencing LeBenchmark

```
@inproceedings{evain2021task,
  title={Task agnostic and task specific self-supervised learning from speech with \textit{LeBenchmark}},
  author={Evain, Sol{\`e}ne and Nguyen, Ha and Le, Hang and Boito, Marcely Zanon and Mdhaffar, Salima and Alisamir, Sina and Tong, Ziyi and Tomashenko, Natalia and Dinarelli, Marco and Parcollet, Titouan and others},
  booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
  year={2021}
}
```