Antoine-caubriere and gauthelo committed
Commit f1153ad
1 Parent(s): ecaafd2

Add model card (#1)


- Add model card (2964fa30bb5b0fab631e54fe160192bea83cbc68)


Co-authored-by: Elodie Gauthier <gauthelo@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +84 -3
README.md CHANGED
@@ -1,3 +1,84 @@
---
license: cc-by-nc-4.0
metrics:
- cer
- wer
library_name: speechbrain
pipeline_tag: automatic-speech-recognition
tags:
- speech processing
- self-supervision
- african languages
- fine-tuning
---
## Model description
This self-supervised speech model (a.k.a. SSA-HuBERT-base-60k) is based on the HuBERT Base architecture (~95M parameters) [1].
It was trained on nearly 60,000 hours of speech segments and covers 21 languages and variants spoken in Sub-Saharan Africa.

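The pretrained checkpoint can be used directly as a speech encoder for feature extraction. Below is a minimal sketch, assuming the weights are available (or have been converted) in the Hugging Face `transformers` HuBERT format; the repository id is a placeholder, since it is not spelled out in this card, and the model is primarily intended to be used through SpeechBrain.

```python
import torch
from transformers import HubertModel

# Placeholder repository id: replace with the actual Hub id of this model.
repo_id = "<namespace>/SSA-HuBERT-base-60k"

# Assumes a transformers-format HuBERT checkpoint is available under repo_id.
model = HubertModel.from_pretrained(repo_id)
model.eval()

# Two seconds of 16 kHz mono audio (placeholder; use a real waveform in practice).
waveform = torch.zeros(1, 32000)

with torch.no_grad():
    hidden_states = model(waveform).last_hidden_state

# One 768-dimensional vector per ~20 ms frame for a HuBERT Base encoder.
print(hidden_states.shape)  # e.g. torch.Size([1, 99, 768])
```
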
### Pretraining data
- Dataset: The training dataset was composed of both studio recordings (controlled environment, prepared talks) and street interviews (noisy environment, spontaneous speech).

- Languages: Bambara (bam), Dyula (dyu), French (fra), Fula (ful), Fulfulde (ffm), Fulfulde (fuh), Gulmancema (gux), Hausa (hau), Kinyarwanda (kin), Kituba (ktu), Lingala (lin), Luba-Lulua (lua), Mossi (mos), Maninkakan (mwk), Sango (sag), Songhai (son), Swahili (swc), Swahili (swh), Tamasheq (taq), Wolof (wol), Zarma (dje).

## ASR fine-tuning
The SpeechBrain toolkit (Ravanelli et al., 2021) is used to fine-tune the model.
Fine-tuning is done for each language using the FLEURS dataset [2].
The pretrained model (SSA-HuBERT-base-60k) is used as a speech encoder and is fully fine-tuned, with two 1024-dimensional linear layers and a softmax output on top.

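For illustration, here is a minimal PyTorch sketch of such a decoding head. The 1024-dimensional hidden layers follow the description above and the 768-dimensional encoder output follows from the HuBERT Base architecture; the vocabulary size, the activation functions, and the CTC objective are assumptions rather than the exact recipe.

```python
import torch
import torch.nn as nn

class DecodingHead(nn.Module):
    """Two 1024-dim linear layers and a (log-)softmax output on top of the encoder."""

    def __init__(self, encoder_dim: int = 768, hidden_dim: int = 1024, vocab_size: int = 100):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(encoder_dim, hidden_dim),
            nn.LeakyReLU(),  # activation choice is an assumption
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(),
            nn.Linear(hidden_dim, vocab_size),  # per-frame scores over output tokens
        )

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, frames, encoder_dim) -> (batch, frames, vocab_size)
        return torch.log_softmax(self.layers(encoder_out), dim=-1)

# The per-frame log-probabilities can be trained with a CTC-style loss and
# decoded greedily by taking the argmax at every frame (see Results below).
head = DecodingHead()
log_probs = head(torch.randn(1, 99, 768))
greedy_tokens = log_probs.argmax(dim=-1)
```
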
## License
This model is released under the CC BY-NC 4.0 license.

## Publication
This model was presented at AfricaNLP 2024.
The associated paper is available here: [Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context](https://openreview.net/forum?id=zLOhcft2E7)

### Citation
Please cite our paper when using the SSA-HuBERT-base-60k model:

Caubrière, A., & Gauthier, E. (2024). Africa-Centric Self-Supervised Pre-Training for Multilingual Speech Representation in a Sub-Saharan Context. In 5th Workshop on African Natural Language Processing (AfricaNLP 2024).

**Bibtex citation:**
@inproceedings{caubriere2024ssaspeechssl,
title={Africa-Centric Self-Supervised Pretraining for Multilingual Speech Representation in a Sub-Saharan Context},
author={Antoine Caubri{\`e}re and Elodie Gauthier},
booktitle={5th Workshop on African Natural Language Processing},
year={2024},
url={https://openreview.net/forum?id=zLOhcft2E7}}

## Results
The following results are obtained with greedy decoding (no language model rescoring).
Character error rates (CER) and word error rates (WER) on the 20 languages of the Sub-Saharan African subset of the FLEURS dataset are given in the table below.

| **Language** | **CER (%)** | **WER (%)** |
| :----------------- | :--------- | :--------- |
| **Afrikaans** | 23.3 | 68.4 |
| **Amharic** | 15.9 | 52.7 |
| **Fula** | 21.2 | 61.9 |
| **Ganda** | 11.5 | 52.8 |
| **Hausa** | 10.5 | 32.5 |
| **Igbo** | 19.7 | 57.5 |
| **Kamba** | 16.1 | 53.9 |
| **Lingala** | 8.7 | 24.7 |
| **Luo** | 9.9 | 38.9 |
| **Northern-Sotho** | 13.5 | 43.2 |
| **Nyanja** | 13.3 | 54.2 |
| **Oromo** | 22.8 | 78.1 |
| **Shona** | 11.6 | 50.2 |
| **Somali** | 21.6 | 64.9 |
| **Swahili** | 7.1 | 23.8 |
| **Umbundu** | 21.7 | 61.7 |
| **Wolof** | 19.4 | 55.0 |
| **Xhosa** | 11.9 | 51.6 |
| **Yoruba** | 24.3 | 67.5 |
| **Zulu** | 12.2 | 53.4 |
| *Overall average* | *15.8* | *52.3* |

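Scores of this kind can be computed from the model's greedy (argmax) transcriptions with any standard WER/CER implementation. The short sketch below uses the `jiwer` package as one convenient option; it is not the scoring script used for the paper, and the example sentences are hypothetical.

```python
# pip install jiwer
import jiwer

# Hypothetical reference/hypothesis pairs; in practice these would be the
# FLEURS test transcripts and the corresponding greedy decodes.
references = ["habari ya asubuhi", "karibu sana"]
hypotheses = ["habari ya asubui", "karibu sana"]

wer = jiwer.wer(references, hypotheses)  # word error rate over the whole set
cer = jiwer.cer(references, hypotheses)  # character error rate over the whole set

print(f"WER: {100 * wer:.1f}%  CER: {100 * cer:.1f}%")
```
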
## Reproducibility
We provide a notebook to reproduce the ASR experiments reported in our paper; see `SB_ASR_FLEURS_finetuning.ipynb`.
By using the `ASR_FLEURS-swahili_hf.yaml` config file, you can run the recipe on Swahili.

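SpeechBrain recipes of this kind are configured through HyperPyYAML. As a rough sketch of how the shipped config could be inspected (whether this particular file loads without placeholder overrides, e.g. for data paths, is an assumption; the notebook remains the reference entry point):

```python
from hyperpyyaml import load_hyperpyyaml

# Load the Swahili recipe hyperparameters; if the YAML declares !PLACEHOLDER
# values (e.g. a data folder), pass them via the `overrides` argument.
with open("ASR_FLEURS-swahili_hf.yaml") as f:
    hparams = load_hyperpyyaml(f)

# Inspect what the recipe defines (model, optimizer, paths, ...).
print(sorted(hparams.keys()))
```
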
## References
[1] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460, 2021. doi: 10.1109/TASLP.2021.3122291.

[2] Alexis Conneau, Min Ma, Simran Khanuja, Yu Zhang, Vera Axelrod, Siddharth Dalmia, Jason Riesa, Clara Rivera, and Ankur Bapna. FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech. In 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 798–805, 2022. doi: 10.1109/SLT54892.2023.10023141.