---
language:
- fr
thumbnail: null
pipeline_tag: spoken-language-understanding
tags:
- CTC
- pytorch
- speechbrain
- hf-slu-leaderboard
license: apache-2.0
datasets:
- MEDIA
metrics:
- cver
- cer
- cher
model-index:
- name: slu-wav2vec2-ctc-MEDIA-relax
  results:
  - task:
      name: Spoken Language Understanding
      type: spoken-language-understanding
    dataset:
      name: MEDIA
      type: MEDIA_slu_relax
      config: fr
      split: test
      args:
        language: fr
    metrics:
    - name: Test ChER
      type: cher
      value: 7.46
    - name: Test CER
      type: cer
      value: 20.10
    - name: Test CVER
      type: cver
      value: 31.41
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# wav2vec 2.0 with CTC trained on MEDIA

This repository provides all the necessary tools to perform spoken language understanding
with an end-to-end system pretrained on MEDIA (French) within
SpeechBrain. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io).

The performance of the model is the following:

| Release | Test ChER | Test CER | Test CVER | GPUs |
|:-------------:|:--------------:|:--------------:|:--------------:|:--------:|
| 22-02-23 | 7.46 | 20.10 | 31.41 | 1xV100 32GB |
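
All three metrics are edit-distance based: ChER compares character sequences, CER concept sequences, and CVER concept/value pair sequences. As a rough illustration only (this is not the official MEDIA scoring script), an error rate of this family can be sketched as:

```python
# Illustrative sketch of an edit-distance-based error rate
# (NOT the official SpeechBrain/MEDIA scoring code).
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[len(hyp)]

def error_rate(ref, hyp):
    """Percent error rate: 100 * edits / reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Applied to character sequences this gives a ChER-style score; applied to concept (or concept/value) token sequences, a CER- or CVER-style score.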

## Pipeline description

This SLU system is composed of an acoustic model (wav2vec 2.0 + CTC). A pretrained wav2vec 2.0 model ([LeBenchmark/wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large)) is combined with three DNN layers and fine-tuned on MEDIA.
The resulting acoustic representation is passed to a CTC greedy decoder.

The system is trained on recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file*, if needed.

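The normalization performed by *transcribe_file* amounts to mixing down to a single channel and resampling to 16 kHz. A minimal NumPy sketch of that idea (linear-interpolation resampling; SpeechBrain's internal resampler is higher quality) might look like:

```python
import numpy as np

def normalize_audio(signal, orig_sr, target_sr=16000):
    """Rough sketch of the normalization idea: mix down to mono,
    then resample by linear interpolation. Illustrative only."""
    if signal.ndim == 2:                    # (samples, channels) -> mono
        signal = signal.mean(axis=1)
    if orig_sr != target_sr:
        n_out = int(round(len(signal) / orig_sr * target_sr))
        t_in = np.arange(len(signal)) / orig_sr
        t_out = np.arange(n_out) / target_sr
        signal = np.interp(t_out, t_in, signal)
    return signal
```
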
## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```bash
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Transcribing and semantically annotating your own audio files (in French)

```python
from speechbrain.pretrained import EncoderASR

asr_model = EncoderASR.from_hparams(source="speechbrain/slu-wav2vec2-ctc-MEDIA-relax", savedir="pretrained_models/slu-wav2vec2-ctc-MEDIA-relax")
asr_model.transcribe_file('speechbrain/asr-wav2vec2-commonvoice-fr/example-fr.wav')
```

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training
The model was trained with SpeechBrain.
To train it from scratch, follow these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
3. Download the MEDIA-related files:
- [Media ASR (ELRA-S0272)](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/)
- [Media SLU (ELRA-E0024)](https://catalogue.elra.info/en-us/repository/browse/ELRA-E0024/)
- [channels.csv and concepts_full_relax.csv](https://drive.google.com/drive/u/1/folders/1z2zFZp3c0NYLFaUhhghhBakGcFdXVRyf)
4. Modify the placeholders in `hparams/train_hf_wav2vec_relax.yaml`:
```yaml
data_folder: !PLACEHOLDER
channels_path: !PLACEHOLDER
concepts_path: !PLACEHOLDER
```
5. Run training:
```bash
cd recipes/MEDIA/SLU/CTC/
python train_hf_wav2vec.py hparams/train_hf_wav2vec_relax.yaml
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1ALtwmk3VUUM0XRToecQp1DKAh9FsGqMA?usp=sharing).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain
SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. It achieves competitive or state-of-the-art performance in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain