---
language:
- fr
thumbnail: null
pipeline_tag: spoken-language-understanding
tags:
- CTC
- pytorch
- speechbrain
- hf-slu-leaderboard
license: apache-2.0
datasets:
- MEDIA
metrics:
- cver
- cer
- cher
model-index:
- name: slu-wav2vec2-ctc-MEDIA-relax
  results:
  - task:
      name: Spoken Language Understanding
      type: spoken-language-understanding
    dataset:
      name: MEDIA
      type: MEDIA_slu_relax
      config: fr
      split: test
      args:
        language: fr
    metrics:
    - name: Test ChER
      type: cher
      value: 7.46
    - name: Test CER
      type: cer
      value: 20.10
    - name: Test CVER
      type: cver
      value: 31.41
---

<iframe src="https://ghbtns.com/github-btn.html?user=speechbrain&repo=speechbrain&type=star&count=true&size=large&v=2" frameborder="0" scrolling="0" width="170" height="30" title="GitHub"></iframe>
<br/><br/>

# wav2vec 2.0 with CTC trained on MEDIA

This repository provides all the necessary tools to perform spoken language understanding
with an end-to-end system pretrained on MEDIA (French) within
SpeechBrain. For a better experience, we encourage you to learn more about
[SpeechBrain](https://speechbrain.github.io).

The performance of the model is the following:

| Release | Test ChER | Test CER | Test CVER | GPUs |
|:-------------:|:--------------:|:--------------:|:--------------:|:--------:|
| 22-02-23 | 7.46 | 20.10 | 31.41 | 1xV100 32GB |
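
All three metrics are edit-distance based: ChER compares character sequences, CER concept sequences, and CVER concept/value pair sequences. As a rough illustration only (this is not the official MEDIA scoring script), an error rate of this family can be sketched as:

```python
# Illustrative sketch of an edit-distance-based error rate
# (NOT the official SpeechBrain/MEDIA scoring code).
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[len(hyp)]

def error_rate(ref, hyp):
    """Percent error rate: 100 * edits / reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Applied to character sequences this gives a ChER-style score; applied to concept (or concept/value) token sequences, a CER- or CVER-style score.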

## Pipeline description

This SLU system is composed of an acoustic model (wav2vec 2.0 + CTC). A pretrained wav2vec 2.0 model ([LeBenchmark/wav2vec2-FR-3K-large](https://huggingface.co/LeBenchmark/wav2vec2-FR-3K-large)) is combined with three DNN layers and fine-tuned on MEDIA.
The resulting acoustic representation is passed to a CTC greedy decoder.

The system is trained on recordings sampled at 16 kHz (single channel).
The code will automatically normalize your audio (i.e., resampling + mono channel selection) when calling *transcribe_file*, if needed.

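The normalization performed by *transcribe_file* amounts to mixing down to a single channel and resampling to 16 kHz. A minimal NumPy sketch of that idea (linear-interpolation resampling; SpeechBrain's internal resampler is higher quality) might look like:

```python
import numpy as np

def normalize_audio(signal, orig_sr, target_sr=16000):
    """Rough sketch of the normalization idea: mix down to mono,
    then resample by linear interpolation. Illustrative only."""
    if signal.ndim == 2:                    # (samples, channels) -> mono
        signal = signal.mean(axis=1)
    if orig_sr != target_sr:
        n_out = int(round(len(signal) / orig_sr * target_sr))
        t_in = np.arange(len(signal)) / orig_sr
        t_out = np.arange(n_out) / target_sr
        signal = np.interp(t_out, t_in, signal)
    return signal
```
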
## Install SpeechBrain

First of all, please install transformers and SpeechBrain with the following command:

```bash
pip install speechbrain transformers
```

Please note that we encourage you to read our tutorials and learn more about
[SpeechBrain](https://speechbrain.github.io).

### Transcribing and semantically annotating your own audio files (in French)

```python
from speechbrain.pretrained import EncoderASR

asr_model = EncoderASR.from_hparams(source="speechbrain/slu-wav2vec2-ctc-MEDIA-relax", savedir="pretrained_models/slu-wav2vec2-ctc-MEDIA-relax")
asr_model.transcribe_file('speechbrain/asr-wav2vec2-commonvoice-fr/example-fr.wav')
```

### Inference on GPU
To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.

### Training
The model was trained with SpeechBrain.
To train it from scratch, follow these steps:
1. Clone SpeechBrain:
```bash
git clone https://github.com/speechbrain/speechbrain/
```
2. Install it:
```bash
cd speechbrain
pip install -r requirements.txt
pip install -e .
```
3. Download the MEDIA-related files:
- [Media ASR (ELRA-S0272)](https://catalogue.elra.info/en-us/repository/browse/ELRA-S0272/)
- [Media SLU (ELRA-E0024)](https://catalogue.elra.info/en-us/repository/browse/ELRA-E0024/)
- [channels.csv and concepts_full_relax.csv](https://drive.google.com/drive/u/1/folders/1z2zFZp3c0NYLFaUhhghhBakGcFdXVRyf)
4. Modify the placeholders in `hparams/train_hf_wav2vec_relax.yaml`:
```yaml
data_folder: !PLACEHOLDER
channels_path: !PLACEHOLDER
concepts_path: !PLACEHOLDER
```
5. Run training:
```bash
cd recipes/MEDIA/SLU/CTC/
python train_hf_wav2vec.py hparams/train_hf_wav2vec_relax.yaml
```

You can find our training results (models, logs, etc.) [here](https://drive.google.com/drive/folders/1ALtwmk3VUUM0XRToecQp1DKAh9FsGqMA?usp=sharing).

### Limitations
The SpeechBrain team does not provide any warranty on the performance achieved by this model when used on other datasets.

#### Referencing SpeechBrain

```bibtex
@misc{SB2021,
  author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua},
  title = {SpeechBrain},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/speechbrain/speechbrain}},
}
```

#### About SpeechBrain
SpeechBrain is an open-source, all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. It achieves competitive or state-of-the-art performance in various domains.

Website: https://speechbrain.github.io/

GitHub: https://github.com/speechbrain/speechbrain