# Tunisian Arabic ASR Model with wav2vec2
This repository provides all the necessary tools to perform automatic speech recognition with an end-to-end system pretrained on the Tunisian Arabic dialect.
## Performance
The following table summarizes the model's performance on the test sets considered:

| Dataset       | CER   | WER   |
|---------------|-------|-------|
| TARIC         | 6.22  | 10.55 |
| IWSLT         | 21.18 | 39.53 |
| TunSwitch TO  | 9.67  | 25.54 |

More details about the test sets and the conditions leading to this performance are given in the paper.
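
For readers unfamiliar with the two metrics, the sketch below shows their standard definitions: WER is the word-level edit distance between reference and hypothesis divided by the number of reference words, and CER is the same ratio computed over characters. It is an illustration only, not the evaluation code behind the table. The function names are hypothetical, and both the scaling to percentages and the counting of spaces as characters are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumed scaling), splitting on whitespace."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; spaces count as characters (assumption)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("a b c d", "a x c d")` gives `25.0`, since one of the four reference words is substituted.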
## Datasets
This ASR model was trained on:
* TARIC: The Tunisian Arabic Railway Interaction Corpus (TARIC) is a collection of audio recordings and transcriptions of dialogues in the Tunisian railway transport network. [TARIC Corpus](https://aclanthology.org/L14-1385/)
* IWSLT: A Tunisian conversational speech corpus. [IWSLT Corpus](https://iwslt.org/2022/dialect)
* TunSwitch: Our crowd-collected dataset, described in the paper presented below.
## Inference
## Install
```bash
pip install speechbrain transformers
```
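
After installing, a quick check that both packages can be found by the interpreter may save a confusing failure later. The following is a minimal sketch using only the Python standard library; the helper name `missing_packages` is hypothetical and not part of either library.

```python
import importlib.util


def missing_packages(names=("speechbrain", "transformers")):
    """Return the top-level package names that cannot be located for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("speechbrain and transformers are available.")
```

An empty list means both packages were found in the current environment.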