Tunisian Arabic ASR Model with wav2vec2

This repository provides all the tools needed to perform automatic speech recognition with an end-to-end system pretrained on the Tunisian Arabic dialect.

Performance

The following table summarizes the performance of the model on the test sets considered:

| Dataset      | CER   | WER   |
|--------------|-------|-------|
| TARIC        | 6.22  | 10.55 |
| IWSLT        | 21.18 | 39.53 |
| TunSwitch TO | 9.67  | 25.54 |

More details about the test sets and the conditions leading to this performance are given in the paper.
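For readers unfamiliar with the two metrics, the sketch below shows how a character error rate (CER) and word error rate (WER) are commonly computed from a reference transcript and a model hypothesis, using the Levenshtein edit distance. It is an illustration only, not the evaluation script used to produce the table. The function names are hypothetical, and the choices to report percentages and to count spaces as characters in the CER are assumptions; the paper may normalize text differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (minimum number of substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent (assumption: whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    ref = "a b c d"
    hyp = "a x c d"
    print(f"WER: {wer(ref, hyp):.2f}")
    print(f"CER: {cer(ref, hyp):.2f}")
```

Both metrics divide the number of edits by the length of the reference, so a value above 100 is possible when the hypothesis contains many insertions.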

Datasets

This ASR model was trained on the following datasets:

TARIC: The Tunisian Arabic Railway Interaction Corpus (TARIC) is a collection of audio recordings and transcriptions of dialogues in the Tunisian Railway Transport Network (TARIC Corpus).
IWSLT: A Tunisian conversational speech corpus (IWSLT Corpus).
TunSwitch: Our crowd-collected dataset, described in the paper presented below.

Inference

Install

pip install speechbrain transformers