--- license: apache-2.0 language: - ar tags: - audio - automatic-speech-recognition --- ![license](https://img.shields.io/badge/license-apache2-lightgrey) |![Language](https://img.shields.io/badge/Language-Tunisian-lightgrey) |[![Model architecture](https://img.shields.io/badge/Model_Arch-TDNN-lightgrey)](https://github.com/linagora-labs/ASR_train_kaldi_tunisian?tab=readme-ov-file#acoustic-model-am) |[![GitHub](https://img.shields.io/badge/GitHub-ASRTrainKaldiTunisian-lightgrey)](https://github.com/linagora-labs/ASR_train_kaldi_tunisian) # LinTO ASR Arabic Tunisia v0.1 **LinTO ASR Arabic Tunisia v0.1** is an Automatic Speech Recognition (ASR) model for the Tunisian dialect, with some capabilities of code-switching when some French or English words are used. This repository includes two versions of the model and a Language model with ARPA format: - `vosk-model`: The original, comprehensive model. - `android-model`: A lighter version with a simplified graph, optimized for deployment on Android devices or Raspberry Pi applications. - `lm_TN_CS.arpa.gz`: A language model trained using SRILM on a dataset containing 4.5 million lines of text collected from various sources. ## Model Overview - **Model type**: Kaldi TDNN - **Language(s)**: Tunisian Dialect - **Use cases**: Automatic Speech Recognition (ASR) ### Model Performance The following table summarizes the performance of the **LinTO ASR Arabic Tunisia v0.1** model on various considered **test sets**: | Dataset | CER | WER | | :------- | :------- | :------- | | [Youtube_TNScrapped_V1](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `25.39%` | `37.51%` | | [TunSwitchCS](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `17.72%` | `20.51%` | | [TunSwitchTO](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.13%` | `22.54%` | | [ApprendreLeTunisien](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.81%` | `23.27%` | | [TARIC](https://github.com/elyadata/TARIC-SLU) | `10.60%` | `16.06%` | | [OneStory](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table)| `1.53%` | `4.47%` | ### Training code The model was trained using the following GitHub repository: [ASR_train_kaldi_tunisian](https://github.com/linagora-labs/ASR_train_kaldi_tunisian) ### Training datasets The model was trained using the following datasets: - **[LinTO DataSet Audio for Arabic Tunisian](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn):** This dataset comprises a collection of Tunisian dialect audio recordings and their annotations for Speech-to-Text (STT) tasks. The data was collected from various sources, including Hugging Face, YouTube, and websites. - **[LinTO DataSet Audio for Arabic Tunisian Augmented](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn-augmented):** This dataset is an augmented version of the LinTO DataSet Audio for Arabic Tunisian v0.1. The augmentation includes noise reduction and voice conversion. - **[TARIC](https://github.com/elyadata/TARIC-SLU):** This dataset consists of Tunisian Arabic speech recordings collected from train stations in Tunisia. ## How to use ### 1. Download the model You can download the model and its components directly from this repository using one of the following methods: **Method 1: Direct Download via Browser** 1. **Visit the Repository**: Navigate to the [Hugging Face model page](https://huggingface.co/linagora/linto-asr-ar-tn-0.1). 2. **Download as Zip**: Click on the "Download" button or the "Code" button (often appearing as a dropdown). Select "Download ZIP" to get the entire repository as a zip file. **Method 2: Using `curl` command** You can follow the command below: ```bash sudo apt-get install curl curl -L https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/vosk-model.zip --output vosk-model.zip ``` (or same with `android-model.zip` instead of `vosk-model.zip`) **Method 3: Cloning the Repository** You can clone the repository and create a zip file of the contents if needed: ```bash sudo apt-get install git-lfs git lfs install git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git cd linto-asr-ar-tn-0.1 ``` ### 2. Unzip the model This can be done in bash: ```bash mkdir dir_for_zip_extract unzip /path/to/model-name.zip -d dir_for_zip_extract ``` ### 3. Python code First, make sure to install the required dependencies: ```bash pip install vosk ``` Then you can launch the inference script from this repository: ```bash python inference.py ``` or use such a python code: ```python from vosk import Model, KaldiRecognizer import wave import json model_dir = "path/to/your/model" audio_file = "path/to/your/audio/file.wav" model = Model(model_dir) with wave.open(audio_file, "rb") as wf: if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE": raise ValueError("Audio file must be WAV format mono PCM.") rec = KaldiRecognizer(model, wf.getframerate()) rec.AcceptWaveform(wf.readframes(wf.getnframes())) res = rec.FinalResult() transcript = json.loads(res)["text"] print(f"Transcript: {transcript}") ``` ## Example Here is an example of the transcription capabilities of the model: ### Result:

بالدعم هاذايا لي بثتهولو ال berd يعني أحنا حتى ال projet متاعو تقلب حتى sur le plan حتى فال management يا سيد نحنا في تسيير الشريكة يعني تبدل مية و ثمانين درجة ماللي يعني قبل ما تجيه ال berd و بعد ما جاتو ال berd برنامج نخصص لل les startup إسمو

## WebRTC Demonstartion Install required dependencies: ```bash pip install vosk pip install websockets ``` If not done, close the repostorory: ```bash git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git ``` Then call the `app.py` script: ```bash cd linto-asr-ar-tn-0.1/Demo-WebRTC python3 app.py ``` Access the web interface at: `localhost:8010` Just start and speak. Preview of the web app interface: ![Demo Interface](https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/example.png) ## Citation ```bibtex @misc{linagora2024Linto-tn, author = {Hedi Naouara and Jérôme Louradour and Jean-Pierre Lorré}, title = {LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect}, year = {2024}, month = {October}, note = {Good Data Workshop, AAAI 2025}, howpublished = {\url{https://huggingface.co/linagora/linto-asr-ar-tn-0.1}}, } ```