---
license: mit
language: ar
datasets:
- mozilla-foundation/common_voice_17_0
metrics:
- wer
library_name: nemo
pipeline_tag: automatic-speech-recognition
tags:
- asr
- automatic speech recognition
---

# Model Card for Arabic ASR with NeMo Conformer CTC

## Model Details

**Model Name:** NeMo-Conformer-CTC-Arabic-ASR

**Model Type:** Conformer CTC (Connectionist Temporal Classification), small variant

**Language:** Arabic

**License:** MIT

**Model Creator:** Mostafa Ahmed

**Contact Information:** mostafa.ahmed00976@gmail.com

**Model Version:** 1.0

## Overview

NeMo-Conformer-CTC-Arabic-ASR is a fine-tuned version of the NeMo Conformer CTC model for Automatic Speech Recognition (ASR) in Arabic. It converts spoken Arabic into written text and is suitable for applications such as transcription services, voice assistants, and accessibility tools.

## Intended Use

The model is intended for use in:

- Automatic Speech Recognition (ASR) systems for Arabic
- Transcription services for Arabic audio
- Voice assistants and conversational agents
- Accessibility tools for Arabic speakers

## Training Data

The model was fine-tuned on the Arabic portion of Common Voice (`mozilla-foundation/common_voice_17_0`), an open-source dataset of transcribed speech. The dataset covers a variety of speakers and audio conditions, which exposes the model to a range of voices and recording setups.

**Data Sources:**

- [Common Voice](https://commonvoice.mozilla.org/en/datasets): A multilingual dataset for speech recognition tasks.

## Training Procedure

The model was trained using NVIDIA's NeMo framework. The training process involved:

- Preprocessing the Common Voice dataset and converting it to NeMo manifest files that pair each audio clip with its transcription.
- Fine-tuning the pre-trained Conformer CTC model on the Arabic Common Voice dataset.
- Evaluating the model's performance using standard ASR metrics (Word Error Rate, WER).
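
The manifest step above can be sketched as follows. This is a minimal illustration, not the author's actual preprocessing script: it assumes the Common Voice TSV layout with `path` and `sentence` columns and writes the JSON-lines manifest format that NeMo ASR expects (`audio_filepath`, `duration`, `text`). The function name `tsv_to_manifest` and the `get_duration` callback are hypothetical; the callback stands in for whatever audio library is used to measure clip length.

```python
import csv
import json
from pathlib import Path


def tsv_to_manifest(tsv_path, clips_dir, manifest_path, get_duration):
    """Convert a Common Voice style TSV into a NeMo JSON-lines manifest.

    get_duration is a caller-supplied function (hypothetical helper) that
    returns the clip length in seconds for a given audio path.
    Returns the number of manifest entries written.
    """
    written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
            open(manifest_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            text = (row.get("sentence") or "").strip()
            if not text:
                continue  # skip clips that have no transcription
            audio_path = str(Path(clips_dir) / row["path"])
            entry = {
                "audio_filepath": audio_path,
                "duration": float(get_duration(audio_path)),
                "text": text,
            }
            # ensure_ascii=False keeps the Arabic text readable in the file
            dst.write(json.dumps(entry, ensure_ascii=False) + "\n")
            written += 1
    return written
```

Each output line is one JSON object, so the manifest can be streamed during training without loading the whole file. Any text normalization (for example, removing diacritics or punctuation) would be applied to `text` before writing; the card does not say which normalization, if any, was used.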

## Evaluation Results

The model was evaluated on the train, validation, and held-out test splits of the Arabic portion of the Common Voice dataset, decoding without a language model:

- **Word Error Rate (WER):** 30% on train, 32% on validation, and 40% on test

WER is the fraction of reference words that are substituted, deleted, or inserted in the model's transcription, so lower values are better.
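
For readers unfamiliar with the metric, here is a minimal, dependency-free sketch of how WER is computed for a single utterance: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative; the reported numbers were presumably produced with NeMo's own tooling, which this sketch does not reproduce.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against five reference words
print(word_error_rate("a b c d e", "a x c d"))  # 0.4
```

A corpus-level WER is normally the total number of edits divided by the total number of reference words across all utterances, not the average of per-utterance scores.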

## How to Use

You can load and use the model with the NeMo framework as follows:

```python
import nemo.collections.asr as nemo_asr

# Load the model
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("MostafaAhmed98/Conformer-CTC-Arabic-ASR")

# Example usage
audio_file = "path/to/arabic_audio.wav"
transcription = asr_model.transcribe([audio_file])

print(transcription[0])  # Output: Transcribed Arabic text
```
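
Depending on the installed NeMo version, `transcribe` may return plain strings or hypothesis objects that carry the text in a `text` attribute. That version behavior is an assumption here, not something this card states. A small helper (the name `hypothesis_text` is hypothetical) handles both cases:

```python
def hypothesis_text(result):
    """Return the transcript whether `result` is a str or has a `.text` attribute."""
    if isinstance(result, str):
        return result
    return result.text
```

With the snippet above, `print(hypothesis_text(transcription[0]))` would then print the transcript in either case.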