Automatic Speech Recognition - CKB

This model is trained on the PawanKrd/asr-ckb dataset and is intended specifically for the Central Kurdish (Sorani) language.

Model Performance

The model achieves the following performance on the evaluation set:

Loss: 0.0048
Word Error Rate (WER): 4.1304%
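WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn the model's transcript into the reference transcript, divided by the number of reference words (N). It is reported here as a percentage. The card does not say how text was normalized before scoring, so the following is only a minimal sketch that splits on whitespace. The function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: words are split on whitespace only and no other text
    normalization is applied (the card does not describe its normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev holds one row of the edit-distance table: the distance from the
    # current reference prefix to every prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # The last cell is the minimum number of word edits S + D + I
    return 100.0 * prev[len(hyp)] / len(ref)


print(word_error_rate("a b c d", "a x c"))
```

For example, comparing "a b c d" with "a x c" gives one substitution and one deletion over four reference words, so the WER is 50%. Because insertions also count as errors, WER can exceed 100%.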

Model Description

This Automatic Speech Recognition (ASR) model transcribes spoken Central Kurdish (Sorani) into written text. It is a deep learning speech-to-text model built with the Hugging Face Transformers library and fine-tuned on a diverse set of Central Kurdish audio recordings.

Intended Uses & Limitations

This model is intended for automatic transcription of Central Kurdish audio. It performs best on clear, high-quality audio recordings. Performance may degrade with noisy backgrounds, strong accents, or atypical pronunciations.

Intended Uses

Transcribing interviews and speeches in Central Kurdish.
Creating subtitles for Kurdish videos.
Assisting in the documentation and preservation of the Kurdish language.

Limitations

Performance may be suboptimal on audio with heavy background noise.
Strong regional accents or non-standard pronunciations can impact accuracy.
Not suitable for real-time transcription without further optimization.

Training and Evaluation Data

The model was trained and evaluated using the PawanKrd/asr-ckb dataset, which consists of diverse audio samples in Central Kurdish. The training process was designed to optimize the model's recognition accuracy for this specific language.

Training Procedure

Hyperparameters

Learning Rate: 1e-05
Train Batch Size: 32
Eval Batch Size: 16
Seed: 42
Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
Learning Rate Scheduler: Linear
Warmup Steps: 500
Epochs: 3
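Assuming the standard Transformers Trainer was used, the learning-rate schedule implied by these settings can be sketched as follows. The rate ramps linearly from 0 to 1e-05 over the 500 warmup steps, then decays linearly to 0 at the last training step. This mirrors the formula of the linear scheduler in Transformers (get_linear_schedule_with_warmup). The card does not list the total number of steps. The Training Results table reports epoch 2.8902 at step 15000, which implies roughly 5,190 steps per epoch and about 15,570 steps over 3 epochs. That total and the function name linear_warmup_lr are assumptions for illustration.

```python
# Values from the hyperparameter list
BASE_LR = 1e-5
WARMUP_STEPS = 500
EPOCHS = 3

# Assumption: steps per epoch estimated from the Training Results table
# (epoch 2.8902 was reported at step 15000), not a value stated in the card
STEPS_PER_EPOCH = 15_000 / 2.8902              # roughly 5190
TOTAL_STEPS = round(EPOCHS * STEPS_PER_EPOCH)  # roughly 15570


def linear_warmup_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup to BASE_LR, then linear decay to 0."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp from 0 up to BASE_LR
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Decay phase: fall from BASE_LR at the end of warmup to 0 at TOTAL_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 5_000, 10_000, 15_000):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

Under this estimate, the learning rate at step 15000 is about 3.8e-07. The Adam betas, epsilon and seed do not affect the schedule itself.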

Training Results

Training Loss	Epoch	Step	Validation Loss	WER (%)
0.0966	0.1927	1000	0.1457	29.30
0.0952	0.3854	2000	0.0988	22.26
0.0582	0.5780	3000	0.0741	17.51
0.0523	0.7707	4000	0.0532	15.14
0.0164	0.9634	5000	0.0412	14.19
0.0271	1.1561	6000	0.0519	15.68
0.0358	1.3487	7000	0.0407	11.18
0.0208	1.5414	8000	0.0327	9.94
0.031	1.7341	9000	0.0268	10.86
0.033	1.9268	10000	0.0191	7.70
0.0269	2.1195	11000	0.0138	6.48
0.025	2.3121	12000	0.0111	6.83
0.003	2.5048	13000	0.0086	5.78
0.0021	2.6975	14000	0.0065	4.66
0.0031	2.8902	15000	0.0048	4.13
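The final row matches the evaluation metrics reported under Model Performance (loss 0.0048, WER 4.13%). Validation WER does not fall monotonically, however. It rises briefly at steps 6000, 9000 and 12000. The sketch below re-enters the table rows and checks both points. The helper names are hypothetical, and the numbers are copied verbatim from the table.

```python
# (step, validation_loss, wer_percent), copied from the Training Results table
ROWS = [
    (1000, 0.1457, 29.30),
    (2000, 0.0988, 22.26),
    (3000, 0.0741, 17.51),
    (4000, 0.0532, 15.14),
    (5000, 0.0412, 14.19),
    (6000, 0.0519, 15.68),
    (7000, 0.0407, 11.18),
    (8000, 0.0327, 9.94),
    (9000, 0.0268, 10.86),
    (10000, 0.0191, 7.70),
    (11000, 0.0138, 6.48),
    (12000, 0.0111, 6.83),
    (13000, 0.0086, 5.78),
    (14000, 0.0065, 4.66),
    (15000, 0.0048, 4.13),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation WER (the earlier step wins a tie)."""
    return min(rows, key=lambda row: (row[2], row[0]))


def wer_increases(rows):
    """Return the steps at which validation WER rose versus the previous evaluation."""
    return [curr[0] for prev, curr in zip(rows, rows[1:]) if curr[2] > prev[2]]


best = best_checkpoint(ROWS)
first_wer = ROWS[0][2]
relative_reduction = (first_wer - best[2]) / first_wer

print(f"best step: {best[0]}, WER {best[2]}%")
print(f"WER rose at steps: {wer_increases(ROWS)}")
print(f"relative WER reduction vs. step 1000: {relative_reduction:.1%}")
```

With these rows, the final checkpoint at step 15000 has the lowest WER. This represents a relative WER reduction of about 86% compared with the first evaluation at step 1000.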

Framework Versions

Transformers: 4.41.0.dev0
PyTorch: 2.3.0+cu121
Datasets: 2.19.1
Tokenizers: 0.19.1

Example Usage

To use this model for transcription, you can follow the example code below:

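from transformers import pipeline

# Load the fine-tuned model (the repository is gated: accept its terms on the
# Hugging Face Hub and log in first, e.g. with `huggingface-cli login`)
asr_pipeline = pipeline("automatic-speech-recognition", model="PawanKrd/asr-large-ckb")

# Transcribe an audio file (decoding a file path requires ffmpeg to be installed)
audio_file = "audio.wav"
transcription = asr_pipeline(audio_file)

# Print the transcription
print(transcription["text"])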

This code demonstrates how to load the model and use it to transcribe an audio file in Central Kurdish.


Access to this model is gated. You must agree to share your contact information on the Hugging Face Hub before you can download it.