Automatic Speech Recognition - CKB

This model is trained on the PawanKrd/asr-ckb dataset and is intended specifically for the Central Kurdish (Sorani) language.

Model Performance

The model achieves the following performance on the evaluation set:

  • Loss: 0.0048
  • Word Error Rate (WER): 4.1304%
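
For readers unfamiliar with the metric, the sketch below shows the standard definition of WER: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is a minimal illustration only. The card does not say which tool or text normalization produced the reported score, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words gives 25% WER.
print(word_error_rate("a b c d", "a x c d"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.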

Model Description

This Automatic Speech Recognition (ASR) model transcribes spoken Central Kurdish (Sorani) into written text. It is a deep learning speech-to-text model built with the Transformers library and trained on a diverse set of Central Kurdish audio recordings.

Intended Uses & Limitations

This model is intended for automatic transcription of Central Kurdish audio. It performs best on clear, high-quality audio recordings. Performance may degrade with noisy backgrounds, strong accents, or atypical pronunciations.

Intended Uses

  • Transcribing interviews and speeches in Central Kurdish.
  • Creating subtitles for Kurdish videos.
  • Assisting in the documentation and preservation of the Kurdish language.
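
For the subtitle use case, timestamped output can be converted to SubRip (SRT) text. The sketch below assumes the input has the shape the Transformers ASR pipeline returns under `"chunks"` when called with `return_timestamps=True`: a list of dicts with a `"timestamp"` pair in seconds and a `"text"` string. Whether this particular model produces usable timestamps is not stated in this card, and the helper names `format_srt_time` and `chunks_to_srt` are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks) -> str:
    """Turn a list of {"timestamp": (start, end), "text": str} dicts into SRT text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: a missing end time is replaced by the start time.
            end = start
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Placeholder chunks standing in for real pipeline output.
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " first segment"},
    {"timestamp": (2.5, 4.0), "text": " second segment"},
]
print(chunks_to_srt(example_chunks))
```

In practice the chunk list would come from something like `asr_pipeline(audio_file, return_timestamps=True)["chunks"]`, and long segments may need to be split further to fit on screen.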

Limitations

  • Performance may be suboptimal on audio with heavy background noise.
  • Strong regional accents or non-standard pronunciations can impact accuracy.
  • Not suitable for real-time transcription without further optimization.

Training and Evaluation Data

The model was trained and evaluated using the PawanKrd/asr-ckb dataset, which consists of diverse audio samples in Central Kurdish. The training process was designed to optimize the model's recognition accuracy for this specific language.

Training Procedure

Hyperparameters

  • Learning Rate: 1e-05
  • Train Batch Size: 32
  • Eval Batch Size: 16
  • Seed: 42
  • Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
  • Learning Rate Scheduler: Linear
  • Warmup Steps: 500
  • Epochs: 3
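
To make the scheduler settings concrete, the sketch below computes the learning rate that a linear schedule with warmup would yield at a given step: a linear ramp from 0 to the peak rate over the warmup steps, then a linear decay to 0 at the end of training. The default `total_steps=15_570` is an assumption, estimated from the step and epoch columns of the training results (roughly 5,190 steps per epoch over 3 epochs); the card does not state it. The function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-5,       # Learning Rate from the list above
    warmup_steps: int = 500,     # Warmup Steps from the list above
    total_steps: int = 15_570,   # ASSUMPTION: estimated, not stated in the card
) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps, never below 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 8_000, 15_570):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and has dropped to about half its peak by step 8,000.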

Training Results

Training Loss   Epoch    Step    Validation Loss   WER (%)
0.0966          0.1927   1000    0.1457            29.30
0.0952          0.3854   2000    0.0988            22.26
0.0582          0.5780   3000    0.0741            17.51
0.0523          0.7707   4000    0.0532            15.14
0.0164          0.9634   5000    0.0412            14.19
0.0271          1.1561   6000    0.0519            15.68
0.0358          1.3487   7000    0.0407            11.18
0.0208          1.5414   8000    0.0327            9.94
0.031           1.7341   9000    0.0268            10.86
0.033           1.9268   10000   0.0191            7.70
0.0269          2.1195   11000   0.0138            6.48
0.025           2.3121   12000   0.0111            6.83
0.003           2.5048   13000   0.0086            5.78
0.0021          2.6975   14000   0.0065            4.66
0.0031          2.8902   15000   0.0048            4.13

Framework Versions

  • Transformers: 4.41.0.dev0
  • PyTorch: 2.3.0+cu121
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Example Usage

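The following example loads the model and transcribes an audio file:

from transformers import pipeline

# Load the fine-tuned model; naming the task explicitly avoids relying on task inference.
# Access to the model files is gated: accept the conditions on the model page and log in first.
asr_pipeline = pipeline("automatic-speech-recognition", model="PawanKrd/asr-large-ckb")

# Path to the audio file to transcribe
audio_file = "audio.wav"
transcription = asr_pipeline(audio_file)

# The pipeline returns a dict; the transcript is under the "text" key
print(transcription["text"])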

This code demonstrates how to load the model and use it to transcribe an audio file in Central Kurdish.

Model Size

  • Parameters: 1.54B
  • Tensor type: F32 (Safetensors)