Automatic Speech Recognition - CKB
This model is trained on the PawanKrd/asr-ckb dataset and is intended specifically for the Central Kurdish (Sorani) language.
Model Performance
The model achieves the following performance on the evaluation set:
- Loss: 0.0048
- Word Error Rate (WER): 4.1304
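For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a generic implementation of that standard definition, not the evaluation script used for this model (which is not given here). It assumes WER is reported as a percentage, as the values in the training table suggest, and applies no text normalization; the sample strings are placeholders.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: word-level edit distance / reference length * 100."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("one two three four", "one two tree four"))  # 25.0
```

Under this definition, a WER of about 4.13 would correspond to roughly four word errors per hundred reference words.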
Model Description
This Automatic Speech Recognition (ASR) model transcribes spoken Central Kurdish (Sorani) into written text. It uses a deep learning architecture for speech-to-text, is built with the Transformers library, and was trained on a diverse set of Central Kurdish audio recordings.
Intended Uses & Limitations
This model is intended for automatic transcription of Central Kurdish audio and performs best on clear, high-quality recordings. Accuracy may degrade with background noise, strong accents, or atypical pronunciations.
Intended Uses
- Transcribing interviews and speeches in Central Kurdish.
- Creating subtitles for Kurdish videos.
- Assisting in the documentation and preservation of the Kurdish language.
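For the subtitle use case, timestamped output can be converted to the SRT format with a small helper. The sketch below assumes the transcription result contains a "chunks" list of {"timestamp": (start, end), "text": ...} entries, which is the shape the Transformers ASR pipeline returns when called with return_timestamps=True. Whether this particular model supports timestamps is an assumption, and the function names and sample chunks are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks) -> str:
    """Convert a list of timestamped chunks into SRT subtitle text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Defensive: fall back to the start time if no end time is given.
            end = start
        entries.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive SRT entries.
    return "\n".join(entries)


# Hypothetical chunks standing in for real pipeline output.
sample_chunks = [
    {"timestamp": (0.0, 2.5), "text": " first subtitle line"},
    {"timestamp": (2.5, 5.0), "text": " second subtitle line"},
]
print(chunks_to_srt(sample_chunks))
```

The resulting string can be written to a .srt file with UTF-8 encoding so that Kurdish script is preserved.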
Limitations
- Performance may be suboptimal on audio with heavy background noise.
- Strong regional accents or non-standard pronunciations can impact accuracy.
- Not suitable for real-time transcription without further optimization.
Training and Evaluation Data
The model was trained and evaluated using the PawanKrd/asr-ckb dataset, which consists of diverse audio samples in Central Kurdish. The training process was designed to optimize the model's recognition accuracy for this specific language.
Training Procedure
Hyperparameters
- Learning Rate: 1e-05
- Train Batch Size: 32
- Eval Batch Size: 16
- Seed: 42
- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- Learning Rate Scheduler: Linear
- Warmup Steps: 500
- Epochs: 3
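To illustrate how the learning rate, warmup steps, and linear scheduler interact, the sketch below computes the learning rate at a given optimizer step: a linear ramp from 0 to the base rate over the warmup, then a linear decay to 0 at the end of training. The function name is hypothetical, and the total step count is an estimate, not a reported value: the results table places step 15000 at epoch 2.8902, which suggests about 5190 steps per epoch and about 15570 steps over 3 epochs.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=500, total_steps=15_570):
    """Learning rate under linear warmup followed by linear decay.

    total_steps is an estimate inferred from the results table, not a reported value.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Inspect the schedule at a few points during training.
for step in (0, 250, 500, 5_000, 10_000, 15_000):
    print(step, linear_schedule_lr(step))
```

With these settings the peak rate of 1e-05 is reached at step 500, well before the first evaluation at step 1000.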
Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 0.0966 | 0.1927 | 1000 | 0.1457 | 29.30 |
| 0.0952 | 0.3854 | 2000 | 0.0988 | 22.26 |
| 0.0582 | 0.5780 | 3000 | 0.0741 | 17.51 |
| 0.0523 | 0.7707 | 4000 | 0.0532 | 15.14 |
| 0.0164 | 0.9634 | 5000 | 0.0412 | 14.19 |
| 0.0271 | 1.1561 | 6000 | 0.0519 | 15.68 |
| 0.0358 | 1.3487 | 7000 | 0.0407 | 11.18 |
| 0.0208 | 1.5414 | 8000 | 0.0327 | 9.94 |
| 0.031 | 1.7341 | 9000 | 0.0268 | 10.86 |
| 0.033 | 1.9268 | 10000 | 0.0191 | 7.70 |
| 0.0269 | 2.1195 | 11000 | 0.0138 | 6.48 |
| 0.025 | 2.3121 | 12000 | 0.0111 | 6.83 |
| 0.003 | 2.5048 | 13000 | 0.0086 | 5.78 |
| 0.0021 | 2.6975 | 14000 | 0.0065 | 4.66 |
| 0.0031 | 2.8902 | 15000 | 0.0048 | 4.13 |
Framework Versions
- Transformers: 4.41.0.dev0
- PyTorch: 2.3.0+cu121
- Datasets: 2.19.1
- Tokenizers: 0.19.1
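When reproducing results, it can help to compare the local environment against the versions listed above. The sketch below uses only the Python standard library; the helper name is hypothetical, and the package names are the usual PyPI distribution names (PyTorch is published as "torch").

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this model card, keyed by PyPI distribution name.
REPORTED = {
    "transformers": "4.41.0.dev0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(REPORTED).items():
    print(f"{name}: installed={installed}, reported={REPORTED[name]}")
```

A mismatch does not necessarily prevent the model from loading; the comparison is only a starting point when debugging differences in behavior.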
Example Usage
To use this model for transcription, you can follow the example code below:
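from transformers import pipeline
# Load the fine-tuned model; the task is stated explicitly rather than inferred from the Hub
asr_pipeline = pipeline("automatic-speech-recognition", model="PawanKrd/asr-large-ckb")
# Transcribe an audio file (replace the placeholder path with your own recording)
audio_file = "audio.wav"
transcription = asr_pipeline(audio_file)
# The pipeline returns a dict; the transcript is stored under the "text" key
print(transcription["text"])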
This code demonstrates how to load the model and use it to transcribe an audio file in Central Kurdish.