---
language:
- ckb
tags:
- generated_from_trainer
datasets:
- PawanKrd/asr-ckb
metrics:
- wer
model-index:
- name: ASR CKB
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: PawanKrd/asr-ckb
      type: PawanKrd/asr-ckb
    metrics:
    - name: Wer
      type: wer
      value: 4.1303699778079555
---
# Automatic Speech Recognition - CKB
This model performs automatic speech recognition for the Central Kurdish (Sorani) language. It was trained on the [PawanKrd/asr-ckb](https://huggingface.co/datasets/PawanKrd/asr-ckb) dataset.
## Model Performance
On the evaluation set, the final checkpoint (step 15000) achieves:
- **Loss**: 0.0048
- **Word Error Rate (WER)**: 4.1304
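For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance between the reference and the model output, divided by the number of reference words. It assumes the figures reported here are percentages, which this card does not state explicitly. The function name `word_error_rate` and the placeholder sentences are illustrative, not part of the original evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from output)
                curr[j - 1] + 1,     # insertion (extra word in output)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Placeholder tokens stand in for words; one substitution and one deletion
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER near 4 corresponds to roughly four word-level errors per hundred reference words.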
## Model Description
This Automatic Speech Recognition (ASR) model transcribes spoken Central Kurdish (Sorani) into written text. It is a neural speech-to-text model built with the Transformers library and trained on a diverse set of Central Kurdish audio recordings.
## Intended Uses & Limitations
This model is intended for automatic transcription of Central Kurdish audio and performs best on clear, high-quality recordings.
### Intended Uses
- Transcribing interviews and speeches in Central Kurdish.
- Creating subtitles for Kurdish videos.
- Assisting in the documentation and preservation of the Kurdish language.
### Limitations
- Performance may be suboptimal on audio with heavy background noise.
- Strong regional accents or non-standard pronunciations can impact accuracy.
- Not suitable for real-time transcription without further optimization.
## Training and Evaluation Data
The model was trained and evaluated on the [PawanKrd/asr-ckb](https://huggingface.co/datasets/PawanKrd/asr-ckb) dataset, which consists of diverse audio samples in Central Kurdish.
## Training Procedure
### Hyperparameters
- **Learning Rate**: 1e-05
- **Train Batch Size**: 32
- **Eval Batch Size**: 16
- **Seed**: 42
- **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler**: Linear
- **Warmup Steps**: 500
- **Epochs**: 3
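The scheduler settings listed above (linear schedule, 500 warmup steps, peak learning rate 1e-05) can be sketched as a plain function. This is a minimal illustration, not the training code: `linear_schedule_lr` is a hypothetical helper, it assumes the rate decays linearly to zero after warmup, and `total_steps=15570` is an estimate inferred from the training log (step 15000 is reported at epoch 2.8902, so roughly 5190 steps per epoch over 3 epochs).

```python
def linear_schedule_lr(
    step: int,
    base_lr: float = 1e-5,      # peak learning rate from the hyperparameter list
    warmup_steps: int = 500,    # warmup steps from the hyperparameter list
    total_steps: int = 15_570,  # ASSUMPTION: estimated from the training log
) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 8_000, 15_000):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the rate peaks at step 500 and is already close to zero by the last logged evaluation at step 15000.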
### Training Results
| Training Loss | Epoch | Step | Validation Loss | WER |
|:-------------:|:------:|:-----:|:---------------:|:-------:|
| 0.0966 | 0.1927 | 1000 | 0.1457 | 29.30 |
| 0.0952 | 0.3854 | 2000 | 0.0988 | 22.26 |
| 0.0582 | 0.5780 | 3000 | 0.0741 | 17.51 |
| 0.0523 | 0.7707 | 4000 | 0.0532 | 15.14 |
| 0.0164 | 0.9634 | 5000 | 0.0412 | 14.19 |
| 0.0271 | 1.1561 | 6000 | 0.0519 | 15.68 |
| 0.0358 | 1.3487 | 7000 | 0.0407 | 11.18 |
| 0.0208 | 1.5414 | 8000 | 0.0327 | 9.94 |
| 0.031 | 1.7341 | 9000 | 0.0268 | 10.86 |
| 0.033 | 1.9268 | 10000 | 0.0191 | 7.70 |
| 0.0269 | 2.1195 | 11000 | 0.0138 | 6.48 |
| 0.025 | 2.3121 | 12000 | 0.0111 | 6.83 |
| 0.003 | 2.5048 | 13000 | 0.0086 | 5.78 |
| 0.0021 | 2.6975 | 14000 | 0.0065 | 4.66 |
| 0.0031 | 2.8902 | 15000 | 0.0048 | 4.13 |
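The WER column does not fall monotonically, so it can help to check which evaluations regressed and which checkpoint scored best. The short sketch below does this with the values copied from the table; the variable names are illustrative.

```python
# WER per evaluation step, copied from the training results table.
wer_by_step = {
    1000: 29.30, 2000: 22.26, 3000: 17.51, 4000: 15.14, 5000: 14.19,
    6000: 15.68, 7000: 11.18, 8000: 9.94, 9000: 10.86, 10000: 7.70,
    11000: 6.48, 12000: 6.83, 13000: 5.78, 14000: 4.66, 15000: 4.13,
}

steps = sorted(wer_by_step)

# Steps where WER was higher than at the previous evaluation.
regressions = [
    curr for prev, curr in zip(steps, steps[1:])
    if wer_by_step[curr] > wer_by_step[prev]
]

# Step with the lowest WER overall.
best_step = min(wer_by_step, key=wer_by_step.get)

print("regressions at steps:", regressions)
print("best step:", best_step, "WER:", wer_by_step[best_step])
```

The temporary increases at steps 6000, 9000, and 12000 are each recovered by a later evaluation, and the last logged step has the lowest WER.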
### Framework Versions
- **Transformers**: 4.41.0.dev0
- **PyTorch**: 2.3.0+cu121
- **Datasets**: 2.19.1
- **Tokenizers**: 0.19.1
## Example Usage
To transcribe audio with this model, use the Transformers `pipeline` API:
```python
from transformers import pipeline

# Load the fine-tuned model as a speech-recognition pipeline
asr_pipeline = pipeline("automatic-speech-recognition", model="PawanKrd/asr-large-ckb")

# Transcribe a local audio file (decoding from a file path requires ffmpeg)
audio_file = "audio.wav"
transcription = asr_pipeline(audio_file)

# The pipeline returns a dict; the transcript is stored under "text"
print(transcription["text"])
```
This code loads the model and transcribes a Central Kurdish audio file.