whisper-small-hy-AM
license: apache-2.0
base_model: openai/whisper-small
metrics:
  - wer
model-index:
  - name: whisper-small-hy-AM
    results: []
datasets:
  - mozilla-foundation/common_voice_16_1
language:
  - hy
library_name: transformers
tags:
  - SpeechToText
  - Audio
  - Audio Transcription
pipeline_tag: automatic-speech-recognition

Model description

Chillarmo/whisper-small-hy-AM is a speech-to-text model for the Armenian language. It is a fine-tuned version of openai/whisper-small, trained on the mozilla-foundation/common_voice_16_1 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2853
  • WER: 38.1160

Training Data and Future Enhancements

The training data consists of the Armenian subset of Mozilla Common Voice version 16.1. Planned improvements include continuing the training process and adding roughly 10 more hours of audio from datasets such as google/fleurs and possibly google/xtreme_s, with the goal of further reducing the WER.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
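As a rough sketch, the hyperparameters above map onto a transformers `Seq2SeqTrainingArguments` configuration along these lines (the `output_dir` is a placeholder, and mapping "Native AMP" to `fp16=True` is an assumption; the original training script is not shown here):

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative config mirroring the listed hyperparameters; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hy-AM",  # placeholder path, not from the original run
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    fp16=True,  # "Native AMP" mixed-precision training (assumed mapping)
)
```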

Training results

| Training Loss | Epoch | Step | Validation Loss | WER     |
|---------------|-------|------|-----------------|---------|
| 0.0989        | 2.48  | 1000 | 0.1948          | 41.5758 |
| 0.03          | 4.95  | 2000 | 0.2165          | 39.1251 |
| 0.0016        | 7.43  | 3000 | 0.2659          | 38.4089 |
| 0.0005        | 9.9   | 4000 | 0.2853          | 38.1160 |
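The WER column reports the word error rate in percent: the word-level edit distance between reference and hypothesis transcripts, divided by the number of reference words. A minimal self-contained sketch of the metric (this illustrative implementation is not the evaluation code used for training, which typically relies on a standard library such as `evaluate` or `jiwer`):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

For example, one substituted word out of three gives a WER of 33.33.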

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.1