---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- wer
model-index:
- name: whisper-large-et-children
  results: []
language:
- et
library_name: transformers
---

# whisper-large-v2-et-children

This model is a fine-tuned version of [agnesluhtaru/whisper-large-et-ERR2020-v2](https://huggingface.co/agnesluhtaru/whisper-large-et-ERR2020-v2) on an Estonian children's speech dataset.

More information about the model's performance and the data used for evaluation and training:

Luhtaru, Agnes; Jaaska, Rauno; Kruusamäe, Karl; Fishel, Mark (2023). Automatic Transcription for Estonian Children’s Speech. In: Proceedings of the 24th Nordic Conference on Computational Linguistics. [https://openreview.net/forum?id=xbPTfBIUby](https://openreview.net/forum?id=xbPTfBIUby)

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- training_steps: 2000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| 0.0302        | 4.03  | 500  | 0.2971          | 16.2892 |
| 0.0042        | 8.06  | 1000 | 0.3406          | 15.8551 |
| 0.0017        | 12.1  | 1500 | 0.3714          | 15.5585 |
| 0.0009        | 16.13 | 2000 | 0.3934          | 15.6445 |

### Framework versions

- Transformers 4.26.0.dev0
- Pytorch 1.12.1+rocm5.1.1
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2
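
### Hyperparameter sketch

The training hyperparameters listed in this card can be related to each other with a little arithmetic. The sketch below is illustrative only and is not the training script. It assumes a single training device (consistent with 2 × 16 = 32) and assumes that `lr_scheduler_type: linear` means linear warmup from zero followed by linear decay to zero. The function names `effective_batch_size` and `linear_lr` are hypothetical helpers introduced here.

```python
# Values copied from the hyperparameter list in this model card.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 200
TRAINING_STEPS = 2000


def effective_batch_size(per_step_batch: int, accumulation_steps: int) -> int:
    """Samples contributing to one optimizer update (single device assumed)."""
    return per_step_batch * accumulation_steps


def linear_lr(
    step: int,
    base_lr: float = LEARNING_RATE,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TRAINING_STEPS,
) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay.

    Assumption: warmup rises from 0 to base_lr over warmup_steps, then the
    rate falls linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    # Steps at which the card reports evaluation results.
    for step in (500, 1000, 1500, 2000):
        print(f"step {step}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate has already decayed to zero at the final checkpoint (step 2000), so the last rows of the results table were produced with progressively smaller updates.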