Navvye/whisper-large-v2-kangri

	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	metrics:
	- wer
	model-index:
	- name: whisper-large-v2-kangri
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Speech Recognition
	    dataset:
	      type: bridgeconn/snow-mountain
	      name: snow-mountain-Kangri
	      config: Kangri
	      split: train_500
	    metrics:
	    - type: wer
	      value: 17.40
	      name: WER
	      lower_is_better: true
	---

	# whisper-large-v2-kangri

	This model is a fine-tuned version of [vasista22/whisper-hindi-large-v2](https://huggingface.co/vasista22/whisper-hindi-large-v2) on the [bridgeconn/snow-mountain](https://huggingface.co/datasets/bridgeconn/snow-mountain) dataset for Kangri, a low-resource Indian language.
	It achieves the following results on the evaluation set:
	- Loss: 0.2967
	- WER: 0.1740 (17.40%)
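
For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between the model's transcript and the reference (substitutions, deletions, and insertions), divided by the number of reference words. The following is a minimal pure-Python sketch of that definition. The function name `word_error_rate` and the example strings are illustrative only; the evaluation scripts used for this model are not shown here and may apply additional text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution in a four-word reference.
print(word_error_rate("a b c d", "a b x d"))
```

Under this definition, the reported WER of 0.1740 corresponds to roughly 17 word-level errors per 100 reference words.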

	## Usage

	To evaluate this model on an entire dataset, use the evaluation scripts available in the whisper-finetune repository.

	The same repository also provides scripts for faster inference with whisper-jax.
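
For transcribing a single audio file, a sketch along the following lines may be enough. It assumes the checkpoint loads through the Transformers `pipeline` API under the repository name `Navvye/whisper-large-v2-kangri`, and that the Hindi language token (`"hi"`) of the base model is the right decoder prompt for Kangri; both are assumptions rather than settings confirmed by this card. The helper name `build_transcriber` is hypothetical. Setting `forced_decoder_ids` on the model config follows the pattern used around Transformers 4.28; newer releases may prefer passing the language and task through `generate_kwargs` instead.

```python
MODEL_ID = "Navvye/whisper-large-v2-kangri"  # repository name of this model card
ASSUMED_LANGUAGE = "hi"  # assumption: reuse the Hindi token of the base model


def build_transcriber(model_id: str = MODEL_ID, language: str = ASSUMED_LANGUAGE):
    """Return a Transformers ASR pipeline configured for transcription."""
    # Imported lazily so this module can be loaded without torch installed.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    transcriber = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,  # Whisper operates on 30-second windows
        device=device,
    )
    # Force the decoder to transcribe (not translate) in the assumed language.
    transcriber.model.config.forced_decoder_ids = (
        transcriber.tokenizer.get_decoder_prompt_ids(
            language=language, task="transcribe"
        )
    )
    return transcriber
```

A call such as `build_transcriber()("/path/to/audio.wav")["text"]` would then return the transcript, with the audio path replaced by a real file.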


	## Training and evaluation data

	Training Data:
	- [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)

	Evaluation Data:
	- [Snow Mountain Dataset for Kangri Language](https://huggingface.co/datasets/bridgeconn/snow-mountain)
	- [Kangri Translators Dataset](https://drive.google.com/drive/folders/16BdOieekGRAo2bFOQDd4YhE2LpgiRnqQ?usp=share_link)

	## Training procedure

	We used cross-lingual phoneme recognition, which leverages patterns learned from a resource-rich language such as Hindi to recognize utterances in a resource-poor language such as Kangri. Starting from the pre-trained Whisper-Hindi-Large-V2 model, we fine-tuned on a customised dataset and achieved state-of-the-art (SoTA) accuracy, with a final WER of 17.40% on the evaluation set.
	The customised dataset combines bridgeconn/snow-mountain with sentences collected from Kangri translators, and was split 80/20 into training and evaluation portions. The model was trained for 5000 steps and evaluated every 1000 steps. Between step 1000 and step 5000, the word error rate fell by 0.6 percentage points (from 18.00% to 17.40%), while the validation loss rose from 0.2442 to 0.2967; we attribute the rise in validation loss to more data being introduced.
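
The 80/20 split described above can be sketched as follows. This is an illustration only: the helper name `split_80_20`, the shuffling step, the seed, and the 500 placeholder utterance IDs are assumptions, since the card does not say how the split was actually performed.

```python
import random


def split_80_20(items, seed=42):
    """Shuffle a copy of `items` and return (train, evaluation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (len(shuffled) * 4) // 5         # 80% boundary, integer arithmetic
    return shuffled[:cut], shuffled[cut:]


# Placeholder utterance IDs stand in for real (audio, transcript) pairs.
utterance_ids = [f"utt_{i:03d}" for i in range(500)]
train, evaluation = split_80_20(utterance_ids)
print(len(train), len(evaluation))
```

Splitting at the utterance level with a fixed seed keeps the two portions disjoint and lets the same partition be regenerated later.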


	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 2
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- training_steps: 5000
	- mixed_precision_training: Native AMP
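
To make the schedule concrete, the sketch below computes the effective batch size and the learning rate at a given step under a linear schedule with warmup, using the values listed above. It assumes the usual behaviour of a linear scheduler (linear ramp from zero during warmup, then linear decay to zero at the final step); the function name `linear_lr` is illustrative and this is not the training code itself.

```python
LEARNING_RATE = 1e-05
WARMUP_STEPS = 500
TRAINING_STEPS = 5000
PER_DEVICE_BATCH_SIZE = 8
NUM_DEVICES = 2


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        scale = step / WARMUP_STEPS
    else:
        remaining = TRAINING_STEPS - step
        scale = max(0.0, remaining / (TRAINING_STEPS - WARMUP_STEPS))
    return LEARNING_RATE * scale


# Effective batch size across both GPUs (no gradient accumulation is listed).
total_batch_size = PER_DEVICE_BATCH_SIZE * NUM_DEVICES
print("total batch size:", total_batch_size)

for step in (0, 250, 500, 2750, 5000):
    print(step, linear_lr(step))
```

The peak learning rate of 1e-05 is reached at step 500, after which it decays linearly until training ends at step 5000.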

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | WER    |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| 0.0001        | 40.0  | 1000 | 0.2442          | 0.1800 |
	| 0.0           | 80.0  | 2000 | 0.2752          | 0.1764 |
	| 0.0           | 120.0 | 3000 | 0.2870          | 0.1747 |
	| 0.0           | 160.0 | 4000 | 0.2940          | 0.1745 |
	| 0.0           | 200.0 | 5000 | 0.2967          | 0.1740 |
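
A few derived quantities help when reading the table above. The sketch below copies the step, epoch, validation loss, and WER columns from the table and computes the absolute and relative WER improvement between the first and last checkpoints, plus the number of optimizer steps per epoch. The examples-per-epoch figure is an estimate that assumes every step uses a full batch of 16 (the total train batch size listed under the hyperparameters), so it is an upper bound rather than a reported dataset size.

```python
# (step, epoch, validation_loss, wer) rows copied from the table above.
RESULTS = [
    (1000, 40.0, 0.2442, 0.1800),
    (2000, 80.0, 0.2752, 0.1764),
    (3000, 120.0, 0.2870, 0.1747),
    (4000, 160.0, 0.2940, 0.1745),
    (5000, 200.0, 0.2967, 0.1740),
]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list

first, last = RESULTS[0], RESULTS[-1]
absolute_drop = first[3] - last[3]        # WER difference as a fraction
relative_drop = absolute_drop / first[3]  # relative to the step-1000 WER
steps_per_epoch = last[0] / last[1]
# Estimate only: assumes every step processes a full batch.
examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE

print(f"absolute WER drop: {absolute_drop * 100:.2f} percentage points")
print(f"relative WER drop: {relative_drop * 100:.1f}%")
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"approx. training examples per epoch: {examples_per_epoch:.0f}")
```

Most of the WER improvement occurs between steps 1000 and 3000; the last two checkpoints differ by less than 0.1 percentage points.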


	### Framework versions

	- Transformers 4.28.0.dev0
	- Pytorch 2.0.0+cu117
	- Datasets 2.11.0
	- Tokenizers 0.13.3