---
library_name: peft
language:
- zh
license: mit
base_model: openai/whisper-large-v3-turbo
tags:
- wft
- whisper
- automatic-speech-recognition
- audio
- speech
- generated_from_trainer
datasets:
- JacobLinCool/common_voice_19_0_zh-TW
metrics:
- wer
model-index:
- name: whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: JacobLinCool/common_voice_19_0_zh-TW
      type: JacobLinCool/common_voice_19_0_zh-TW
    metrics:
    - type: wer
      value: 32.55535607420706
      name: Wer
---


# whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora

This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo) on the [JacobLinCool/common_voice_19_0_zh-TW](https://huggingface.co/datasets/JacobLinCool/common_voice_19_0_zh-TW) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1786
- WER (%): 32.5554
- CER (%): 8.6009
- Decode runtime: 90.9833
- WER runtime: 0.1257
- CER runtime: 0.1534
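WER and CER are both normalized edit distances. They differ only in the unit being compared (words or characters). The following is a minimal sketch of the standard Levenshtein-based definitions, reported as percentages like the figures above.

The card does not say how transcripts were normalized or segmented before scoring, so the tokenization here is an assumption: `wer` splits on whitespace and `cer` scores individual characters with spaces removed. The helper names are hypothetical. Chinese text is not space-delimited, so the choice of word segmentation strongly affects WER.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences.

    Substitutions, deletions and insertions each cost 1.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * edits / total


def wer(refs, hyps):
    # Assumed word tokenization: split on whitespace.
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    # Assumed character tokenization: every character, spaces removed.
    return error_rate(refs, hyps, lambda s: list(s.replace(" ", "")))
```

For example, `wer(["今天天氣很好"], ["今天天氣真好"])` returns `100.0` because the unsegmented sentence counts as a single word. For the same pair, `round(cer(["今天天氣很好"], ["今天天氣真好"]), 2)` returns `16.67`, since one of six characters differs.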

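## Model description

This is an open-source automatic speech recognition (ASR) model for Traditional Chinese as used in Taiwan. It is distributed as a LoRA adapter (trained with PEFT) for [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo).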

## Intended uses & limitations

This model is designed as a prompt-free ASR model for Traditional Chinese. It inherits Whisper's language identification (LID), which groups all Chinese variants under the single language token `zh`. Because of this, we expect performance to degrade when transcribing Simplified Chinese.

The model is free to use under the MIT license.
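The following is a minimal sketch of prompt-free inference with Transformers and PEFT. No `language` or task argument is passed to `generate`, so Whisper's built-in language identification decides the language token.

It rests on a few assumptions:
- The adapter lives at `JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora`, the repository in this card's TensorBoard link.
- The audio is already available as a 1-D, 16 kHz mono float array.

`load_model` and `transcribe` are illustrative helper names, not part of any library.

```python
BASE_MODEL_ID = "openai/whisper-large-v3-turbo"
# Assumed adapter repository (taken from this card's TensorBoard link).
ADAPTER_ID = "JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora"
SAMPLING_RATE = 16_000  # Whisper's feature extractor expects 16 kHz audio


def load_model(device="cpu"):
    """Load the base Whisper model and apply the LoRA adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy deps.
    from peft import PeftModel
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(BASE_MODEL_ID)
    base = WhisperForConditionalGeneration.from_pretrained(BASE_MODEL_ID)
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.to(device).eval()
    return processor, model


def transcribe(audio, processor, model, device="cpu"):
    """Transcribe a 1-D float array of 16 kHz mono samples.

    No language or task prompt is passed, so the language token is chosen by
    Whisper's own language identification.
    """
    import torch

    features = processor(
        audio, sampling_rate=SAMPLING_RATE, return_tensors="pt"
    ).input_features.to(device)
    with torch.no_grad():
        ids = model.generate(input_features=features)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

Typical use, with `audio` loaded by whatever audio library you prefer:

`processor, model = load_model(); print(transcribe(audio, processor, model))`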

## Training and evaluation data

This model was trained on the [Common Voice Corpus 19.0 Chinese (Taiwan) subset](https://huggingface.co/datasets/JacobLinCool/common_voice_19_0_zh-TW), which contains about 50k training examples (44 hours) and 5k test examples (5 hours). This is roughly four times the size of the combined training and validation splits (`train+validation`) of [mozilla-foundation/common_voice_16_1](https://huggingface.co/datasets/mozilla-foundation/common_voice_16_1), which together contain about 12k examples.
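As a quick check of the approximate figures above (all inputs are the rounded counts stated in this section), the size ratio and the implied average clip lengths work out to:

$$
\frac{50\text{k}}{12\text{k}} \approx 4.2, \qquad
\frac{44\ \text{h} \times 3600\ \text{s/h}}{50{,}000} \approx 3.2\ \text{s per training clip}, \qquad
\frac{5\ \text{h} \times 3600\ \text{s/h}}{5{,}000} = 3.6\ \text{s per test clip}.
$$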

## Training procedure

Training logs are available on [TensorBoard](https://huggingface.co/JacobLinCool/whisper-large-v3-turbo-common_voice_19_0-zh-TW-lora/tensorboard).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- training_steps: 5000
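The scheduler and batch-size settings above combine as shown in the following plain-Python sketch. It re-implements linear warmup followed by linear decay, following the same formula as Transformers' `get_linear_schedule_with_warmup`, which the Trainer uses for `lr_scheduler_type: linear`. The function and constant names are hypothetical.

```python
LEARNING_RATE = 2e-4
WARMUP_STEPS = 50
TRAINING_STEPS = 5000
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


# Each optimizer step accumulates gradients over 8 micro-batches of 4 examples.
total_train_batch_size = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # 32
examples_seen = total_train_batch_size * TRAINING_STEPS       # 160,000
```

The learning rate peaks at 0.0002 at step 50 and reaches half of that at step 2525. Over 5000 steps, the model sees about 160,000 examples. Against the approximate 50k training examples, that is roughly 3.2 passes, broadly in line with the Epoch column ending near 3.07 in the results table. The exact training-set size is only given approximately.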

### Training results

| Training Loss | Epoch | Step | Validation Loss | WER (%) | CER (%) | Decode Runtime | WER Runtime | CER Runtime |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|:--------------:|:-----------:|:-----------:|
| No log | 0 | 0 | 2.7208 | 76.5011 | 20.4851 | 89.4916 | 0.1213 | 0.1639 |
| 1.1832 | 0.1 | 500 | 0.1939 | 39.9561 | 10.8721 | 90.0926 | 0.1222 | 0.1555 |
| 1.5179 | 0.2 | 1000 | 0.1774 | 37.6621 | 9.9322 | 89.8657 | 0.1225 | 0.1545 |
| 0.6179 | 0.3 | 1500 | 0.1796 | 36.2657 | 9.8325 | 90.2480 | 0.1198 | 0.1573 |
| 0.3626 | 1.0912 | 2000 | 0.1846 | 36.2258 | 9.7801 | 90.3306 | 0.1196 | 0.1539 |
| 0.1311 | 1.1912 | 2500 | 0.1776 | 34.8095 | 9.3214 | 90.3124 | 0.1286 | 0.1610 |
| 0.1263 | 1.2912 | 3000 | 0.1763 | 36.1261 | 9.3563 | 90.4271 | 0.1330 | 0.1650 |
| 0.2194 | 2.0825 | 3500 | 0.1891 | 34.6898 | 9.3114 | 91.1932 | 0.1320 | 0.1643 |
| 0.1127 | 2.1825 | 4000 | 0.1838 | 34.0714 | 9.1095 | 90.2416 | 0.1196 | 0.1529 |
| 0.3792 | 2.2824 | 4500 | 0.1786 | 33.1339 | 8.7679 | 90.9144 | 0.1310 | 0.1550 |
| 0.0606 | 3.0737 | 5000 | 0.1786 | 32.5554 | 8.6009 | 90.9833 | 0.1257 | 0.1534 |
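To summarize the table programmatically, the following sketch copies its step, validation-loss, WER and CER columns. It finds the best checkpoint by each criterion and computes the relative improvement from the step-0 evaluation (before any training). The names `ROWS` and `relative_reduction` are hypothetical.

```python
# (step, validation_loss, wer, cer) copied from the training results table.
ROWS = [
    (0, 2.7208, 76.5011, 20.4851),
    (500, 0.1939, 39.9561, 10.8721),
    (1000, 0.1774, 37.6621, 9.9322),
    (1500, 0.1796, 36.2657, 9.8325),
    (2000, 0.1846, 36.2258, 9.7801),
    (2500, 0.1776, 34.8095, 9.3214),
    (3000, 0.1763, 36.1261, 9.3563),
    (3500, 0.1891, 34.6898, 9.3114),
    (4000, 0.1838, 34.0714, 9.1095),
    (4500, 0.1786, 33.1339, 8.7679),
    (5000, 0.1786, 32.5554, 8.6009),
]


def relative_reduction(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


best_by_wer = min(ROWS, key=lambda row: row[2])   # lowest WER
best_by_loss = min(ROWS, key=lambda row: row[1])  # lowest validation loss
first, last = ROWS[0], ROWS[-1]
wer_gain = relative_reduction(first[2], last[2])  # about 57.44
cer_gain = relative_reduction(first[3], last[3])  # about 58.01
```

On these numbers, the lowest WER (step 5000) does not coincide with the lowest validation loss (step 3000). Between step 0 and step 5000, WER and CER each fall by roughly 57 to 58% in relative terms.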


### Framework versions

- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.4.0
- Datasets 3.0.2
- Tokenizers 0.20.1
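To check how closely a local environment matches the versions listed above, the following small sketch uses the standard library's `importlib.metadata`. It is a convenience only; the card does not state that these exact versions are required. The mapping uses PyPI distribution names (PyTorch is published as `torch`), and `compare_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
TRAINED_WITH = {
    "peft": "0.13.2",
    "transformers": "4.46.1",
    "torch": "2.4.0",
    "datasets": "3.0.2",
    "tokenizers": "0.20.1",
}


def compare_versions(expected=TRAINED_WITH, get_version=version):
    """Return {package: (listed_version, installed_version_or_None)}."""
    report = {}
    for package, listed in expected.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (listed, installed)
    return report


if __name__ == "__main__":
    for package, (listed, installed) in compare_versions().items():
        # Local builds may carry suffixes such as "+cu121", so compare by eye.
        print(f"{package}: card lists {listed}, installed {installed or 'not installed'}")
```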