	---
	language:
	- en
	license: apache-2.0
	base_model: openai/whisper-small
	tags:
	- generated_from_trainer
	datasets:
	- Shiry/ATC_combined
	metrics:
	- wer
	model-index:
	- name: Whisper Small ATC - ATCText
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: ATC
	      type: Shiry/ATC_combined
	      args: 'split: test'
	    metrics:
	    - name: Wer
	      type: wer
	      value: 10.612930650580948
	---

	# Whisper Small ATC - ATCText

	This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the `Shiry/ATC_combined` air traffic control (ATC) dataset.
	It achieves the following results on the evaluation set (the `test` split) at the final checkpoint, step 4000:
	- Loss: 0.2486
	- WER: 10.6129
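
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) is commonly computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It assumes the value reported above is on a percentage scale (100 × WER), which the card does not state explicitly. The function name and the transcript pair are hypothetical illustrations, not the evaluation code or data used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ATC-style transcript pair, not taken from the dataset.
reference = "lufthansa three two one descend flight level one two zero"
hypothesis = "lufthansa three two one descend level one two zero"

wer_percent = 100 * word_error_rate(reference, hypothesis)
print(f"WER: {wer_percent:.2f}%")
```

Real evaluations usually normalize text (casing, punctuation, number formatting) before scoring; the normalization used for this model is not documented in the card.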

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- training_steps: 4000
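
To illustrate what the scheduler settings above imply, here is a minimal sketch of a linear schedule with warmup, using the listed values (peak learning rate 1e-05, 500 warmup steps, 4000 training steps). It assumes the rate ramps linearly from 0 to the peak during warmup and then decays linearly to 0 at the last step, which is the usual behavior of a `linear` scheduler. The function `linear_schedule_lr` is a hypothetical helper written for this example, not part of the training code.

```python
def linear_schedule_lr(step, peak_lr=1e-5, warmup_steps=500, total_steps=4000):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: fall linearly from peak_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


# Print the learning rate at the start, during warmup, and at each logged checkpoint.
for step in (0, 250, 500, 1000, 2000, 3000, 4000):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the model trains at the full 1e-05 only at step 500; by the first logged checkpoint (step 1000) the rate has already started to decay.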

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | WER     |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|
	| 0.2533        | 0.42  | 1000 | 0.3465          | 16.2868 |
	| 0.235         | 0.84  | 2000 | 0.2881          | 13.5237 |
	| 0.0851        | 1.27  | 3000 | 0.2607          | 10.6048 |
	| 0.1317        | 1.69  | 4000 | 0.2486          | 10.6129 |


	### Framework versions

	- Transformers 4.39.3
	- PyTorch 2.2.2
	- Datasets 2.18.0
	- Tokenizers 0.15.2


	## Additional Information

	### Licensing Information
	The license of the dataset follows the license chosen by the creators of the UWB-ATCC corpus, who released it under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

	### Citation Information
	Contributors who prepared, processed, normalized, and uploaded the dataset to Hugging Face:

	@article{zuluaga2022how,
	title={How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications},
	author={Zuluaga-Gomez, Juan and Prasad, Amrutha and Nigmatulina, Iuliia and Sarfjoo, Saeed and others},
	journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
	year={2022}
	}

	@article{zuluaga2022bertraffic,
	title={BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications},
	author={Zuluaga-Gomez, Juan and Sarfjoo, Seyyed Saeed and Prasad, Amrutha and others},
	journal={IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar},
	year={2022}
	}

	@article{zuluaga2022atco2,
	title={ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications},
	author={Zuluaga-Gomez, Juan and Vesel{\`y}, Karel and Sz{\"o}ke, Igor and Motlicek, Petr and others},
	journal={arXiv preprint arXiv:2211.04054},
	year={2022}
	}

	### Authors of the dataset

	@article{vsmidl2019air,
	title={Air traffic control communication (ATCC) speech corpora and their use for ASR and TTS development},
	author={{\v{S}}m{\'\i}dl, Lubo{\v{s}} and {\v{S}}vec, Jan and Tihelka, Daniel and Matou{\v{s}}ek, Jind{\v{r}}ich and Romportl, Jan and Ircing, Pavel},
	journal={Language Resources and Evaluation},
	volume={53},
	number={3},
	pages={449--464},
	year={2019},
	publisher={Springer}
	}