rosyvs/whisat

Automatic Speech Recognition

	---
	language:
	- en
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	---
	Model trained in int8 with LoRA

	Usage:

	Prepare the pipeline, passing any custom `generate_kwargs` supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig

	```
	asr_model = prepare_pipeline(
	    model_dir='.',  # wherever you save the model
	    generate_kwargs={
	        'max_new_tokens': 112,
	        'num_beams': 1,
	        'repetition_penalty': 1,
	        'do_sample': False,
	    },
	)
	```
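As a rough illustration of how caller-supplied `generate_kwargs` could be combined with defaults, here is a minimal sketch. The helper `resolve_generate_kwargs` and the constant `DEFAULT_GENERATE_KWARGS` are hypothetical names introduced for this example and are not part of this repository; the default values simply mirror the greedy-decoding settings shown above.

```python
# Hypothetical helper for illustration only; not part of the whisat code.
# Defaults mirror the greedy-decoding settings from the usage example.
DEFAULT_GENERATE_KWARGS = {
    "max_new_tokens": 112,
    "num_beams": 1,
    "repetition_penalty": 1.0,
    "do_sample": False,
}


def resolve_generate_kwargs(overrides=None):
    """Return a new dict: the defaults, updated with any caller overrides."""
    merged = dict(DEFAULT_GENERATE_KWARGS)  # copy so the defaults stay untouched
    if overrides:
        merged.update(overrides)
    return merged


# Example: switch to beam search while keeping the other defaults.
beam_kwargs = resolve_generate_kwargs({"num_beams": 4})
```

The resulting dictionary could then be passed as `generate_kwargs` when preparing the pipeline.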
	Run ASR on a single audio file:
	```
	asr_model(audio_path)
	```

	Run ASR on every file in the directory `audio_dir`. If `generate_kwargs` is not specified, decoding is greedy (deterministic), generates up to 112 new tokens, and applies no repetition penalty:

	```
	ASRdirWhisat(
	    audio_dir,
	    out_dir='../whisat_results/',
	    model_dir='.',
	)
	```
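For readers who want to see what directory-level transcription involves, the sketch below loops over a folder and writes one transcript per audio file. It is not the implementation of `ASRdirWhisat`: the function name `transcribe_directory`, the `extensions` filter, and the one-`.txt`-per-file output layout are all assumptions made for illustration. `asr_fn` stands in for any callable that takes an audio path, such as the `asr_model` pipeline prepared above.

```python
from pathlib import Path


def transcribe_directory(audio_dir, out_dir, asr_fn, extensions=(".wav",)):
    """Run asr_fn on each matching file in audio_dir and save the text.

    Hypothetical sketch: output layout and extension filter are assumptions.
    Returns the list of transcript paths that were written.
    """
    audio_dir, out_dir = Path(audio_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio_path in sorted(audio_dir.iterdir()):
        # Skip sub-directories and files that are not audio.
        if not audio_path.is_file() or audio_path.suffix.lower() not in extensions:
            continue
        result = asr_fn(str(audio_path))
        # Accept either a dict with a "text" key or a plain string.
        text = result["text"] if isinstance(result, dict) else str(result)
        out_path = out_dir / (audio_path.stem + ".txt")
        out_path.write_text(text.strip() + "\n", encoding="utf-8")
        written.append(out_path)
    return written
```

Sorting the directory listing keeps the processing order reproducible across runs.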


	Training information:
	- Training script: tune_hf_whisper.py
	- Training hyperparameters: hparams.yaml
	- Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv

	Note: to recreate this training you will need to acquire the following public datasets:
	- MyST (myst-v0.4.2)
	- CuKids
	- CSLU

	and ensure they are stored at paths consistent with those in the data manifest above.
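One way to check that the datasets sit where the manifest expects them is to scan the manifest and report any paths that do not exist locally. The sketch below assumes the manifest is a CSV with a header row and that one column holds the audio file paths. The column name is not documented here, so it is a required argument; `find_missing_paths` is a hypothetical helper and is not part of the training code.

```python
import csv
from pathlib import Path


def find_missing_paths(manifest_csv, path_column):
    """Return the manifest paths that do not exist on this machine.

    Assumes a CSV with a header row; path_column names the column that
    holds audio file paths (an assumption, check the actual manifest).
    """
    missing = []
    with open(manifest_csv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if path_column not in (reader.fieldnames or []):
            raise KeyError(f"column {path_column!r} not found in manifest header")
        for row in reader:
            if not Path(row[path_column]).exists():
                missing.append(row[path_column])
    return missing
```

An empty return value means every listed file was found; otherwise the returned list shows which paths still need to be put in place.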

	Reference:
	```
	@inproceedings{southwell2024,
	  title={Automatic speech recognition tuned for child speech in the classroom},
	  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
	  booktitle={{IEEE} International Conference on Acoustics, Speech and Signal Processing, {ICASSP} 2024, Seoul, South Korea, April 14-19, 2024},
	  year={2024},
	}
	```