---
language:
- en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

Model trained in int8 with LoRA. (A sketch of loading the adapter without the helper functions appears at the end of this card.)

Usage:

Prepare the pipeline, providing any custom generate_kwargs supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig

```
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you saved the model
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1,
        'do_sample': False
    }
)
```

Run ASR on a single file:

```
asr_model(audio_path)
```

Run ASR on every file in `audio_dir`. If generate_kwargs is not specified, this gives deterministic greedy decoding with up to 112 generated tokens and no repetition penalty:

```
ASRdirWhisat(
    audio_dir,
    out_dir='../whisat_results/',
    model_dir=".",
)
```

Training information:
- Training script: tune_hf_whisper.py
- Training hyperparameters: hparams.yaml
- Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv

Note: to recreate this training you will need to acquire the following public datasets, and ensure they are stored at paths consistent with those in the data manifest above:
- MyST (myst-v0.4.2)
- CuKids
- CSLU

Reference:

```
@inproceedings{southwell2024,
  title={Automatic speech recognition tuned for child speech in the classroom},
  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle={{IEEE} International Conference on Acoustics, Speech and Signal Processing, {ICASSP} 2024, Seoul, South Korea, April 14-19, 2024},
  year={2024},
}
```
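Loading without the helper:

If `prepare_pipeline` is unavailable, an int8 base model plus LoRA adapter can in principle be loaded directly with `transformers` and `peft`. This is a minimal sketch, not the repository's own loading code: the base Whisper checkpoint named below is an assumption, so check hparams.yaml for the base model actually used.

```
# Minimal sketch: load the LoRA adapter on top of an 8-bit base Whisper model.
# ASSUMPTIONS: the base checkpoint name is a guess (consult hparams.yaml), and
# '.' is the directory containing this repo's adapter weights.
from peft import PeftModel
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    BitsAndBytesConfig,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

base_model = "openai/whisper-large-v2"  # assumption: verify against hparams.yaml
adapter_dir = "."                       # wherever you saved this model

# Load the base model in 8-bit, then attach the LoRA weights.
model = WhisperForConditionalGeneration.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_dir)

processor = WhisperProcessor.from_pretrained(base_model)
asr_model = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Same defaults as the helper above: greedy decoding, up to 112 new tokens.
print(asr_model("example.wav",
                generate_kwargs={"max_new_tokens": 112, "do_sample": False}))
```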