|
---
language:
- en
library_name: transformers
pipeline_tag: automatic-speech-recognition
---
|
Whisper model fine-tuned in int8 with LoRA.
|
|
|
Usage:

Prepare the pipeline, providing any custom `generate_kwargs` supported by https://huggingface.co/docs/transformers/v4.40.0/en/main_classes/text_generation#transformers.GenerationConfig:
|
|
|
```
# build the ASR pipeline with custom decoding settings
asr_model = prepare_pipeline(
    model_dir='.',  # wherever you saved the model
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 1,
        'repetition_penalty': 1.0,
        'do_sample': False,
    },
)
```
|
Run ASR on a single audio file:

```
asr_model(audio_path)
```
|
|
|
Run ASR on a full directory of audio files in `audio_dir`:

If `generate_kwargs` is not specified, the defaults give (deterministic) greedy decoding with up to 112 generated tokens and no repetition penalty.

```
ASRdirWhisat(
    audio_dir,
    out_dir='../whisat_results/',  # where transcripts are written
    model_dir='.',
)
```
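To override the decoding defaults for a directory run, the same `generate_kwargs` dict can presumably be passed to `ASRdirWhisat` as well (an assumption based on the note above; check the function's signature in this repository's code). A hedged sketch:

```
# Assumed usage: ASRdirWhisat forwards generate_kwargs to the underlying pipeline,
# mirroring prepare_pipeline above (verify against the repository code).
ASRdirWhisat(
    audio_dir,
    out_dir='../whisat_results/',
    model_dir='.',
    generate_kwargs={
        'max_new_tokens': 112,
        'num_beams': 4,  # e.g. beam search instead of the greedy default
        'do_sample': False,
    },
)
```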
|
|
|
|
|
Training information:

- Training script: tune_hf_whisper.py
- Training hyperparameters: hparams.yaml
- Training data manifest: PUBLIC_KIDS_TRAIN_v4_deduped.csv
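For orientation only, below is a minimal sketch of how int8 + LoRA fine-tuning of a Whisper checkpoint is typically set up with transformers and peft. The base checkpoint name, LoRA targets, and hyperparameter values shown are illustrative assumptions; the configuration actually used for this model is defined in tune_hf_whisper.py and hparams.yaml.

```
# Illustrative sketch of an int8 + LoRA fine-tuning setup (not the exact configuration
# used for this model; see tune_hf_whisper.py and hparams.yaml for the real values).
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load a base Whisper checkpoint in 8-bit ("openai/whisper-large-v2" is a placeholder)
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Attach LoRA adapters; r, alpha and target modules below are common choices, not this model's
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the low-rank adapter weights remain trainable

# Fine-tuning then proceeds with a standard seq2seq training loop over the data manifest.
```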
|
|
|
Note: to recreate this training you will need to acquire the following public datasets:

- MyST (myst-v0.4.2)
- CuKids
- CSLU

and ensure they are stored at paths consistent with those in the data manifest above.
|
|
|
Reference:

@inproceedings{southwell2024,
  title={Automatic speech recognition tuned for child speech in the classroom},
  author={Southwell, Rosy and Ward, Wayne and Trinh, Viet Anh and Clevenger, Charis and Clevenger, Clay and Watts, Emily and Reitman, Jason and D’Mello, Sidney and Whitehill, Jacob},
  booktitle={{IEEE} International Conference on Acoustics, Speech and Signal Processing {ICASSP} 2024, Seoul, South Korea, April 14-19, 2024},
  year={2024},
}