Training steps
Can you please share the scripts used to train the model?
The scripts used to train the original model can be found in the fairseq repository: https://github.com/facebookresearch/fairseq/tree/main/examples/speech_to_text
In order to fine-tune the checkpoint using Transformers 🤗, you can adapt the example seq2seq training script provided here: https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_seq2seq.py
In this case, there won't be a need to create an encoder-decoder model from scratch. Instead, you can jump right into training and run the script directly using the template command: https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#single-gpu-seq2seq
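Since the checkpoint already bundles the encoder-decoder weights together with the feature extractor and tokenizer, the script simply loads everything from the Hub. A minimal sketch of that loading step (the class names are the standard Transformers ones; nothing else is specific to the full training script):

from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

# The Hub checkpoint already contains the full encoder-decoder weights plus the
# feature extractor and tokenizer, so nothing has to be assembled by hand.
model = Speech2TextForConditionalGeneration.from_pretrained("facebook/s2t-small-librispeech-asr")
processor = Speech2TextProcessor.from_pretrained("facebook/s2t-small-librispeech-asr")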
The only change you'll need to make is replacing the model_name_or_path argument with the Hub id of the model (facebook/s2t-small-librispeech-asr):
python run_speech_recognition_seq2seq.py \
--dataset_name="librispeech_asr" \
--model_name_or_path="facebook/s2t-small-librispeech-asr" \
--dataset_config_name="clean" \
--train_split_name="train.100" \
--eval_split_name="validation" \
--output_dir="./" \
--preprocessing_num_workers="16" \
--length_column_name="input_length" \
--overwrite_output_dir \
--num_train_epochs="5" \
--per_device_train_batch_size="8" \
--per_device_eval_batch_size="8" \
--gradient_accumulation_steps="8" \
--learning_rate="3e-4" \
--warmup_steps="400" \
--evaluation_strategy="steps" \
--text_column_name="text" \
--save_steps="400" \
--eval_steps="400" \
--logging_steps="10" \
--save_total_limit="1" \
--freeze_feature_encoder \
--gradient_checkpointing \
--fp16 \
--group_by_length \
--predict_with_generate \
--generation_max_length="40" \
--generation_num_beams="1" \
--do_train --do_eval \
--do_lower_case
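Once training finishes, you can sanity-check the fine-tuned checkpoint with a quick transcription pass. A minimal sketch, assuming the script saved the model weights, feature extractor and tokenizer to the output directory ("./" in the command above):

from datasets import load_dataset
from transformers import Speech2TextForConditionalGeneration, Speech2TextProcessor

# Load the fine-tuned weights and processor from the training output directory.
model = Speech2TextForConditionalGeneration.from_pretrained("./")
processor = Speech2TextProcessor.from_pretrained("./")

# Stream a single LibriSpeech validation example so the full split isn't downloaded.
sample = next(iter(load_dataset("librispeech_asr", "clean", split="validation", streaming=True)))["audio"]

inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")
generated_ids = model.generate(inputs["input_features"], attention_mask=inputs["attention_mask"])
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])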