marinone94/xls-r-300m-sv-robust

Automatic Speech Recognition

mozilla-foundation/common_voice_9_0

Generated from Trainer

Inference Endpoints


marinone94 committed on May 27, 2022

Commit 868234c (1 parent: d73da22)

script to continue training

Files changed (2)

README_TEMPLATE.md +6 -1
cont-run.sh +35 -0

README_TEMPLATE.md CHANGED

@@ -47,4 +47,9 @@ model-index:
 This model is a fine-tuned version of [KBLab/wav2vec2-large-voxrex](https://huggingface.co/KBLab/wav2vec2-large-voxrex) on 2 epochs of the MARINONE94/NST_SV - SV dataset (80% random split with seed 42 as the dataset for now has only the "train" split), and then on 50 epochs of the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - SV-SE dataset ("train+validation" split).
 See run.sh to have a complete overview of all the training steps.
-NOTE: the first training for now didn't work as expected, so it might be useless or even degrade performance. Further investigation and development is needed.
+NOTE: the first training for now didn't work as expected, so it might be useless or even degrade performance. Further investigation and development is needed.
+d73da225cfdc57213ea4ab67b24bb87ac41f4392 is the commit at the end of the first training:
+```
+sh run.sh
+```
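The README note names commit d73da225cfdc57213ea4ab67b24bb87ac41f4392 as the state at the end of the first training, next to `sh run.sh`. Presumably, reproducing that run means checking out the commit before calling the script. The sketch below does this with plain `git`; the helper name `checkout_first_run` is hypothetical (not part of the repository), and the clone URL in the comments simply follows the usual `https://huggingface.co/<user>/<repo>` pattern for Hugging Face model repos.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): move an existing local
# clone of the model repo to the commit that README_TEMPLATE.md names as the
# end of the first training run. The commit can be overridden as 2nd argument.
checkout_first_run() {
  local repo="${1:-}"
  local commit="${2:-d73da225cfdc57213ea4ab67b24bb87ac41f4392}"
  if [ -z "$repo" ]; then
    echo "usage: checkout_first_run <repo-dir> [commit]" >&2
    return 2
  fi
  # Detached-HEAD checkout of that exact commit; git exits non-zero if unknown.
  git -C "$repo" checkout --quiet --detach "$commit"
}

# Example (needs network access; see the note on Git LFS below):
# git clone https://huggingface.co/marinone94/xls-r-300m-sv-robust
# checkout_first_run xls-r-300m-sv-robust
# (cd xls-r-300m-sv-robust && sh run.sh)
```

Model weights in Hugging Face repos are stored with Git LFS, so `git lfs install` should be run before cloning if the trained checkpoint itself is needed and not just the scripts.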

cont-run.sh ADDED

@@ -0,0 +1,35 @@
+python run_speech_recognition_ctc.py \
+	--dataset_name="mozilla-foundation/common_voice_9_0" \
+	--model_name_or_path="KBLab/wav2vec2-large-voxrex" \
+	--dataset_config_name="sv-SE" \
+	--train_split_name="train+validation" \
+	--eval_split_name="test" \
+	--output_dir="./" \
+	--num_train_epochs="150" \
+	--per_device_train_batch_size="32" \
+	--per_device_eval_batch_size="32" \
+	--gradient_accumulation_steps="4" \
+	--learning_rate="7.5e-4" \
+	--length_column_name="input_length" \
+	--evaluation_strategy="steps" \
+	--save_strategy="steps" \
+	--eval_steps="1000" \
+	--save_steps="1000" \
+	--text_column_name="sentence" \
+	--chars_to_ignore , ? . ! \- \; \: \" “ % ‘ ” � — ’ … – \
+	--logging_steps="100" \
+	--layerdrop="0.0" \
+	--activation_dropout="0.15" \
+	--save_total_limit="2" \
+	--freeze_feature_encoder \
+	--feat_proj_dropout="0.0" \
+	--mask_time_prob="0.75" \
+	--mask_time_length="10" \
+	--mask_feature_prob="0.25" \
+	--mask_feature_length="64" \
+	--gradient_checkpointing \
+	--use_auth_token \
+	--fp16 \
+	--group_by_length \
+	--do_train --do_eval \
+	--push_to_hub
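Judging by the commit message, cont-run.sh continues the earlier run rather than starting over, even though `--model_name_or_path` still points at KBLab/wav2vec2-large-voxrex. As far as we can tell, this relies on the resume logic of the upstream `run_speech_recognition_ctc.py` example: when `--do_train` is set, `--overwrite_output_dir` is not, and `--output_dir` (here `./`) contains `checkpoint-<step>` directories, the script resumes from the one returned by `transformers.trainer_utils.get_last_checkpoint`, i.e. the one with the highest step number; if it finds none in a non-empty directory, it stops with an error instead. The hypothetical `latest_checkpoint` helper below is a shell sketch of that selection rule, handy for checking which checkpoint a run would pick up before launching it; it is not part of this repository or of transformers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the checkpoint
# directory a resumed run would most likely start from. It mirrors the rule
# of transformers' get_last_checkpoint: among sub-directories named
# "checkpoint-<step>", pick the one with the highest numeric step.
latest_checkpoint() {
  local dir="${1:-.}"
  local best="" best_step=-1 path name step
  for path in "${dir%/}"/checkpoint-*; do
    [ -d "$path" ] || continue             # skip plain files and an unmatched glob
    name="${path##*/}"                     # e.g. checkpoint-3000
    step="${name#checkpoint-}"             # e.g. 3000
    case "$step" in
      '' | *[!0-9]*) continue ;;           # step must be all digits
    esac
    if [ "$step" -gt "$best_step" ]; then  # numeric, not lexical, comparison
      best_step="$step"
      best="$path"
    fi
  done
  if [ -n "$best" ]; then
    printf '%s\n' "$best"
  fi
}

# Example: run from the repository root before `sh cont-run.sh`.
# latest_checkpoint ./
```

Because `--save_total_limit="2"` keeps only the most recent checkpoints, there should normally be at most two candidates. For scale: with these flags each optimizer step accumulates 32 × 4 = 128 utterances per device, and `--eval_steps`/`--save_steps` count optimizer steps, so evaluation and checkpointing happen every 1,000 of those.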