PereLluis13 commited on
Commit
87f68f5
1 Parent(s): 3aa0612

add README

Browse files
README.md ADDED
@@ -0,0 +1,69 @@
---
language:
- ca
license: apache-2.0
tags:
- automatic-speech-recognition
- mozilla-foundation/common_voice_8_0
- collectivat/tv3_parla
- projecte-aina/parlament_parla
- generated_from_trainer
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
- collectivat/tv3_parla
- projecte-aina/parlament_parla
model-index:
- name: wav2vec2-xls-r-1b-ca
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wav2vec2-xls-r-1b-ca

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - CA dataset.

## Model description

Please check the original [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) model card; this is a fine-tuned version of that model.

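A minimal usage sketch with the `transformers` automatic-speech-recognition pipeline. The repo id `PereLluis13/wav2vec2-xls-r-1b-ca` and the audio path are assumptions, and running this downloads the checkpoint over the network:

```python
from transformers import pipeline

# Assumed repo id for this checkpoint; adjust if the model lives elsewhere.
asr = pipeline(
    "automatic-speech-recognition",
    model="PereLluis13/wav2vec2-xls-r-1b-ca",
)

# Transcribe a local 16 kHz mono audio file (placeholder path).
result = asr("audio.wav")
print(result["text"])
```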
## Intended uses & limitations

Like any model trained on crowdsourced data, this model may reflect the biases and particularities of the data used to train it. Moreover, since this is a speech recognition model, it may underperform for some lower-resourced dialects of the Catalan language.

## Training and evaluation data

The model was fine-tuned on the Catalan Common Voice 8.0, TV3 Parla and ParlamentParla datasets (see the `datasets` field above).

## Training procedure

The data is preprocessed to remove characters not in the Catalan alphabet. Moreover, numbers are verbalized using code provided by [@ccoreilly](https://github.com/ccoreilly), which can be found in the text/ folder or [here](https://github.com/CollectivaT-dev/catotron-cpu/blob/master/text/numbers_ca.py).

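As a rough illustration (not the actual training script), the character-filtering step could look like this; the character set below is an assumption, and the number-verbalization code from `numbers_ca.py` is not reproduced here:

```python
# Hypothetical sketch of the cleaning step described above: lowercase the
# transcript and drop any character outside an assumed Catalan character set.
# The character set actually used during training may differ.
CATALAN_CHARS = set("abcdefghijklmnopqrstuvwxyzàáèéíïòóúüç·'- ")

def clean_transcript(text: str) -> str:
    return "".join(ch for ch in text.lower() if ch in CATALAN_CHARS)

print(clean_transcript("Bon dia, món!"))  # -> bon dia món
```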
### Training results

Check the TensorBoard tab to see the training profile and evaluation results over the course of training. The model was evaluated on the test splits of each of the datasets used during training.

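For reference, the word error rate (WER) typically reported in such evaluations can be sketched in a few lines (an illustration, not the actual evaluation script used for this model):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("bon dia a tothom", "bon dia tothom"))  # -> 0.25
```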
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 10.0
- mixed_precision_training: Native AMP

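These settings map onto a `transformers.TrainingArguments` configuration roughly as follows; this is a hedged sketch, `output_dir` is a placeholder, and any arguments not listed above are left at their defaults:

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as a TrainingArguments config.
# The effective train batch size is 8 * 8 = 64.
training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-1b-ca",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=10.0,
    fp16=True,  # Native AMP mixed precision
)
```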
### Framework versions

- Transformers 4.17.0.dev0
- Pytorch 1.10.2+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0

# Thanks

We want to thank both [@ccoreilly](https://github.com/ccoreilly) and [@gullabi](https://github.com/gullabi), who have contributed their own resources and knowledge to making this model possible.
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:31c0bd86b26817f538c5c24284ba7ffa3959089cd1e2b7fc9d0d05c8e4904b99
+ oid sha256:460a907cccb967dcaf1e86c147c373256527b490cd81f16e4118691f11540bc1
 size 3850543281
runs/Feb02_22-48-04_job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3/events.out.tfevents.1643843696.job-7083fbbc-ffb8-4f9b-8706-99212ecf5dd3.34573.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:aba9198ff43d95dc854e069ded2d0589e9bc084520c9ec177e8a52856f6c0820
- size 22018
+ oid sha256:be991b3e225de2c0a64b9d2532338398f4e6e63f66a343c0d1e0744f105a9ecd
+ size 22862