File size: 3,468 Bytes

873a17c
ed7e4ac
 
b8353a4
ed7e4ac
873a17c
ed7e4ac
 
873a17c
a9d4bb3
ed7e4ac
 
 
873a17c
ed37d85
ed7e4ac
 
 
 
 
a9d4bb3
ed7e4ac
 
 
 
b8353a4
ed7e4ac
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
b8353a4
 
873a17c
 
7c41b61
873a17c
90342ae
 
7c41b61
 
0585a77
873a17c
 
ed7e4ac
 
873a17c
 
 
 
 
 
 
 
 
 
 
 
d49441f
90342ae
873a17c
 
 
 
ed7e4ac
 
 
 
 
 
 
 
873a17c
 
 
 
d49441f
873a17c

---
language: 
- ur

license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
metrics:
- wer
- cer
model-index:
- name: wav2vec2-60-urdu
  results:
  - task: 
      type: automatic-speech-recognition  # Required. Example: automatic-speech-recognition
      name: Urdu Speech Recognition  # Optional. Example: Speech Recognition
    dataset:
      type: mozilla-foundation/common_voice_7_0  # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
      name: Urdu  # Required. Example: Common Voice zh-CN
      args: ur         # Optional. Example: zh-CN
    metrics:
      - type: wer    # Required. Example: wer
        value: 59.2  # Required. Example: 20.90
        name: Test WER    # Optional. Example: Test WER
        args: 
        - learning_rate: 0.0003
        - train_batch_size: 16
        - eval_batch_size: 8
        - seed: 42
        - gradient_accumulation_steps: 2
        - total_train_batch_size: 32
        - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
        - lr_scheduler_type: linear
        - lr_scheduler_warmup_steps: 200
        - num_epochs: 50
        - mixed_precision_training: Native AMP         # Optional. Example for BLEU: max_order
      - type: cer    # Required. Example: wer
        value: 32.9  # Required. Example: 20.90
        name: Test CER    # Optional. Example: Test WER
        args: 
        - learning_rate: 0.0003
        - train_batch_size: 16
        - eval_batch_size: 8
        - seed: 42
        - gradient_accumulation_steps: 2
        - total_train_batch_size: 32
        - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
        - lr_scheduler_type: linear
        - lr_scheduler_warmup_steps: 200
        - num_epochs: 50
        - mixed_precision_training: Native AMP         # Optional. Example for BLEU: max_order
---
# wav2vec2-large-xlsr-53-urdu

This model is a fine-tuned version of [Harveenchadha/vakyansh-wav2vec2-urdu-urm-60](https://huggingface.co/Harveenchadha/vakyansh-wav2vec2-urdu-urm-60) on the common_voice dataset.
It achieves the following results on the evaluation set:
- Wer: 0.5921
- Cer: 0.3288

## Model description
The training and valid dataset is 0.58 hours. It was hard to train any model on lower number of so I decided to take vakyansh-wav2vec2-urdu-urm-60 checkpoint and finetune the wav2vec2 model.  

## Training procedure
Trained on Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 due to lesser number of samples.


### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Wer    | Cer    |
|:-------------:|:-----:|:----:|:------:|:------:|
| 13.83         | 8.33  | 100  | 0.6611 | 0.3639 |
| 1.0144        | 16.67 | 200  | 0.6498 | 0.3731 |
| 0.5801        | 25.0  | 300  | 0.6454 | 0.3767 |
| 0.3344        | 33.33 | 400  | 0.6349 | 0.3548 |
| 0.1606        | 41.67 | 500  | 0.6105 | 0.3348 |
| 0.0974        | 50.0  | 600  | 0.5921 | 0.3288 |


### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu111
- Datasets 1.17.0
- Tokenizers 0.10.3