File size: 3,161 Bytes
f5fcdce
 
f5bd548
 
 
 
 
 
 
 
 
 
f5fcdce
f5bd548
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- wav2vec2
---

# Wav2Vec2 Large XLSR Galician

## Description
  
This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for ASR in galician. 

---

## Dataset 

The dataset used for fine-tuning this model was the [OpenSLR galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset, available in the openslr repository.

--- 

## Example inference script

### Check this example script to run our model in inference mode

```python
import torch
from transformers import AutoProcessor, AutoModelForCTC
filename = "demo.wav"  #change this line to the name of your audio file
sample_rate = 16_000   
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
speech_array, _ = librosa.load(filename, sr=sample_rate)
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
  logits = model(inputs.input_values, attention_mask=inputs.attention_mask.to(device)).logits
decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
```
---

## Fine-tuning hyper-parameters

|            **Hyper-parameter**           |          **Value**          |
|:----------------------------------------:|:---------------------------:|
|            Training batch size           |             16              |
|           Evaluation batch size          |             8               |
|               Learning rate              |             3e-4            |
|         Gradient accumulation steps      |             2               | 
|             Group by length              |             true            |
|            Evaluation strategy           |             steps           |
|            Max training epochs           |             50              |
|                Max steps                 |             4000            |
|            Generate max length           |             225             |
|                  FP16                    |             true            |
|          Metric for best model           |             wer             |
|            Greater is better             |             false           |


## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting with the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). Additionally, 
you may find this [fine-tuning on galician notebook by Diego Fustes](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) to be a valuable resource. 
This guide served as a helpful reference during the training process of this Galician wav2vec2-large-xlsr model!