---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- wav2vec2
---

# Wav2Vec2 Large XLSR Galician

## Description

This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for automatic speech recognition (ASR) in Galician.

---

## Dataset

The model was fine-tuned on the [OpenSLR Galician (SLR77)](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset, available in the OpenSLR repository.
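
If you want to inspect or reuse the same data, here is a minimal sketch of loading it with the Hugging Face `datasets` library. It assumes the `openslr` loader exposes an `SLR77` configuration with a single `train` split; verify both against the dataset page before relying on them:

```python
from datasets import load_dataset

# Load the Galician portion (SLR77) of OpenSLR; audio clips and their
# transcriptions are downloaded from the Hugging Face hub.
galician = load_dataset("openslr", "SLR77", split="train")

# Each example holds an "audio" array plus its "sentence" transcription.
print(galician[0]["sentence"])
```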

---

## Example inference script

The following script shows how to run the model in inference mode:

```python
import torch
import librosa
from transformers import AutoProcessor, AutoModelForCTC

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000

# Load the processor and the fine-tuned CTC model
processor = AutoProcessor.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
model = AutoModelForCTC.from_pretrained('ITG/wav2vec2-large-xlsr-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Resample the audio to 16 kHz, the rate the model expects
speech_array, _ = librosa.load(filename, sr=sample_rate)
inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

# Greedy CTC decoding: pick the most likely token at each frame
with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
```

---

## Fine-tuning hyper-parameters

| **Hyper-parameter**          | **Value** |
|:----------------------------:|:---------:|
| Training batch size          | 16        |
| Evaluation batch size        | 8         |
| Learning rate                | 3e-4      |
| Gradient accumulation steps  | 2         |
| Group by length              | true      |
| Evaluation strategy          | steps     |
| Max training epochs          | 50        |
| Max steps                    | 4000      |
| Generate max length          | 225       |
| FP16                         | true      |
| Metric for best model        | wer       |
| Greater is better            | false     |
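
As an illustration only, these values map onto Hugging Face `TrainingArguments` roughly as sketched below. This is not the original training script: the output directory is a placeholder, and `Generate max length` (which corresponds to `generation_max_length` on `Seq2SeqTrainingArguments`) is omitted.

```python
from transformers import TrainingArguments

# A sketch reproducing the table above; "./wav2vec2-large-xlsr-gl" is a
# hypothetical output directory, not the configuration actually used.
training_args = TrainingArguments(
    output_dir="./wav2vec2-large-xlsr-gl",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-4,
    gradient_accumulation_steps=2,
    group_by_length=True,
    evaluation_strategy="steps",
    num_train_epochs=50,
    max_steps=4000,  # when set, max_steps takes precedence over num_train_epochs
    fp16=True,
    metric_for_best_model="wer",
    greater_is_better=False,
)
```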

## Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting from the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). Additionally, you may find this [fine-tuning on Galician notebook by Diego Fustes](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) a valuable resource. It served as a helpful reference during the training of this Galician wav2vec2-large-xlsr model!