ivangtorre committed on
Commit f746e7b
1 Parent(s): d6328ef

Update README.md

Files changed (1): README.md +109 -3
README.md CHANGED
@@ -1,3 +1,109 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ language:
+ - qu
+ metrics:
+ - cer
+ pipeline_tag: automatic-speech-recognition
+ datasets:
+ - ivangtorre/second_americas_nlp_2022
+ tags:
+ - audio
+ - automatic-speech-recognition
+ - speech
+ - waikhana
+ - xlsr-fine-tuning
+ model-index:
+ - name: Wav2Vec2 XLSR 300M Waikhana Model by M Romero and Ivan G Torre
+   results:
+   - task:
+       name: Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: Americas NLP 2022 Waikhana
+       type: second_americas_nlp_2022
+       args: Waikhana
+     metrics:
+     - name: Test CER
+       type: cer
+       value: 16.02
+
+ ---
+
+ This model was fine-tuned from the Wav2Vec2.0 XLS-R 300M model on the Waikhana train partition of the Americas NLP 2022 dataset. The Americas NLP 2022 challenge took place at NeurIPS 2022.
+
+ ## Example of usage
+
+ The model can be used directly (without a language model) as follows:
+
+ ```python
+ from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+ import torch
+ import torch.nn.functional as F
+ import soundfile as sf
+
+ # Load model and processor
+ processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-waikhana")
+ model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-waikhana")
+
+ # Path to wav file (XLS-R models expect 16 kHz mono audio)
+ pathfile = "/path/to/wavfile"
+
+ # Load and normalize the file
+ wav, curr_sample_rate = sf.read(pathfile, dtype="float32")
+ assert curr_sample_rate == 16000, "Resample the audio to 16 kHz first"
+ feats = torch.from_numpy(wav).float()
+ with torch.no_grad():
+     feats = F.layer_norm(feats, feats.shape)  # Normalization performed during finetuning
+     feats = torch.unsqueeze(feats, 0)  # Add a batch dimension
+     logits = model(feats).logits
+
+ # Take argmax and decode
+ predicted_ids = torch.argmax(logits, dim=-1)
+ transcription = processor.batch_decode(predicted_ids)
+ print("HF prediction: ", transcription)
+ ```
+
+ This code snippet shows how to evaluate wav2vec2-xlsr-300m-waikhana on the [Second Americas NLP 2022 Waikhana dev set](https://huggingface.co/datasets/ivangtorre/second_americas_nlp_2022).
+
+ ```python
+ from datasets import load_dataset
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+ import torch
+ import torch.nn.functional as F
+ from jiwer import cer
+
+ americasnlp = load_dataset("ivangtorre/second_americas_nlp_2022", "waikhana", split="dev")
+ waikhana = americasnlp.filter(lambda language: language['subset']=='waikhana')
+
+ model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-waikhana")
+ processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-waikhana")
+
+ def map_to_pred(batch):
+     wav = batch["audio"][0]["array"]
+     feats = torch.from_numpy(wav).float()
+     feats = F.layer_norm(feats, feats.shape)  # Normalization performed during finetuning
+     feats = torch.unsqueeze(feats, 0)  # Add a batch dimension
+     with torch.no_grad():
+         logits = model(feats).logits
+     predicted_ids = torch.argmax(logits, dim=-1)
+     batch["transcription"] = processor.batch_decode(predicted_ids)
+     return batch
+
+ result = waikhana.map(map_to_pred, batched=True, batch_size=1)
+
+ print("CER:", cer(result["source_processed"], result["transcription"]))
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{romero2024asr,
+   title={ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana},
+   author={Romero, Monica and Gomez, Sandra and Torre, Iv{\'a}n G},
+   journal={arXiv preprint arXiv:2404.08368},
+   year={2024}
+ }
+ ```
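The README normalizes each waveform with `F.layer_norm(feats, feats.shape)` before inference and notes that this matches the normalization used during fine-tuning. As a minimal sketch of what that call computes, assuming no affine weight or bias and PyTorch's default `eps=1e-5`, the NumPy function below standardizes the whole waveform to zero mean and unit variance. `normalize_waveform` is a hypothetical helper for illustration, not part of the model repository.

```python
import numpy as np


def normalize_waveform(wav: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Zero-mean, unit-variance normalization over the whole waveform.

    Hypothetical helper meant to mirror F.layer_norm(feats, feats.shape)
    without affine parameters; eps=1e-5 is layer_norm's default.
    """
    wav = np.asarray(wav, dtype=np.float32)
    mean = wav.mean()
    var = wav.var()  # population variance (ddof=0), as layer_norm uses
    return ((wav - mean) / np.sqrt(var + eps)).astype(np.float32)


if __name__ == "__main__":
    x = normalize_waveform(np.array([0.0, 0.5, -0.5, 1.0]))
    print(x.mean(), x.var())
```

Because the statistics are taken over the full utterance rather than per frame, the same clip always normalizes identically regardless of how it is later batched.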
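Both snippets take the per-frame argmax of the logits and pass the ids to `processor.batch_decode`. For a CTC model, this greedy decoding amounts to merging consecutive repeated labels and then dropping the blank label. The sketch below illustrates that idea with a toy vocabulary. The helper `greedy_ctc_decode`, the vocabulary, `blank_id=0`, and the `|` word delimiter are assumptions for illustration: Wav2Vec2 CTC tokenizers typically use the pad token as the blank and `|` as the word delimiter, but the real ids come from the model's own vocabulary, and this is not the tokenizer's actual implementation.

```python
from itertools import groupby
from typing import Dict, Sequence


def greedy_ctc_decode(
    frame_ids: Sequence[int],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    """Hypothetical greedy CTC decoder: collapse repeats, drop blanks."""
    # Merge runs of identical consecutive frame labels into one label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Remove the CTC blank symbol.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # Turn word delimiters into spaces and tidy up surplus whitespace.
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())


if __name__ == "__main__":
    vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}
    print(greedy_ctc_decode([1, 1, 0, 1, 2, 2, 3, 0, 2], vocab))
```

In the usage example, the blank between the two runs of label `1` keeps them apart, so the output contains a doubled `a`; without a blank separator, repeated characters would always collapse into one.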
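The evaluation snippet reports character error rate (CER) with `jiwer.cer`. CER is the character-level edit distance (substitutions, deletions, and insertions) between reference and hypothesis, divided by the number of reference characters. The pure-Python sketch below pools edits and reference lengths across all utterances, which is our understanding of how jiwer aggregates lists of sentences. `levenshtein` and `corpus_cer` are hypothetical helpers written for illustration, and the result is a fraction rather than a percentage.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def corpus_cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    chars = sum(len(r) for r in refs)
    if chars == 0:
        raise ValueError("references must contain at least one character")
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return edits / chars


if __name__ == "__main__":
    print(corpus_cer(["abcd", "xy"], ["abed", "xy"]))
```

Pooling means long utterances weigh more than short ones; averaging per-utterance CERs instead would generally give a different number.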