MehdiHosseiniMoghadam committed
Commit 6c37025
Parent: 0337e0e

Create README.md

Files changed (1)
  1. README.md +22 -22
README.md CHANGED
@@ -1,7 +1,8 @@
----
--language: sv
+language: {sv-SE}
 datasets:
-- common_voice
+- common_voice
+metrics:
+- wer
 tags:
 - audio
 - automatic-speech-recognition
@@ -9,24 +10,23 @@ tags:
 - xlsr-fine-tuning-week
 license: apache-2.0
 model-index:
-- name: XLSR Wav2Vec2 Swedish by Medhi
+- name: {MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish}
   results:
   - task:
       name: Speech Recognition
       type: automatic-speech-recognition
     dataset:
-      name: Common Voice sv-SE
+      name: Common Voice {sv-SE}
       type: common_voice
-      args: sv-SE
+      args: {sv-SE}
     metrics:
     - name: Test WER
       type: wer
-      value: ???
+      value: {41.388337}
 ---

-# Wav2Vec2-Large-XLSR-53-Hungarian

-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) in Swedish using the [Common Voice](https://huggingface.co/datasets/common_voice)
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on {Swedish} using the [Common Voice](https://huggingface.co/datasets/common_voice)
 When using this model, make sure that your speech input is sampled at 16kHz.

 ## Usage
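The note that speech input must be sampled at 16 kHz matters because the usage snippet below hard-codes a 48 kHz to 16 kHz resampler for Common Voice clips. For audio at other sample rates, a minimal helper (hypothetical, not part of the committed README) could look like this:

```python
import torchaudio

def load_16k(path: str):
    """Load an audio file and resample it to the 16 kHz this model expects."""
    speech, sr = torchaudio.load(path)  # (channels, samples) tensor plus original rate
    if sr != 16_000:
        speech = torchaudio.transforms.Resample(sr, 16_000)(speech)
    return speech.squeeze().numpy()     # 1-D float array at 16 kHz
```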
@@ -39,10 +39,10 @@ import torchaudio
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

-test_dataset = load_dataset("common_voice", "sv-SE", split="test[:2%]").
+test_dataset = load_dataset("common_voice", "{sv-SE}", split="test[:2%]")

-processor = Wav2Vec2Processor.from_pretrained("MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish")
-model = Wav2Vec2ForCTC.from_pretrained("MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish")
+processor = Wav2Vec2Processor.from_pretrained("{MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish}")
+model = Wav2Vec2ForCTC.from_pretrained("{MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish}")

 resampler = torchaudio.transforms.Resample(48_000, 16_000)

@@ -68,7 +68,7 @@ print("Reference:", test_dataset["sentence"][:2])

 ## Evaluation

-The model can be evaluated as follows on the Swedish test data of Common Voice.
+The model can be evaluated as follows on the {Swedish} test data of Common Voice.


 ```python
@@ -78,14 +78,14 @@ from datasets import load_dataset, load_metric
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 import re

-test_dataset = load_dataset("common_voice", "sv-SE", split="test")
+test_dataset = load_dataset("common_voice", "{sv-SE}", split="test")
 wer = load_metric("wer")

-processor = Wav2Vec2Processor.from_pretrained("MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish")
-model = Wav2Vec2ForCTC.from_pretrained("MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish")
+processor = Wav2Vec2Processor.from_pretrained("{MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish}")
+model = Wav2Vec2ForCTC.from_pretrained("{MehdiHosseiniMoghadam/wav2vec2-large-xlsr-53-Swedish}")
 model.to("cuda")

-chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]'
+chars_to_ignore_regex = '[\,\?\.\!\-\;\:\"\“]'
 resampler = torchaudio.transforms.Resample(48_000, 16_000)

 # Preprocessing the datasets.
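The diff skips the body of the preprocessing step and the start of `evaluate`. Under the usual fine-tuning-week template assumptions (strip `chars_to_ignore_regex`, lower-case the transcript, resample to 16 kHz), a sketch that reuses the objects defined above looks roughly like this; it is an approximation, not the exact committed code:

```python
import re
import torch
import torchaudio

def speech_file_to_array_fn(batch):
    # Normalise the transcript and load/resample the audio clip.
    batch["sentence"] = re.sub(chars_to_ignore_regex, "", batch["sentence"]).lower()
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)

def evaluate(batch):
    inputs = processor(batch["speech"], sampling_rate=16_000,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(inputs.input_values.to("cuda"),
                       attention_mask=inputs.attention_mask.to("cuda")).logits
    pred_ids = torch.argmax(logits, dim=-1)
    batch["pred_strings"] = processor.batch_decode(pred_ids)
    return batch

result = test_dataset.map(evaluate, batched=True, batch_size=8)
```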
@@ -105,8 +105,8 @@ def evaluate(batch):

     with torch.no_grad():
         logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
-    pred_ids = torch.argmax(logits, dim=-1)
-
+
+    pred_ids = torch.argmax(logits, dim=-1)
     batch["pred_strings"] = processor.batch_decode(pred_ids)
     return batch

@@ -115,10 +115,10 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
 print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
 ```

-**Test Result**: ???
+**Test Result**: 41.388337 %


 ## Training

-The Common Voice `train` and `validation` datasets were used for training.
-The script used for training can be found ???
+The Common Voice `train`, `validation` datasets were used for training.
+
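For context on the reported figure: `wer` in the evaluation snippet is the word error rate metric from `datasets`, so a value of 41.388337 means roughly 41% of the reference words are substituted, deleted, or inserted in the predictions. A toy check of the API (example strings are made up):

```python
from datasets import load_metric

wer = load_metric("wer")
# One substituted word out of five reference words gives 0.2, i.e. 20% WER.
print(wer.compute(predictions=["det här är ett prov"],
                  references=["det här är ett test"]))
```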