not-tanh committed on
Commit 909819d
2 Parent(s): c1c964c b37ba76

Merge branch 'main' of https://huggingface.co/not-tanh/wav2vec2-large-xlsr-53-vietnamese into main

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -24,12 +24,12 @@ model-index:
  metrics:
  - name: Test WER
  type: wer
- value: 52.486188
+ value: 40.745856
  ---
 
- # Wav2Vec2-Large-XLSR-53-vietnamese #TODO: replace language with your {language}, *e.g.* French
+ # Wav2Vec2-Large-XLSR-53-Vietnamese
 
- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Vietnamese using the [Common Voice](https://huggingface.co/datasets/common_voice), and [Vivos dataset](https://ailab.hcmus.edu.vn/vivos).
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Vietnamese using the [Common Voice](https://huggingface.co/datasets/common_voice), [Vivos dataset](https://ailab.hcmus.edu.vn/vivos) and [FOSD dataset](https://data.mendeley.com/datasets/k9sxg2twv4/4).
  When using this model, make sure that your speech input is sampled at 16kHz.
 
  ## Usage
@@ -42,10 +42,10 @@ import torchaudio
  from datasets import load_dataset
  from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 
- test_dataset = load_dataset("common_voice", "vi", split="test") #TODO: replace {lang_id} in your language code here. Make sure the code is one of the *ISO codes* of [this](https://huggingface.co/languages) site.
+ test_dataset = load_dataset("common_voice", "vi", split="test")
 
- processor = Wav2Vec2Processor.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
- model = Wav2Vec2ForCTC.from_pretrained("{model_id}") #TODO: replace {model_id} with your model id. The model id consists of {your_username}/{your_modelname}, *e.g.* `elgeish/wav2vec2-large-xlsr-53-arabic`
+ processor = Wav2Vec2Processor.from_pretrained("not-tanh/wav2vec2-large-xlsr-53-vietnamese")
+ model = Wav2Vec2ForCTC.from_pretrained("not-tanh/wav2vec2-large-xlsr-53-vietnamese")
 
  resampler = torchaudio.transforms.Resample(48_000, 16_000)
 
@@ -71,7 +71,7 @@ print("Reference:", test_dataset["sentence"][:2])
 
  ## Evaluation
 
- The model can be evaluated as follows on the {language} test data of Common Voice. # TODO: replace #TODO: replace language with your {language}, *e.g.* French
+ The model can be evaluated as follows on the Vietnamese test data of Common Voice.
 
 
  ```python
@@ -88,7 +88,7 @@ processor = Wav2Vec2Processor.from_pretrained("not-tanh/wav2vec2-large-xlsr-53-v
  model = Wav2Vec2ForCTC.from_pretrained("not-tanh/wav2vec2-large-xlsr-53-vietnamese")
  model.to("cuda")
 
- chars_to_ignore_regex = '[\\\\,\\\\?\\\\.\\\\!\\\\-\\\\;\\\\:\\\\"\\\\“%\\\\'�]'
+ chars_to_ignore_regex = r'[,?.!\-;:"“%\'�]'
  resampler = torchaudio.transforms.Resample(48_000, 16_000)
 
  # Preprocessing the datasets.
@@ -118,12 +118,12 @@ result = test_dataset.map(evaluate, batched=True, batch_size=8)
  print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
  ```
 
- **Test Result**: 52.486188%
+ **Test Result**: 40.745856%
 
 
  ## Training
  ## TODO
 
- The Common Voice `train`, `validation`, and `vivos` datasets were used for training
+ The Common Voice `train`, `validation`, the VIVOS and FOSD datasets were used for training
 
- The script used for training can be found ... # TODO: fill in a link to your training script here. If you trained your model in a colab, simply fill in the link here. If you trained the model locally, it would be great if you could upload the training script on github and paste the link here.
+ The script used for training can be found ... # TODO
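
The evaluation code touched by this diff defines `chars_to_ignore_regex` and reports a word error rate (WER), but the diff does not show the preprocessing or metric code. The following is a minimal, dependency-free sketch of how such a pipeline might work. It assumes the regex is applied with `re.sub` and the text is then lower-cased. It also assumes WER is the total word-level edit distance divided by the total number of reference words. The names `normalize`, `word_edit_distance` and `corpus_wer` are hypothetical, and this is not necessarily how the `wer` metric used in the README is implemented.

```python
import re
from typing import List

# Same character class as the updated `chars_to_ignore_regex` line in the diff.
CHARS_TO_IGNORE_REGEX = r'[,?.!\-;:"“%\'�]'


def normalize(sentence: str) -> str:
    """Strip the ignored characters and lower-case (assumed preprocessing)."""
    return re.sub(CHARS_TO_IGNORE_REGEX, "", sentence).lower()


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Levenshtein distance counted in whole words."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word missing from hypothesis
                current[j - 1] + 1,      # extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def corpus_wer(references: List[str], predictions: List[str]) -> float:
    """Total word errors over total reference words, as a fraction."""
    errors = 0
    words = 0
    for ref, pred in zip(references, predictions):
        ref_words = normalize(ref).split()
        errors += word_edit_distance(ref_words, pred.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


# Illustrative sentences only; they are not taken from any dataset.
refs = ["Xin chào, thế giới!", "Tôi là sinh viên."]
preds = ["xin chào thế giới", "tôi là giáo viên"]
print("WER: {:.2f}".format(100 * corpus_wer(refs, preds)))
```

The sketch prints with `{:.2f}` for two decimal places. The README's `{:2f}` instead sets a minimum field width of 2 and keeps the default six decimals, which is consistent with the reported value `40.745856`.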