hoangbinhmta99 committed on
Commit
f3c6047
·
1 Parent(s): a9938a9

Update README.md

Files changed (1)
  1. README.md +32 -126
README.md CHANGED
@@ -9,16 +9,10 @@ pipeline_tag: automatic-speech-recognition
9
  tags:
10
  - audio
11
  - speech
12
- - speechbrain
13
  - Transformer
14
  license: cc-by-nc-4.0
15
- widget:
16
- - example_title: Example 1
17
- src: https://huggingface.co/dragonSwing/wav2vec2-base-vn-270h/raw/main/example.mp3
18
- - example_title: Example 2
19
- src: https://huggingface.co/dragonSwing/wav2vec2-base-vn-270h/raw/main/example2.mp3
20
  model-index:
21
- - name: Wav2vec2 Base Vietnamese 270h
22
  results:
23
  - task:
24
  name: Speech Recognition
@@ -30,127 +24,39 @@ model-index:
30
  metrics:
31
  - name: Test WER
32
  type: wer
33
- value: 9.66
34
- - task:
35
- name: Speech Recognition
36
- type: automatic-speech-recognition
37
- dataset:
38
- name: Common Voice 7.0
39
- type: mozilla-foundation/common_voice_7_0
40
- args: vi
41
- metrics:
42
- - name: Test WER
43
- type: wer
44
- value: 5.57
45
- - task:
46
- name: Speech Recognition
47
- type: automatic-speech-recognition
48
- dataset:
49
- name: Common Voice 8.0
50
- type: mozilla-foundation/common_voice_8_0
51
- args: vi
52
- metrics:
53
- - name: Test WER
54
- type: wer
55
- value: 5.76
56
- - task:
57
- name: Speech Recognition
58
- type: automatic-speech-recognition
59
- dataset:
60
- name: VIVOS
61
- type: vivos
62
- args: vi
63
- metrics:
64
- - name: Test WER
65
- type: wer
66
- value: 3.70
67
  ---
68
- # Wav2Vec2-Base-Vietnamese-270h
69
- Fine-tuned Wav2Vec2 model on Vietnamese Speech Recognition task using about 270h labelled data combined from multiple datasets including [Common Voice](https://huggingface.co/datasets/common_voice), [VIVOS](https://huggingface.co/datasets/vivos), [VLSP2020](https://vlsp.org.vn/vlsp2020/eval/asr). The model was fine-tuned using SpeechBrain toolkit with a custom tokenizer. For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io/).
70
- When using this model, make sure that your speech input is sampled at 16kHz.
71
- Please refer to [huggingface blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) or [speechbrain](https://github.com/speechbrain/speechbrain/tree/develop/recipes/CommonVoice/ASR/CTC) on how to fine-tune Wav2Vec2 model on a specific language.
72
-
73
- ### Benchmark WER result:
74
- | | [VIVOS](https://huggingface.co/datasets/vivos) | [COMMON VOICE 7.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) | [COMMON VOICE 8.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) |
75
- |---|---|---|---|
76
- |without LM| 8.23 | 12.15 | 12.15 |
77
- |with 4-grams LM| 3.70 | 5.57 | 5.76 |
78
-
79
- The language model was trained using [OSCAR](https://huggingface.co/datasets/oscar-corpus/OSCAR-2109) dataset on about 32GB of crawled text.
80
 
81
- ### Install SpeechBrain
82
- To use this model, you should install speechbrain > 0.5.10
83
-
84
- ### Usage
85
- The model can be used directly (without a language model) as follows:
86
- ```python
87
- from speechbrain.pretrained import EncoderASR
88
-
89
- model = EncoderASR.from_hparams(source="dragonSwing/wav2vec2-base-vn-270h", savedir="pretrained_models/asr-wav2vec2-vi")
90
- model.transcribe_file('dragonSwing/wav2vec2-base-vn-270h/example.mp3')
91
- # Output: được hồ chí minh coi là một động lực lớn của sự phát triển đất nước
92
- ```
93
-
94
- ### Inference on GPU
95
- To perform inference on the GPU, add `run_opts={"device":"cuda"}` when calling the `from_hparams` method.
96
-
97
- ### Evaluation
98
- The model can be evaluated as follows on the Vietnamese test data of Common Voice 8.0.
99
- ```python
- import torch
- import torchaudio
- from datasets import load_dataset, load_metric, Audio
- from transformers import Wav2Vec2FeatureExtractor
- from speechbrain.pretrained import EncoderASR
- import re
- test_dataset = load_dataset("mozilla-foundation/common_voice_8_0", "vi", split="test", use_auth_token=True)
- test_dataset = test_dataset.cast_column("audio", Audio(sampling_rate=16_000))
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- wer = load_metric("wer")
- extractor = Wav2Vec2FeatureExtractor.from_pretrained("dragonSwing/wav2vec2-base-vn-270h")
- model = EncoderASR.from_hparams(source="dragonSwing/wav2vec2-base-vn-270h", savedir="pretrained_models/asr-wav2vec2-vi", run_opts={'device': device})
- chars_to_ignore_regex = r'[,?.!\-;:"“%\'�]'
- # Preprocessing the datasets.
- # We need to read the audio files as arrays
- def speech_file_to_array_fn(batch):
-     audio = batch["audio"]
-     batch["target_text"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
-     batch['speech'] = audio['array']
-     return batch
- test_dataset = test_dataset.map(speech_file_to_array_fn)
-
- def evaluate(batch):
-     # For padding inputs only
-     inputs = extractor(
-         batch['speech'],
-         sampling_rate=16000,
-         return_tensors="pt",
-         padding=True,
-         do_normalize=False
-     ).input_values
-     input_lens = torch.ones(inputs.shape[0])
-     pred_str, pred_tokens = model.transcribe_batch(inputs, input_lens)
-     batch["pred_strings"] = pred_str
-
-     return batch
- result = test_dataset.map(evaluate, batched=True, batch_size=1)
- print("WER: {:2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["target_text"])))
138
  ```
139
- **Test Result**: 12.155553%
140
-
141
- #### Citation
142
  ```
143
- @misc{SB2021,
144
- author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
145
- title = {SpeechBrain},
146
- year = {2021},
147
- publisher = {GitHub},
148
- journal = {GitHub repository},
149
- howpublished = {\\\\url{https://github.com/speechbrain/speechbrain}},
150
- }
151
  ```
152
-
153
- #### About SpeechBrain
154
- SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to be simple, extremely flexible, and user-friendly. Competitive or state-of-the-art performance is obtained in various domains.
155
- Website: [https://speechbrain.github.io](https://speechbrain.github.io/)
156
- GitHub: [https://github.com/speechbrain/speechbrain](https://github.com/speechbrain/speechbrain)
 
9
  tags:
10
  - audio
11
  - speech
 
12
  - Transformer
13
  license: cc-by-nc-4.0
14
  model-index:
15
+ - name: Wav2vec2 NCKH Vietnamese 2022
16
  results:
17
  - task:
18
  name: Speech Recognition
 
24
  metrics:
25
  - name: Test WER
26
  type: wer
27
+ value: No
28
  ---
29
 
+ Convert a fairseq `.pt` checkpoint to the Hugging Face Transformers format
+ Reference: https://huggingface.co/tommy19970714/wav2vec2-base-960h
+ Bash:
+ ```bash
+ pip install "transformers[sentencepiece]"
+ pip install fairseq -U
+ git clone https://github.com/huggingface/transformers.git
+ cp transformers/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py .
+ # Save the checkpoint and dictionary at the paths the converter reads below
+ mkdir -p finetuned dict outputs
+ wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt -O ./finetuned/wav2vec_small.pt
+ wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt -O ./dict/dict.ltr.txt
+ python convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py \
+   --pytorch_dump_folder_path ./outputs --checkpoint_path ./finetuned/wav2vec_small.pt \
+   --dict_path ./dict/dict.ltr.txt --not_finetuned
  ```
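The conversion command takes three paths, and it fails late if any of them does not match where the files were actually saved. A small Python wrapper can check the inputs first and then call the script. This is only a sketch under assumptions: the helper names `build_convert_command`, `missing_inputs`, and `convert` are hypothetical, the conversion script is assumed to sit in the current directory, and the flags are the ones used in the command above.

```python
import subprocess
import sys
from pathlib import Path

# Name of the script copied out of the transformers repository (see the bash steps).
SCRIPT = "convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py"


def build_convert_command(checkpoint, dict_path, out_dir, not_finetuned=True):
    """Return the argument list for the conversion script."""
    cmd = [
        sys.executable, SCRIPT,
        "--pytorch_dump_folder_path", str(out_dir),
        "--checkpoint_path", str(checkpoint),
        "--dict_path", str(dict_path),
    ]
    if not_finetuned:
        cmd.append("--not_finetuned")
    return cmd


def missing_inputs(checkpoint, dict_path):
    """List the input files that do not exist, so a path mismatch is caught early."""
    return [str(p) for p in (Path(checkpoint), Path(dict_path)) if not p.is_file()]


def convert(checkpoint, dict_path, out_dir):
    """Check the inputs, create the output folder, and run the conversion."""
    missing = missing_inputs(checkpoint, dict_path)
    if missing:
        raise FileNotFoundError(f"missing input files: {missing}")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(checkpoint, dict_path, out_dir), check=True)
```

Calling `convert("./finetuned/wav2vec_small.pt", "./dict/dict.ltr.txt", "./outputs")` would then either run the conversion or report which input file is missing.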
+ # Install Git LFS and upload the model
  ```
+ curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
+ sudo apt-get install git-lfs
+ git lfs install
+ git clone https://huggingface.co/hoangbinhmta99/wav2vec-demo
+ ls
+ cd wav2vec-demo/
+ # Set the commit identity before the first commit
+ git config --global user.email [youremail]
+ git config --global user.name [yourname]
+ git status
+ git add .
+ git commit -m "First model version"
+ git push
  ```
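As an alternative to the git commands, the same upload can be sketched in Python with the `huggingface_hub` library, whose `HfApi.upload_folder` method pushes a local folder to a Hub repository in one commit. The helper name `upload_model` and the `api` parameter are hypothetical, added here only so the function can be tried without network access. The sketch also assumes you have already authenticated (for example with `huggingface-cli login`) and that the repository exists.

```python
from pathlib import Path


def upload_model(folder, repo_id, api=None, message="First model version"):
    """Upload every file in `folder` to the Hub model repo `repo_id` in one commit."""
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    if api is None:
        # Imported lazily so the helper can be exercised without the dependency.
        from huggingface_hub import HfApi
        api = HfApi()
    return api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type="model",
        commit_message=message,
    )
```

For the repository in this commit, the call would look like `upload_model("./outputs", "hoangbinhmta99/wav2vec-demo")`, assuming `./outputs` holds the converted files.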