dragonSwing/wav2vec2-base-vn-270h

Automatic Speech Recognition


dragonSwing committed on Dec 8, 2021

Commit e0ac1cb · 1 Parent(s): 1c0c6e6

Update README.md

Files changed (1)

README.md +9 -2


@@ -46,8 +46,8 @@ model-index:
          value: 4.04
 ---
 # Wav2Vec2-Base-Vietnamese-270h
-Fine-tuned Wav2Vec2 model on Vietnamese Speech Recognition task using about 270h labelled data combined from multiple datasets including [Common Voice](https://huggingface.co/datasets/common_voice), [VIVOS](https://huggingface.co/datasets/vivos), [VLSP2020](https://vlsp.org.vn/vlsp2020/eval/asr). The model was fine-tuned using SpeechBrain toolkit with a custom tokenizer. For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io/).
-When using this model, make sure that your speech input is sampled at 16kHz.
+Fine-tuned Wav2Vec2 model on Vietnamese Speech Recognition task using about 270h labelled data combined from multiple datasets including [Common Voice](https://huggingface.co/datasets/common_voice), [VIVOS](https://huggingface.co/datasets/vivos), [VLSP2020](https://vlsp.org.vn/vlsp2020/eval/asr). The model was fine-tuned using SpeechBrain toolkit with a custom tokenizer. For a better experience, we encourage you to learn more about [SpeechBrain](https://speechbrain.github.io/).
+When using this model, make sure that your speech input is sampled at 16kHz.
 Please refer to [huggingface blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) on how to fine-tune this model on a specific language.
 ### Benchmark WER result:
@@ -58,6 +58,13 @@ Please refer to [huggingface blog](https://huggingface.co/blog/fine-tune-wav2vec
 The language model was trained using [Oscar](https://huggingface.co/datasets/oscar-corpus/OSCAR-2109) dataset on about 32GB of written text.
+### Install SpeechBrain
+To use this model, you should install speechbrain from source. This is not required for speechbrain version > 0.5.10
+```bash
+pip install git+https://github.com/speechbrain/speechbrain.git@develop
+```
 ### Usage
 The model can be used directly (without a language model) as follows:
 ```python