NTQAI/wav2vec2-large-japanese

Automatic Speech Recognition

nhanv committed on Jul 5, 2021

Commit e5d20a5 (1 parent: d018eb0)

Update README.md

Files changed (1)

README.md +6 -6

@@ -23,15 +23,15 @@ model-index:
     metrics:
        - name: Test WER
          type: wer
-         value: 81.80
        - name: Test CER
          type: cer
-         value: 20.16
 ---
-# Wav2Vec2-Large-XLSR-53-Japanese
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese using the [Common Voice](https://huggingface.co/datasets/common_voice), [CSS10](https://github.com/Kyubyong/css10) and [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut).
 When using this model, make sure that your speech input is sampled at 16kHz.
-The script used for training can be found here: https://github.com/jonatasgrosman/wav2vec2-sprint
 ## Usage
 The model can be used directly (without a language model) as follows:
 ```python
@@ -40,7 +40,7 @@ import librosa
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 LANG_ID = "ja"
-MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"
 SAMPLES = 10
 test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
 processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)

     metrics:
        - name: Test WER
          type: wer
+         value: 81.3
        - name: Test CER
          type: cer
+         value: 21.9
 ---
+# Wav2Vec2-Large-Japanese
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese using the [Common Voice](https://huggingface.co/datasets/common_voice), [CSS10](https://github.com/Kyubyong/css10) and [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) and [CSJ]
 When using this model, make sure that your speech input is sampled at 16kHz.
 ## Usage
 The model can be used directly (without a language model) as follows:
 ```python
 from datasets import load_dataset
 from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
 LANG_ID = "ja"
+MODEL_ID = "NTQAI/wav2vec2-large-japanese"
 SAMPLES = 10
 test_dataset = load_dataset("common_voice", LANG_ID, split=f"test[:{SAMPLES}]")
 processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)