NTQAI/wav2vec2-large-japanese

nhanv committed on Jul 5, 2021

Commit 6d768d5 (parent: 22d9772)

Update README.md

Files changed (1)

README.md +3 -2

@@ -29,7 +29,7 @@ model-index:
          value: 21.9
 ---
 # Wav2Vec2-Large-Japanese
-Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese using the [Common Voice](https://huggingface.co/datasets/common_voice), [CSS10](https://github.com/Kyubyong/css10) and [JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut) and [TEDxJP](https://github.com/laboroai/TEDxJP-10K) and some other data.
+Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Japanese using the [Common Voice](https://huggingface.co/datasets/common_voice),[JSUT](https://sites.google.com/site/shinnosuketakamichi/publication/jsut), [TEDxJP](https://github.com/laboroai/TEDxJP-10K) and some other data.
 When using this model, make sure that your speech input is sampled at 16kHz.
@@ -124,10 +124,11 @@ references = [x.upper() for x in result["sentence"]]
 print(f"WER: {wer.compute(predictions=predictions, references=references, chunk_size=1000) * 100}")
 print(f"CER: {cer.compute(predictions=predictions, references=references, chunk_size=1000) * 100}")
 ```
 **Test Result**:
 In the table below I report the Word Error Rate (WER) and the Character Error Rate (CER) of the model. I ran the evaluation script described above on other models as well (on 2021-05-10). Note that the table below may show different results from those already reported, this may have been caused due to some specificity of the other evaluation scripts used.
 | Model | WER | CER |
 | ------------- | ------------- | ------------- |
-| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | **81.80%** | **20.16%** |
+| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | **81.30%** | **21.9%** |
 | vumichien/wav2vec2-large-xlsr-japanese | 1108.86% | 23.40% |
 | qqhann/w2v_hf_jsut_xlsr53 | 1012.18% | 70.77% |
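
Some WER values in the table exceed 100%. That is arithmetically possible because WER divides the number of substitutions, deletions and insertions by the number of reference words, so a hypothesis with many extra tokens can produce more errors than there are reference words. The sketch below illustrates this with a plain Levenshtein distance. It is not the evaluation script from the README: the helper names (`edit_distance`, `error_rate`, `wer`, `cer`) are hypothetical, and it assumes whitespace tokenization for WER and space-stripped characters for CER, which may differ from what the `wer` and `cer` metrics in the diff actually do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(references, predictions, tokenize):
    """Total edit errors divided by total reference tokens, in percent."""
    errors = 0
    total = 0
    for ref, hyp in zip(references, predictions):
        ref_tokens = tokenize(ref)
        hyp_tokens = tokenize(hyp)
        errors += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    return 100.0 * errors / total


def wer(references, predictions):
    # Assumption: words are whatever whitespace splitting yields.
    return error_rate(references, predictions, str.split)


def cer(references, predictions):
    # Assumption: spaces are ignored when comparing characters.
    return error_rate(references, predictions,
                      lambda s: list(s.replace(" ", "")))


if __name__ == "__main__":
    refs = ["今日は晴れです"]        # unsegmented reference: one "word"
    hyps = ["今日 は 晴れ です"]     # same characters, but space-separated
    print(f"WER: {wer(refs, hyps)}")
    print(f"CER: {cer(refs, hyps)}")
```

In this toy case the characters match exactly, so the CER is 0, while the WER is 400% because one reference "word" is compared against four hypothesis tokens. Whether word segmentation explains the large WER values reported in the table is not stated in the source; the example only shows that such values are possible.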