TylorShine/distilhubert-ft-japanese-50k

TylorShine committed on Apr 20, 2023

Commit 2abacad • 1 Parent(s): 80cc0ff

Update README.md

Files changed (1)

README.md +39 -0

@@ -1,3 +1,42 @@
 ---
+language: ja
+tags:
+  - speech
 license: apache-2.0
 ---
+# distilhubert-ft-japanese-50k
+A DistilHuBERT model fine-tuned (more precisely, trained further from the original checkpoint) on Japanese speech using the [JVS corpus](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus), the [Tsukuyomi-Chan corpus](https://tyc.rei-yumesaki.net/material/corpus/), [Amitaro's ITA corpus V2.1](https://amitaro.net/), and my own recordings of the [ITA corpus](https://github.com/mmorise/ita-corpus).
+Original repositories (many thanks!):
+- [S3PRL](https://github.com/s3prl/s3prl/tree/main/s3prl/pretrain)
+  - Used for training, with minor modifications to train on my own datasets.
+- [distilhubert (hf)](https://huggingface.co/ntu-spml/distilhubert)
+Note: As with the original, this model does not have a tokenizer because it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled text data. See [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for a more detailed explanation of how to fine-tune the model.
+# Usage
+See [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for more information on how to fine-tune the model. Note that the class `Wav2Vec2ForCTC` has to be replaced by `HubertForCTC`.
+Note: This is probably not the best checkpoint; I expect it to become more accurate with further training. I'll try to continue when I have time.
+## Credits
+- [JVS corpus](https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus)
+- [Tsukuyomi-Chan corpus](https://tyc.rei-yumesaki.net/material/corpus/)
+```
+  ■つくよみちゃんコーパス（CV.夢前黎）
+  https://tyc.rei-yumesaki.net/material/corpus/
+```
+- [Amitaro's ITA corpus](https://amitaro.net/)
+```
+あみたろの声素材工房
+```
+[https://amitaro.net/](https://amitaro.net/)
+Thanks!
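
Editor's note on the Usage section above: the README says a tokenizer has to be created and that `Wav2Vec2ForCTC` should be swapped for `HubertForCTC`. The following is a minimal sketch of those two steps, not the author's own code. It assumes a character-level vocabulary in the style of the linked blog; the helper names `build_char_vocab` and `load_ctc_model`, the `vocab.json` path, and the `[UNK]`/`[PAD]`/`|` tokens are illustrative choices, and `load_ctc_model` additionally assumes `transformers` and `torch` are installed and the Hub is reachable.

```python
import json
from pathlib import Path


def build_char_vocab(transcripts):
    """Map every character seen in the transcripts to an integer id for CTC."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Following the linked blog, use "|" as a visible word delimiter
    # instead of a plain space (assumption: "|" is not in the transcripts).
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)  # the pad token doubles as the CTC blank
    return vocab


def load_ctc_model(vocab, vocab_path="vocab.json",
                   model_id="TylorShine/distilhubert-ft-japanese-50k"):
    """Load the checkpoint with a freshly initialized CTC head.

    Hypothetical helper: requires transformers and torch, and downloads
    the checkpoint from the Hugging Face Hub.
    """
    from transformers import HubertForCTC, Wav2Vec2CTCTokenizer

    Path(vocab_path).write_text(
        json.dumps(vocab, ensure_ascii=False), encoding="utf-8"
    )
    tokenizer = Wav2Vec2CTCTokenizer(
        vocab_path,
        unk_token="[UNK]",
        pad_token="[PAD]",
        word_delimiter_token="|",
    )
    # HubertForCTC takes the place of Wav2Vec2ForCTC from the blog.
    model = HubertForCTC.from_pretrained(
        model_id,
        vocab_size=len(vocab),
        pad_token_id=tokenizer.pad_token_id,
        ctc_loss_reduction="mean",
    )
    return tokenizer, model
```

The CTC head produced this way is untrained, so the model still needs fine-tuning on labeled data before it can transcribe anything. Japanese transcripts usually contain no spaces, so the `|` delimiter will often be absent from the vocabulary; a kana- or phoneme-level vocabulary may be a better fit than raw characters, but that choice is outside what the README specifies.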