ntu-spml
/

distilhubert

Feature Extraction

Inference Endpoints

Model card Files Files and versions Community

patrickvonplaten commited on Nov 5, 2021

Commit

9c4eece

·

1 Parent(s): 7824fed

Update README.md

Files changed (1) hide show

README.md +3 -1

README.md CHANGED Viewed

@@ -11,7 +11,9 @@ license: apache-2.0
 [DistilHuBERT by NTU Speech Processing & Machine Learning Lab](https://github.com/s3prl/s3prl/tree/master/s3prl/upstream/distiller)
-The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz. Note that this model should be fine-tuned on a downstream task, like Automatic Speech Recognition, Speaker Identification, Intent Classification, Emotion Recognition, etc...
 Paper: [DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT](https://arxiv.org/abs/2110.01900)

 [DistilHuBERT by NTU Speech Processing & Machine Learning Lab](https://github.com/s3prl/s3prl/tree/master/s3prl/upstream/distiller)
+The base model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
+**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data. Check out [this blog](https://huggingface.co/blog/fine-tune-wav2vec2-english) for more in-detail explanation of how to fine-tune the model.
 Paper: [DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT](https://arxiv.org/abs/2110.01900)