---
license: apache-2.0
datasets:
- alvanlii/cantonese-youtube
base_model:
- TencentGameMate/chinese-hubert-base
library_name: fairseq
---

# cantonese-hubert-base-l9-k200

This is a fine-tuned HuBERT model based on [TencentGameMate/chinese-hubert-base](https://huggingface.co/TencentGameMate/chinese-hubert-base) for generating discrete speech units. The K-means model is trained on [9k+ hours of Cantonese speech data](https://huggingface.co/datasets/alvanlii/cantonese-youtube), with 200 clusters, using representations from the 9th layer of the model.
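
To illustrate what "discrete units" means here, the sketch below shows only the quantization step: each frame-level feature vector is mapped to the index of its nearest K-means centroid, giving one unit ID in the range 0 to 199 per frame. It is a minimal NumPy illustration, not the code used to build this model. The function name `assign_units`, the random stand-in arrays, and the feature dimension of 768 are assumptions made for the example; in practice the features would be the 9th-layer outputs of this model and the centroids would come from the trained K-means model.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest centroid.

    features:  (num_frames, dim) array, e.g. 9th-layer HuBERT outputs.
    centroids: (num_clusters, dim) array, e.g. 200 K-means cluster centers.
    Returns a (num_frames,) integer array of unit IDs.
    """
    # Squared Euclidean distance, expanded as ||x||^2 - 2 x.c + ||c||^2
    # so that no (num_frames, num_clusters, dim) array is materialized.
    dists = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


# Stand-in data (hypothetical): 200 random centroids and 4 frames that
# lie close to centroids 3, 3, 17 and 199 respectively.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(200, 768))
features = centroids[[3, 3, 17, 199]] + 0.01 * rng.normal(size=(4, 768))

units = assign_units(features, centroids)
print(units)
```

With real audio, `features` would have one row per frame produced by the model, so the result is a sequence of unit IDs with one entry per frame.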