|
--- |
|
license: apache-2.0 |
|
language: |
|
- yue |
|
library_name: transformers |
|
--- |
|
|
|
# Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings |
|
|
|
wav2vec 2.0 Conformer with relative position embeddings, pretrained on 2.8K hours of spontaneous Cantonese speech sampled at 16 kHz.
|
|
|
Note: This model has not been fine-tuned on labeled (transcribed) speech data, so it cannot perform speech recognition out of the box; it outputs frame-level speech representations only.
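
A minimal feature-extraction sketch with 🤗 Transformers is given below. The repository id placeholder and the silent dummy waveform are illustrative, and it assumes this repo ships a `preprocessor_config.json` loadable by `AutoFeatureExtractor`.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ConformerModel

# Placeholder: substitute the actual Hub id of this repository.
model_id = "<this-repo-id>"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ConformerModel.from_pretrained(model_id)
model.eval()

# One second of silent 16 kHz mono audio; replace with real Cantonese speech.
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level contextual representations: (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```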
|
|
|
|
|
## Alternative Version |
|
|
|
An alternative version of the model, pre-trained on the same dataset but with `layer_norm_first` set to `false`, is available [here](https://drive.google.com/file/d/1rbP-6pZfR5ieqAwd5_X2KzipLuKpXSsQ/view?usp=sharing) as a fairseq checkpoint and may give better downstream results.
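
If you use the fairseq checkpoint, a minimal loading sketch is shown below; the checkpoint path is illustrative, and it assumes a fairseq installation compatible with the checkpoint.

```python
import fairseq

# Illustrative path to the downloaded checkpoint file.
ckpt_path = "/path/to/cantonese_w2v2_conformer.pt"

# Load the pretrained wav2vec 2.0 Conformer model along with its config and task.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
model.eval()
```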
|
|
|
|
|
## Citation |
|
|
|
Please cite the following paper if you use the model. |
|
|
|
```bibtex
@inproceedings{huang23h_interspeech,
  author={Ranzo Huang and Brian Mak},
  title={{wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={4958--4962},
  doi={10.21437/Interspeech.2023-2470}
}
```