|
--- |
|
license: apache-2.0 |
|
language: |
|
- yue |
|
library_name: transformers |
|
--- |
|
|
|
# Cantonese Wav2Vec2-Conformer-Base with Relative Position Embeddings |
|
|
|
wav2vec 2.0 Conformer with relative position embeddings, pretrained on 2.8K hours of spontaneous Cantonese speech sampled at 16 kHz.
|
|
|
Note: This model has not been fine-tuned on labeled (transcribed) speech data, so it cannot perform speech recognition out of the box; it outputs frame-level speech representations only.
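
A minimal feature-extraction sketch with 🤗 Transformers is given below. The repository id placeholder and the silent dummy waveform are illustrative, and it assumes this repo ships a `preprocessor_config.json` loadable by `AutoFeatureExtractor`.

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ConformerModel

# Placeholder: substitute the actual Hub id of this repository.
model_id = "<this-repo-id>"

feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ConformerModel.from_pretrained(model_id)
model.eval()

# One second of silent 16 kHz mono audio; replace with real Cantonese speech.
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level contextual representations: (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```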
|
|
|
|
|
## Alternative Version |
|
|
|
An alternative version of the model, pre-trained on the same dataset but with `layer_norm_first` set to `false`, is available [here](https://drive.google.com/file/d/1rbP-6pZfR5ieqAwd5_X2KzipLuKpXSsQ/view?usp=sharing) as a fairseq checkpoint and may give better downstream results.
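
If you use the fairseq checkpoint, a minimal loading sketch is shown below; the checkpoint path is illustrative, and it assumes a fairseq installation compatible with the checkpoint.

```python
import fairseq

# Illustrative path to the downloaded checkpoint file.
ckpt_path = "/path/to/cantonese_w2v2_conformer.pt"

# Load the pretrained wav2vec 2.0 Conformer model along with its config and task.
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
model.eval()
```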
|
|
|
|
|
## Citation |
|
|
|
Please cite the following paper if you use the model. |
|
|
|
```bibtex
@inproceedings{huang23h_interspeech,
  author={Ranzo Huang and Brian Mak},
  title={{wav2vec 2.0 ASR for Cantonese-Speaking Older Adults in a Clinical Setting}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={4958--4962},
  doi={10.21437/Interspeech.2023-2470}
}
```