---
license: other
language:
- ja
library_name: fairseq
---
# Pre-trained checkpoints for speech representation in Japanese
The models in this repository were pre-trained via self-supervised learning (SSL) for speech representation.
The SSL models were built on the [fairseq](https://github.com/facebookresearch/fairseq) toolkit.
- `wav2vec2_base_csj.pt`
  - fairseq checkpoint of a wav2vec 2.0 model with the *Base* architecture, pre-trained on 16 kHz speech data from the Corpus of Spontaneous Japanese (CSJ)
- `wav2vec2_base_csj_hf`
  - version of `wav2vec2_base_csj.pt` converted for the Hugging Face Transformers interface using [this tool](https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py)
- `hubert_base_csj.pt`
  - fairseq checkpoint of a HuBERT model with the *Base* architecture, pre-trained on 16 kHz speech data from the Corpus of Spontaneous Japanese (CSJ)
- `hubert_base_csj_hf`
  - version of `hubert_base_csj.pt` converted for the Hugging Face Transformers interface using [this tool](https://github.com/huggingface/transformers/blob/main/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py)
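
As a minimal sketch of how the Hugging Face-converted checkpoints can be used to extract frame-level speech representations — assuming the `wav2vec2_base_csj_hf` directory has been downloaded locally, and using the standard wav2vec 2.0 *Base* feature-extractor settings (not values confirmed by this repository):

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load the converted checkpoint from a local directory (or a hub repo id).
model = Wav2Vec2Model.from_pretrained("wav2vec2_base_csj_hf")
model.eval()

# Standard wav2vec 2.0 Base preprocessing: mono float waveform at 16 kHz.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=False,
)

# One second of silence as a stand-in for a real 16 kHz waveform.
waveform = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(inputs.input_values)

# last_hidden_state has shape (batch, frames, hidden_size=768 for Base).
print(outputs.last_hidden_state.shape)
```

The same pattern applies to `hubert_base_csj_hf` with `HubertModel` in place of `Wav2Vec2Model`.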
If you find this helpful, please consider citing the following paper.
```bibtex
@INPROCEEDINGS{ashihara_icassp23,
  author={Takanori Ashihara and Takafumi Moriya and Kohei Matsuura and Tomohiro Tanaka},
  title={Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}
```