|
---
license: apache-2.0
language:
- en
library_name: fairseq
pipeline_tag: automatic-speech-recognition
inference: false
---
|
|
|
|
|
|
|
|
# ARMHuBERT Model Card |
|
|
|
This repository contains the model checkpoints from our INTERSPEECH 2023 paper [**Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation**](https://arxiv.org/abs/2305.11685).
|
|
|
|
|
## Model details |
|
|
|
**Model type:** |
|
ARMHuBERT is an open-source speech SSL model distilled from HuBERT-Base using attention map reusing and masking distillation.

We also provide checkpoints for MaskHuBERT (distilled with masking distillation only, without attention map reusing) and ARMwavLM (the same strategy with a WavLM-Base teacher).
|
|
|
- Attention Map Reusing: reuse a previous layer's attention map so that the reusing Transformer layer needs no key or query projection parameters (sketched below).

- Masking Distillation: distill the teacher's representations while treating the student's masked and unmasked frames separately.
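
The snippet below is a minimal, self-contained sketch of the attention-map-reusing idea, not the released fairseq implementation: the first layer computes a single-head attention map, and the second layer applies that map to its own value projection, so it carries no query/key parameters at all. All class and variable names here are illustrative.

```python
import torch
import torch.nn as nn

class ReuseAttention(nn.Module):
    """Single-head self-attention that can reuse an external attention map."""

    def __init__(self, dim: int, compute_map: bool = True):
        super().__init__()
        self.compute_map = compute_map
        if compute_map:  # only the "source" layer keeps query/key parameters
            self.q_proj = nn.Linear(dim, dim)
            self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, attn_map=None):
        if self.compute_map:
            # standard scaled dot-product attention map: (batch, T, T)
            scores = (self.q_proj(x) @ self.k_proj(x).transpose(-2, -1)) * self.scale
            attn_map = scores.softmax(dim=-1)
        # reusing layer: apply the given map directly; no Q/K needed
        out = self.out_proj(attn_map @ self.v_proj(x))
        return out, attn_map

x = torch.randn(1, 50, 256)                      # (batch, frames, dim)
layer1 = ReuseAttention(256, compute_map=True)
layer2 = ReuseAttention(256, compute_map=False)  # reuses layer1's map
h, attn_map = layer1(x)
h, _ = layer2(h, attn_map=attn_map)
```

Dropping the query/key projections in the reusing layer is where the parameter savings come from; the value and output projections remain per layer.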
|
|
|
**License:** |
|
Apache 2.0 License |
|
|
|
**Where to send questions or comments about the model:** |
|
https://github.com/sungnyun/ARMHuBERT/issues |
|
|
|
|
|
## Training dataset |
|
Pretraining data: [LibriSpeech](https://www.openslr.org/12) |
|
- ``[ModelName]-100h.ckpt``: trained on train-clean-100 (100 hours)

- ``[ModelName]-960h.ckpt``: trained on train-clean-100 + train-clean-360 + train-other-500 (960 hours)
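
As a quick sanity check after downloading, a checkpoint can be opened with plain PyTorch. The file name below is an example, and the assumption that the file deserializes to a dictionary of hyperparameters and weights is illustrative, not a documented guarantee; the supported loading path is described in the GitHub repository.

```python
import torch

# Inspect a downloaded checkpoint (file name is an example);
# map_location="cpu" avoids needing a GPU just to peek at the contents.
ckpt = torch.load("ARMHuBERT-960h.ckpt", map_location="cpu")

# Most training frameworks save a dict containing a state_dict and config --
# an assumption here, so we check before printing.
if isinstance(ckpt, dict):
    print(sorted(ckpt.keys()))
```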
|
|
|
|
|
|
|
|
More details are available in our GitHub repository: https://github.com/sungnyun/ARMHuBERT.