---
license: mit
language:
- en
datasets:
- librispeech_asr
metrics:
- ued
- abx
pipeline_tag: automatic-speech-recognition
tags:
- speech
- discrete-units
- quantization
- hubert
- dinosr
- spidr
base_model:
- facebook/hubert-base-ls960
---
# Robust Speech Quantizer (HuBERT / DinoSR / SpidR)

MLP-based robust speech quantizers trained with a CTC loss and iterative pseudo-labeling on augmented audio, following Algayres et al. (Interspeech 2023). Models are evaluated at vocabulary sizes K ∈ {100, 200, 500}.
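As a rough sketch of what an MLP quantizer head computes (the two-layer architecture, hidden size, and weight shapes below are illustrative assumptions, not the released configuration): it maps frame-level encoder features to K-way logits, and the discrete unit for each frame is the argmax.

```python
import numpy as np

def mlp_quantizer(features, w1, b1, w2, b2):
    """Map frame features (T, D) to discrete unit IDs (T,) over K units.

    Hypothetical two-layer MLP head; the released models may differ.
    """
    h = np.maximum(features @ w1 + b1, 0.0)  # hidden layer with ReLU
    logits = h @ w2 + b2                     # (T, K) unit logits
    return logits.argmax(axis=-1)            # one unit ID per frame

# Toy example: 10 frames of 768-dim HuBERT-style features, K = 500 units
rng = np.random.default_rng(0)
T, D, H, K = 10, 768, 256, 500
feats = rng.standard_normal((T, D))
units = mlp_quantizer(
    feats,
    rng.standard_normal((D, H)) * 0.02, np.zeros(H),
    rng.standard_normal((H, K)) * 0.02, np.zeros(K),
)
print(units.shape)  # (10,)
```

Each frame is assigned one of K discrete units, so an utterance becomes a sequence of unit IDs that downstream metrics such as UED and ABX can score.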
## Encoders

| Encoder | Checkpoint | Layer | Pre-training data |
|---|---|---|---|
| HuBERT Base | hubert-base-ls960 | 6 | LibriSpeech 960h |
| DinoSR | original + SpidR-reproduced | 5 | LibriSpeech 960h |
| SpidR | spidr-base | 6 | LibriSpeech 960h |
## Quick Start

```python
from huggingface_hub import hf_hub_download

# Download a quantizer checkpoint (K = 500, round 1)
model_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/round_1/E1_best.pt",
)

# Download the matching configuration
config_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/config.yaml",
)
```
## Augmentations

- Clean
- Time Stretch
- Pitch Shift
- Reverberation
- Noise
- Echo
- Random Noise
- Pink Noise
- Lowpass Filter
- Highpass Filter
- Bandpass Filter
- Smooth
- Boost Audio
- Duck Audio
- Up-Down Resample
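A minimal sketch of one of the noise-style augmentations above: additive white noise mixed in at a target SNR. The function name, SNR parameterization, and sample rate are illustrative assumptions, not the training recipe.

```python
import numpy as np

def add_noise(wav, snr_db, rng=None):
    """Additive white-noise augmentation at a target SNR (in dB)."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.standard_normal(len(wav))
    sig_pow = np.mean(wav ** 2)
    noise_pow = np.mean(noise ** 2)
    # Scale the noise so that signal power / noise power = 10^(snr_db / 10)
    scale = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    return wav + scale * noise

# 1 second of a 440 Hz tone at 16 kHz, corrupted at 10 dB SNR
t = np.linspace(0, 1, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean, snr_db=10, rng=np.random.default_rng(0))
```

Time stretch, pitch shift, reverberation, and the filter-based augmentations would each need their own DSP, but they follow the same pattern: corrupt the waveform before feature extraction so the quantizer learns units that are stable under the distortion.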
## Links

- Paper: Algayres et al., Interspeech 2023
- Code: GitHub