metadata
license: mit
language:
  - en
datasets:
  - librispeech_asr
metrics:
  - ued
  - abx
pipeline_tag: automatic-speech-recognition
tags:
  - speech
  - discrete-units
  - quantization
  - hubert
  - dinosr
  - spidr
base_model:
  - facebook/hubert-base-ls960

Robust Speech Quantizer (HuBERT / DinoSR / SpidR)

GitHub Repository

MLP-based robust speech quantizers trained with CTC loss and iterative pseudo-labeling on augmented audio, following Algayres et al. (Interspeech 2023). The quantizers are evaluated at vocabulary sizes K ∈ {100, 200, 500}.
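To make the robustness evaluation more concrete, here is a minimal sketch of a unit edit distance (UED) computation: the Levenshtein distance between the unit sequence of a clean utterance and that of its augmented version. Normalizing by the clean sequence length is an assumption made for illustration; this card does not specify the exact normalization, and the unit IDs below are made up.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unit_edit_distance(clean_units, augmented_units):
    """Edit distance divided by the clean length (assumed normalization)."""
    if not clean_units:
        raise ValueError("clean_units must be non-empty")
    return levenshtein(clean_units, augmented_units) / len(clean_units)


# Hypothetical frame-level unit IDs for one utterance and its augmented copy.
clean = [12, 12, 7, 7, 7, 301, 45]
augmented = [12, 12, 7, 9, 7, 301, 301]
ued = unit_edit_distance(clean, augmented)
print(f"UED = {ued:.3f}")
```

A lower value means the quantizer assigns more similar units to the clean and augmented audio.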

Encoders

| Encoder | Checkpoint | Layer | Pre-training data |
|---|---|---|---|
| HuBERT Base | hubert-base-ls960 | 6 | LibriSpeech 960h |
| DinoSR | original + SpidR-reproduced | 5 | LibriSpeech 960h |
| SpidR | spidr-base | 6 | LibriSpeech 960h |

Quick Start

```python
from huggingface_hub import hf_hub_download

# Download the quantizer checkpoint for the 500-unit vocabulary.
model_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/round_1/E1_best.pt",
)
# Download the matching configuration file.
config_path = hf_hub_download(
    repo_id="iliasslasri/robust_speech_quantizer",
    filename="500_vocab_size/config.yaml",
)
```
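The same two downloads could be wrapped in a small helper that takes the vocabulary size as a parameter. This is only a sketch: `quantizer_files` and `download_quantizer` are hypothetical names, and the assumption that the 100- and 200-unit models follow the same directory layout as the 500-unit example is not confirmed by this card.

```python
REPO_ID = "iliasslasri/robust_speech_quantizer"


def quantizer_files(vocab_size, round_idx=1):
    """Return (checkpoint, config) file names inside the repo.

    ASSUMPTION: every vocabulary size mirrors the layout shown for K=500,
    including the "E1_best.pt" checkpoint name.
    """
    if vocab_size not in (100, 200, 500):
        raise ValueError("vocab_size must be 100, 200, or 500")
    base = f"{vocab_size}_vocab_size"
    return f"{base}/round_{round_idx}/E1_best.pt", f"{base}/config.yaml"


def download_quantizer(vocab_size, round_idx=1):
    """Download both files and return their local paths."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    checkpoint, config = quantizer_files(vocab_size, round_idx)
    return (
        hf_hub_download(repo_id=REPO_ID, filename=checkpoint),
        hf_hub_download(repo_id=REPO_ID, filename=config),
    )


print(quantizer_files(500))
```

Calling `download_quantizer(500)` would then fetch the same two files as the snippet above.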

Augmentations

Augmentation types (one audio sample each):
Clean
Time Stretch
Pitch Shift
Reverberation
Noise
Echo
Random Noise
Pink Noise
Lowpass Filter
Highpass Filter
Bandpass Filter
Smooth
Boost Audio
Duck Audio
Up-Down Resample
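As an illustration of what one of these augmentations does to a waveform, the sketch below implements a simple echo: a delayed, attenuated copy of the signal is mixed back into it. The function name, the delay and decay values, and the choice to keep the output the same length as the input are all assumptions for illustration, not the settings used to train these quantizers.

```python
def add_echo(samples, sample_rate, delay_s=0.25, decay=0.5):
    """Mix a delayed, attenuated copy of the signal into itself.

    The output has the same length as the input, so the echo tail that
    would extend past the end is dropped.
    """
    delay = int(round(delay_s * sample_rate))
    out = list(samples)
    for n in range(delay, len(samples)):
        out[n] += decay * samples[n - delay]
    return out


# A unit impulse makes the echo easy to see: it reappears 4 samples later
# at half the amplitude.
impulse = [1.0] + [0.0] * 7
echoed = add_echo(impulse, sample_rate=8, delay_s=0.5, decay=0.5)
print(echoed)
```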

Links