Zeuneuski Audio: Basque Dialect Classifier from Speech

A 5-class Basque dialect classifier (Western, Central, Navarrese, Navarrese-Labourdin, Souletin) built from a frozen Whisper large-v3-eu encoder followed by a trained MLP classifier.

This is the speech counterpart of the zeuneuski text classifier.

Model variants

| Variant | Macro F1 | Trained on | Description |
|---|---|---|---|
| whisper_dialect_merged | 0.5193 | Full merged Ahotsak+Mintzoak (balanced, 10K per class) | Baseline: mean_std_max pooling, 768-dim MLP |
| whisper_dialect_aug | 0.5342 | Full merged + Navarrese augmentation ×3 | Best audio-only model: embedding-level augmentation |
| whisper_dialect_fusion | 0.6175 | Ahotsak subset (the 21% with transcriptions) | Audio+text fusion (Whisper + fastText logits); limited to Ahotsak data |
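The card does not describe how the "Navarrese augmentation ×3" works beyond calling it embedding-level. As one hypothetical reading, the sketch below grows a single class to three times its size by appending copies of its pooled embeddings jittered with Gaussian noise scaled per feature. The function name, `noise_scale`, and the noise scheme are all assumptions, not the method used to train whisper_dialect_aug.

```python
import numpy as np


def augment_class(X, y, target_label, factor=3, noise_scale=0.1, seed=0):
    """Hypothetical embedding-level augmentation: grow one class to `factor` times
    its size by appending Gaussian-jittered copies of its pooled embeddings."""
    rng = np.random.default_rng(seed)
    mask = y == target_label
    X_cls, y_cls = X[mask], y[mask]
    per_dim_std = X_cls.std(axis=0, keepdims=True)  # jitter scaled per feature
    extra_X = [X_cls + noise_scale * per_dim_std * rng.standard_normal(X_cls.shape)
               for _ in range(factor - 1)]
    X_aug = np.concatenate([X, *extra_X], axis=0)
    y_aug = np.concatenate([y, *([y_cls] * (factor - 1))])
    return X_aug, y_aug


# Toy data: 10 pooled 3840-dim embeddings, 4 of them labelled Navarrese
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3840))
y = np.array(["western"] * 6 + ["navarrese"] * 4)
X_aug, y_aug = augment_class(X, y, "navarrese")
print(X_aug.shape, (y_aug == "navarrese").sum())  # (18, 3840) 12
```

Augmenting embeddings rather than raw audio avoids re-running the frozen encoder; any such augmentation should only be applied to the training split so validation and test scores stay unaffected.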

Per-class F1 (best model: whisper_dialect_aug)

| Dialect | F1 |
|---|---|
| Western | 0.70 |
| Central | 0.34 |
| Navarrese | 0.38 |
| Navarrese-Labourdin | 0.83 |
| Souletin | 0.42 |
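Macro F1 is the unweighted mean of the per-class F1 scores, so every dialect counts equally regardless of how many test segments it has. Averaging the rounded values from the per-class table reproduces the reported 0.5342 up to rounding:

```python
# Per-class F1 for whisper_dialect_aug, copied from the per-class table
per_class_f1 = {
    "Western": 0.70,
    "Central": 0.34,
    "Navarrese": 0.38,
    "Navarrese-Labourdin": 0.83,
    "Souletin": 0.42,
}

# Macro F1: plain average over classes, no weighting by class support
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.3f}")  # 0.534
```

This also shows why the weakest classes (Central, Navarrese) pull the headline number down so strongly: each one contributes a fifth of the score.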

How it works

  1. Audio (16 kHz mono WAV) → frozen Whisper large-v3-eu encoder
  2. Encoder hidden states → mean_std_max pooling (mean, standard deviation and max over time, concatenated) → 3840-dim vector
  3. 3840-dim vector → MLP with hidden layers of 768 and 384 units (3840→768→384→5) → dialect probabilities
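Steps 2 and 3 can be sketched in NumPy as below, with random weights standing in for the trained MLP. This is an illustration under assumptions, not the code in `src.models.speech.whisper_did`: it assumes 1280-dim encoder hidden states (consistent with 3840 = 3 × 1280), population standard deviation, and ReLU activations, and it omits the `scaler` loaded in the usage example (presumably feature standardization before the MLP) as well as any dropout or normalization layers the trained model may have.

```python
import numpy as np

HIDDEN_DIM = 1280  # assumed encoder width; 3 * 1280 = 3840 pooled features
NUM_CLASSES = 5


def mean_std_max_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Pool (frames, hidden) encoder states into one (3 * hidden,) vector."""
    mean = hidden_states.mean(axis=0)
    std = hidden_states.std(axis=0)  # population std (ddof=0); an assumption
    mx = hidden_states.max(axis=0)
    return np.concatenate([mean, std, mx])


def softmax(logits: np.ndarray) -> np.ndarray:
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    return exp / exp.sum()


def init_mlp(rng, sizes=(3 * HIDDEN_DIM, 768, 384, NUM_CLASSES)):
    """Random (weight, bias) pairs for a 3840 -> 768 -> 384 -> 5 MLP (illustration only)."""
    return [(rng.normal(0.0, 0.02, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """ReLU on hidden layers (assumed activation), softmax on the output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return softmax(x)


rng = np.random.default_rng(0)
hidden = rng.normal(size=(1500, HIDDEN_DIM))  # stand-in for encoder output on 30 s of audio
pooled = mean_std_max_pool(hidden)
probs = mlp_forward(init_mlp(rng), pooled)
print(pooled.shape, probs.shape)  # (3840,) (5,)
```

Pooling over time turns a variable-length sequence of frames into a fixed-size vector, which is what lets a small MLP sit on top of a frozen encoder.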

Requirements

  • GPU with 6+ GB VRAM (also runs on CPU, ~8-10× slower)
  • transformers, torch, numpy, soundfile
  • Whisper model auto-downloaded from xezpeleta/whisper-large-v3-eu

Usage

from src.models.speech.whisper_did import load_speech_model, predict_speech

# Load model (downloads Whisper encoder automatically)
encoder, mlp, label_encoder, scaler, config = load_speech_model(
    model_dir="models/speech/whisper_dialect_aug"
)

# Predict
result = predict_speech("audio.wav", encoder, mlp, label_encoder, scaler, config)
print(result["dialect"], result["confidence"])
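The pipeline expects 16 kHz mono input, and the card does not say whether `predict_speech` converts other formats itself. The helper below is a hypothetical preprocessing step: it downmixes a `(frames, channels)` array, as returned by `soundfile.read`, and resamples it with linear interpolation. Linear interpolation has no anti-aliasing filter, so a proper resampler such as `scipy.signal.resample_poly` is preferable for real use.

```python
import numpy as np

TARGET_SR = 16_000  # Whisper encoders expect 16 kHz input


def to_16k_mono(data: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and linearly resample to 16 kHz (crude: no anti-aliasing)."""
    audio = np.asarray(data, dtype=np.float32)
    if audio.ndim == 2:  # (frames, channels), the layout soundfile.read returns
        audio = audio.mean(axis=1)
    if sr == TARGET_SR:
        return audio
    n_out = int(round(len(audio) * TARGET_SR / sr))
    t_in = np.arange(len(audio)) / sr      # input sample times in seconds
    t_out = np.arange(n_out) / TARGET_SR   # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)


# Typical use (requires soundfile):
#   import soundfile as sf
#   data, sr = sf.read("audio.wav")
#   sf.write("audio_16k.wav", to_16k_mono(data, sr), TARGET_SR)

stereo_48k = np.full((48_000, 2), 0.5)  # 1 s of constant stereo audio at 48 kHz
out = to_16k_mono(stereo_48k, 48_000)
print(out.shape)  # (16000,)
```

Writing the converted audio back to a 16 kHz WAV keeps the documented `predict_speech("audio.wav", ...)` call unchanged.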

Training data

Merged Ahotsak.eus (36K segments, 78 h) + Mintzoak.eus (160K segments, 181 h), giving 258.9 h of audio across the 5 classes. The data uses town-disjoint 80/10/10 train/val/test splits (no town appears in more than one split) and balanced subsampling to 10K segments per class.
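A town-disjoint split assigns whole towns, rather than individual segments, to train, val or test, so the classifier cannot score well by recognizing a particular town's speakers or recording conditions. The sketch below is a hypothetical version of that step plus the per-class cap; the segment dict keys `town` and `dialect` are assumptions, and because it splits 80/10/10 by number of towns, segment-level proportions can differ from 80/10/10.

```python
import random
from collections import defaultdict


def town_disjoint_split(segments, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign whole towns to train/val/test so no town appears in two splits."""
    towns = sorted({s["town"] for s in segments})
    random.Random(seed).shuffle(towns)
    n_train = int(len(towns) * ratios[0])
    n_val = int(len(towns) * ratios[1])
    split_of = {}
    for i, town in enumerate(towns):
        split_of[town] = "train" if i < n_train else "val" if i < n_train + n_val else "test"
    splits = defaultdict(list)
    for seg in segments:
        splits[split_of[seg["town"]]].append(seg)
    return splits


def cap_per_class(segments, cap=10_000, seed=0):
    """Randomly subsample each dialect to at most `cap` segments."""
    by_class = defaultdict(list)
    for seg in segments:
        by_class[seg["dialect"]].append(seg)
    rng = random.Random(seed)
    balanced = []
    for items in by_class.values():
        balanced.extend(rng.sample(items, min(cap, len(items))))
    return balanced


# Toy corpus: 20 towns with 5 segments each, two dialects
segments = [{"town": f"town{t}", "dialect": ["western", "central"][t % 2]}
            for t in range(20) for _ in range(5)]
splits = town_disjoint_split(segments)
train = cap_per_class(splits["train"], cap=30)
print({name: len(segs) for name, segs in splits.items()}, len(train))
```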
