Bengali Speaker Segmentation Model (Robust)
A version of pyannote/segmentation-3.0 fine-tuned for Bengali speaker diarization.
Training Data
- DISPLACE 2024: 35 files, ~20 hours
- DISPLACE 2026: 78 files, ~15 hours
- Total: 113 files, ~35 hours
Training Configuration
- Base model: pyannote/segmentation-3.0
- Epochs: 26 (stopped early; maximum of 30)
- Learning-rate schedule: OneCycleLR, max_lr=5e-5
- Label smoothing: 0.1
- Batch size: 32
- Gradient clipping: max_norm=1.0
- Samples per file: 50 (training), 20 (validation)
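OneCycleLR is the name of PyTorch's `torch.optim.lr_scheduler.OneCycleLR`, and gradient clipping at max_norm=1.0 corresponds to `torch.nn.utils.clip_grad_norm_(..., max_norm=1.0)`. The dependency-free sketch below illustrates what the schedule and label-smoothing settings compute. It assumes PyTorch's default OneCycleLR parameters (pct_start=0.3, div_factor=25, final_div_factor=1e4, cosine annealing), which this card does not state. It also assumes label smoothing follows the `torch.nn.CrossEntropyLoss(label_smoothing=0.1)` convention applied over the 7 powerset classes. The function names `one_cycle_lr` and `smoothed_target` are hypothetical.

```python
import math


def one_cycle_lr(step, total_steps, max_lr=5e-5, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase, cosine-annealed one-cycle schedule.

    Assumption: pct_start, div_factor and final_div_factor use PyTorch's
    OneCycleLR defaults; the values actually used for training are not stated.
    """
    initial_lr = max_lr / div_factor          # LR at step 0
    min_lr = initial_lr / final_div_factor    # LR at the final step
    warmup_end = float(pct_start * total_steps) - 1  # last step of the warm-up phase
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation: returns `start` at pct=0 and `end` at pct=1
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= warmup_end:
        # Phase 1: anneal upward from initial_lr to max_lr
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    # Phase 2: anneal downward from max_lr to min_lr
    return cos_anneal(max_lr, min_lr, (step - warmup_end) / (last_step - warmup_end))


def smoothed_target(true_class, num_classes=7, smoothing=0.1):
    """Label-smoothed target distribution: a mix of the one-hot target and a
    uniform distribution over all classes (CrossEntropyLoss convention, assumed)."""
    uniform = smoothing / num_classes
    return [(1.0 - smoothing) + uniform if k == true_class else uniform
            for k in range(num_classes)]


# Usage: a hypothetical run of 1000 optimizer steps
for step in (0, 299, 600, 999):
    print(step, one_cycle_lr(step, total_steps=1000))
print(smoothed_target(true_class=2))
```

With these assumed defaults the learning rate rises from max_lr/25 to max_lr over the first 30% of steps and then decays to a much smaller value, while label smoothing moves 0.1 of the probability mass from the true class to a uniform distribution over all 7 classes.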
Results
- Best validation loss: 1.2473
- Best validation accuracy: 55.46%
- Final training accuracy: 60.83%
Usage
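```python
import torch
from pyannote.audio import Model

# Load the fine-tuned model from the Hugging Face Hub
model = Model.from_pretrained("smam/pyannote-segmentation-bengali-displace")

# Or build the base architecture and load the fine-tuned weights manually.
# pytorch_model.bin must first be downloaded from this repository;
# pyannote/segmentation-3.0 is gated, so an access token may be required.
model = Model.from_pretrained("pyannote/segmentation-3.0")
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
```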
Architecture
- SincNet frontend
- Bidirectional LSTM
- Powerset classification: 7 classes covering up to 3 speakers, with at most 2 speaking simultaneously
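The count of 7 follows from enumerating every speaker subset of size 0 to 2 out of 3 speakers: 1 (silence) + 3 (one speaker) + 3 (two overlapping speakers) = 7. The sketch below enumerates those classes and decodes a class index into per-speaker activity. The ordering (empty set, then singletons, then pairs) is an assumption; check `pyannote.audio.utils.powerset.Powerset` for the exact mapping the model uses. The helper names `powerset_classes` and `to_multilabel` are hypothetical.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_simultaneous=2):
    """Enumerate powerset classes: the empty set, then every speaker subset
    of size 1..max_simultaneous (assumed ordering)."""
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def to_multilabel(class_index, num_speakers=3, max_simultaneous=2):
    """Convert one powerset class index into a per-speaker 0/1 activity vector."""
    active = powerset_classes(num_speakers, max_simultaneous)[class_index]
    return [1 if spk in active else 0 for spk in range(num_speakers)]


classes = powerset_classes()
print(len(classes))  # 7
for i, subset in enumerate(classes):
    print(i, subset, to_multilabel(i))
```

Framing overlap as a single 7-way classification lets each frame be trained with an ordinary cross-entropy loss instead of independent per-speaker binary decisions.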