Bengali Speaker Segmentation Model (Robust)

Fine-tuned pyannote/segmentation-3.0 model for Bengali speaker diarization.

Training Data

DISPLACE 2024 (35 files, ~20h)
DISPLACE 2026 (78 files, ~15h)
Total: 113 files, ~35 hours

Training Configuration

Base model: pyannote/segmentation-3.0
Epochs: 26 (stopped early; 30 scheduled)
Learning rate schedule: OneCycleLR, max_lr=5e-5
Label smoothing: 0.1
Batch size: 32
Gradient clipping: max_norm=1.0
Samples per file: 50 (training), 20 (validation)
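
As a rough sketch of how the settings above could fit together in plain PyTorch, the snippet below wires OneCycleLR, label smoothing, and gradient clipping around a stand-in network. The optimizer (Adam), the step count, the feature size, and the toy linear layer are assumptions made for illustration; the actual training script is not part of this card.

```python
import torch
from torch import nn

NUM_CLASSES = 7     # powerset classes, as listed under Architecture
BATCH_SIZE = 32     # from the configuration above
TOTAL_STEPS = 100   # hypothetical; the real number of steps is not stated

# Hypothetical stand-in for the segmentation network (not the real model).
model = nn.Linear(16, NUM_CLASSES)

# The optimizer choice is an assumption; OneCycleLR takes over the learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-5, total_steps=TOTAL_STEPS
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)


def train_step(features, targets):
    """Run one optimization step and return the loss as a float."""
    optimizer.zero_grad()
    loss = criterion(model(features), targets)
    loss.backward()
    # Clip the global gradient norm before the parameter update.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch
    return loss.item()


# One step on random data, only to show the order of the calls.
features = torch.randn(BATCH_SIZE, 16)
targets = torch.randint(0, NUM_CLASSES, (BATCH_SIZE,))
loss_value = train_step(features, targets)
```

Stepping the scheduler after every batch rather than every epoch is what OneCycleLR expects, so total_steps would normally be epochs times batches per epoch.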

Results

Best validation loss: 1.2473
Best validation accuracy: 55.46%
Final train accuracy: 60.83%

Usage

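import torch
from pyannote.audio import Model

# Load the model from the Hugging Face Hub. The repository is gated, so accept
# its conditions and authenticate first (e.g. with `huggingface-cli login`).
model = Model.from_pretrained("smam/pyannote-segmentation-bengali-displace")

# Or load the weights manually into an existing model instance
# with the same architecture
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)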

Architecture

SincNet frontend
Bidirectional LSTM
Powerset classification (7 classes for 3 speakers, max 2 simultaneous)
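
For illustration, the 7 powerset classes can be enumerated as every subset of the 3 speakers with at most 2 active at once: 1 silence class, 3 single-speaker classes, and 3 two-speaker overlap classes. The function name and the class ordering below are assumptions and may not match the index order used inside pyannote.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_simultaneous):
    """List every set of active speakers with at most max_simultaneous members."""
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


classes = powerset_classes(num_speakers=3, max_simultaneous=2)
for index, active_speakers in enumerate(classes):
    # An empty tuple means no speaker is active (silence).
    print(index, active_speakers)

print(len(classes))  # 7
```

Each frame is assigned exactly one of these classes, so overlapping speech is treated as its own class rather than as several independent per-speaker decisions.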
