Bengali Speaker Segmentation Model (Robust)

Fine-tuned pyannote/segmentation-3.0 model for Bengali speaker diarization.

Training Data

  • DISPLACE 2024 (35 files, ~20h)
  • DISPLACE 2026 (78 files, ~15h)
  • Total: 113 files, ~35 hours

Training Configuration

  • Base model: pyannote/segmentation-3.0
  • Epochs: 26 (stopped early; 30 scheduled)
  • Learning-rate schedule: OneCycleLR, max_lr=5e-5
  • Label smoothing: 0.1
  • Batch size: 32
  • Gradient clipping: max_norm=1.0
  • Samples per file: 50 (training), 20 (validation)
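As a rough illustration of how the settings above map onto standard PyTorch calls, here is a minimal sketch. It is not the training script used for this model: the tiny linear layer, the AdamW optimizer, the feature size of 16, and STEPS_PER_EPOCH are all assumptions made only so the example runs on its own.

```python
import torch
from torch import nn

# Hypothetical stand-in for the segmentation network (7 output classes).
model = nn.Linear(16, 7)

EPOCHS = 30            # scheduled epochs; the run above stopped early at 26
STEPS_PER_EPOCH = 10   # assumed value; depends on the dataset and batch size

# Optimizer choice is an assumption; the model card does not name one.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-5, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH
)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)


def train_step(features, targets):
    """Run one optimization step with gradient clipping at max_norm=1.0."""
    optimizer.zero_grad()
    loss = criterion(model(features), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # OneCycleLR is stepped once per batch
    return loss.item()


torch.manual_seed(0)
batch = torch.randn(32, 16)            # batch size 32, random stand-in features
labels = torch.randint(0, 7, (32,))    # random stand-in class labels
loss_value = train_step(batch, labels)
```

Note that OneCycleLR is stepped per batch rather than per epoch, so the learning rate never exceeds max_lr at any point in the run.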

Results

  • Best validation loss: 1.2473
  • Best validation accuracy: 55.46%
  • Final train accuracy: 60.83%
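For context, the loss can be compared against two simple reference points for a 7-class problem. The sketch below assumes the reported loss is a cross-entropy over the 7 powerset classes and that label smoothing follows the PyTorch convention (the smoothing mass is spread uniformly over all classes); neither detail is stated in this card, and it is also not stated whether smoothing is applied to the validation loss.

```python
import math

NUM_CLASSES = 7   # powerset classes listed under Architecture
SMOOTHING = 0.1   # label smoothing listed under Training Configuration

# Cross-entropy of a uniform guess over all classes.
chance_loss = math.log(NUM_CLASSES)

# Smoothed target distribution for a single frame (PyTorch convention).
p_true = 1.0 - SMOOTHING + SMOOTHING / NUM_CLASSES
p_other = SMOOTHING / NUM_CLASSES

# Lowest cross-entropy reachable against smoothed targets: their entropy.
smoothed_floor = -(
    p_true * math.log(p_true)
    + (NUM_CLASSES - 1) * p_other * math.log(p_other)
)

print(f"uniform-guess loss: {chance_loss:.4f}")
print(f"label-smoothing floor: {smoothed_floor:.4f}")
```

Under these assumptions, a smoothed loss cannot reach zero even for a perfect classifier, which is worth keeping in mind when reading the validation loss.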

Usage

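import torch
from pyannote.audio import Model

# This repository is gated: accept the access conditions on the Hub and
# authenticate (for example with `huggingface-cli login`) before loading.

# Load the fine-tuned model directly from the Hub
model = Model.from_pretrained("smam/pyannote-segmentation-bengali-displace")

# Or load the weights manually into the base architecture
model = Model.from_pretrained("pyannote/segmentation-3.0")
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)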

Architecture

  • SincNet frontend
  • Bidirectional LSTM
  • Powerset classification (7 classes for 3 speakers, max 2 simultaneous)
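The count of 7 classes follows from enumerating every set of at most 2 active speakers out of 3: one empty set (silence), three single speakers, and three pairs. The sketch below shows this enumeration; the helper name and the class ordering are illustrative assumptions and may not match the index order used inside the model.

```python
from itertools import combinations


def powerset_classes(num_speakers, max_simultaneous):
    """List every set of active speakers with at most `max_simultaneous` members."""
    classes = []
    for size in range(max_simultaneous + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


classes = powerset_classes(num_speakers=3, max_simultaneous=2)
for index, active_speakers in enumerate(classes):
    print(index, active_speakers)
```

Each frame is assigned exactly one of these classes, so overlapping speech is handled as a single-label classification problem rather than as independent per-speaker decisions.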