metadata
language:
  - en
license: apache-2.0
tags:
  - hyperion
  - audio
  - speech
  - speaker-recognition
  - x-vector
  - thin-resnet34
datasets:
  - voxceleb
metrics:
  - eer
  - min_dcf-p=0.05
  - min_dcf-p=0.01
model-index:
  - name: >-
      voxceleb-v1.1-fbank80_stmn_lresnet34_e256_arcs30m0.3_do0_adam_lr0.05_b512.v1
    results:
      - task:
          type: speaker-verification
          name: Speaker Verification
        dataset:
          type: voxceleb1
          name: Voxceleb1
          args: Trained on VoxCeleb2-dev
        metrics:
          - type: eer
            value: 2.11
            name: EER Vox1-O
          - type: min_dcf-p=0.05
            value: 0.135
            name: Minimum DCF Vox1-O prior=0.05
          - type: min_dcf-p=0.01
            value: 0.208
            name: Minimum DCF Vox1-O prior=0.01
          - type: eer
            value: 1.93
            name: EER Vox1-E
          - type: min_dcf-p=0.05
            value: 0.121
            name: Minimum DCF Vox1-E prior=0.05
          - type: min_dcf-p=0.01
            value: 0.204
            name: Minimum DCF Vox1-E prior=0.01
          - type: eer
            value: 3.21
            name: EER Vox1-H
          - type: min_dcf-p=0.05
            value: 0.19
            name: Minimum DCF Vox1-H prior=0.05
          - type: min_dcf-p=0.01
            value: 0.298
            name: Minimum DCF Vox1-H prior=0.01

Hyperion Toolkit Speaker Verification pre-trained Model

Model Configuration

This model was trained using the voxceleb/v1.1 recipe.

The configuration for this model is defined in config_fbank80_stmn_lresnet34_arcs30m0.3_adam_lr0.05_amp.v1.sh.

This is an x-vector model with:

  • 80 logMel filter-banks with short-time mean normalization.
  • ThinResNet34 (aka Light ResNet34) encoder.
  • Mean+Stddev pooling.
  • AAM-softmax (additive angular margin softmax) loss with margin m=0.3 and scale s=30.
  • Mixed-precision training.
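
To make two of the components listed above more concrete, the following is a minimal pure-Python sketch of mean+stddev pooling and of the AAM-softmax logit computation with m=0.3 and s=30. It is an illustration under stated assumptions, not the Hyperion implementation: the function names `mean_std_pooling`, `aam_softmax_logits`, and `cross_entropy` are hypothetical, the pooling uses the population standard deviation, and the margin is applied without the safeguards that production code typically adds when the angle plus the margin exceeds π.

```python
import math


def mean_std_pooling(frames):
    """Pool frame-level vectors into one utterance-level [mean, stddev] vector.

    Assumption: population standard deviation (divide by n); the toolkit's
    actual estimator may differ.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def aam_softmax_logits(cosines, target, m=0.3, s=30.0):
    """Return scaled logits with an additive angular margin on the target class.

    cosines[k] is the cosine between the embedding and the weight vector of
    class k. Only the target class gets cos(theta + m); all logits are then
    multiplied by the scale s. Simplified: no special handling when
    theta + m > pi.
    """
    logits = []
    for k, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # clamp so acos stays in its domain
        if k == target:
            c = math.cos(math.acos(c) + m)
        logits.append(s * c)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]


# Two frames of 2-dimensional features pooled into a 4-dimensional vector.
pooled = mean_std_pooling([[1.0, 2.0], [3.0, 6.0]])  # [2.0, 4.0, 1.0, 2.0]

# The margin lowers the target logit, so the loss is larger than without it.
cosines = [0.8, 0.1, -0.2]
loss_with_margin = cross_entropy(aam_softmax_logits(cosines, 0), 0)
loss_without_margin = cross_entropy(aam_softmax_logits(cosines, 0, m=0.0), 0)
```

The margin makes the training objective harder for the correct class, which pushes embeddings of the same speaker closer together in angle. The margin affects training only; it plays no part when embeddings are extracted for scoring.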