yangwang825/etdnn-vox2

yangwang825 committed on Dec 22, 2022

Commit 76fcc9f • 1 Parent(s): e585064

Update README.md

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -31,7 +31,7 @@ This repository provides a pretrained E-TDNN model (x-vector) using SpeechBrain.
 This system is composed of an E-TDNN model (x-vector). It is a combination of convolutional and residual blocks. The embeddings are extracted using temporal statistical pooling. The system is trained with Additive Margin Softmax Loss.
-We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels) as the input features. It was trained using initial learning rate of 0.001 and batch size of 512 with linear scheduler for 30 epochs on 4 A100 GPUs. We employ additive noises and reverberation from [MUSAN](http://www.openslr.org/17/) and [RIR](http://www.openslr.org/28/) datasets to enrich the supervised information. The pre-training progress takes approximately seven days for the E-TDNN model.
+We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels) as the input features. It was trained using initial learning rate of 0.001 and batch size of 512 with linear scheduler for 40 epochs on 4 A100 GPUs. We employ additive noises and reverberation from [MUSAN](http://www.openslr.org/17/) and [RIR](http://www.openslr.org/28/) datasets to enrich the supervised information. The pre-training progress takes approximately seven days for the E-TDNN model.
 # Performance
@@ -39,7 +39,7 @@ We use FBank (16kHz, 25ms frame length, 10ms hop length, 80 filter-bank channels
 | Splits | Backend | S-norm | EER(%) | minDCF(0.01) |
 |:-------------:|:--------------:|:--------------:|:--------------:|:--------------:|
-| VoxCeleb1-O | cosine | no | 2.27 | 0.21 |
+| VoxCeleb1-O | cosine | no | 1.91 | 0.20 |
 | VoxCeleb1-E | cosine | no | TBD | TBD |
 | VoxCeleb1-H | cosine | no | TBD | TBD |
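
The Performance table reports EER with a cosine backend and no score normalization. As a rough illustration of what those two columns mean (this is not the repository's evaluation code), the sketch below scores embedding pairs by cosine similarity and estimates EER with a simple threshold sweep. The helper names `cosine_score` and `equal_error_rate` and the toy scores are assumptions made for this example only.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER in percent (hypothetical helper).

    Sweeps every observed score as a threshold and returns the mean of the
    false-acceptance and false-rejection rates where the two are closest.
    """
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, 100.0 * (far + frr) / 2
    return eer


# Toy trial scores, invented for illustration only.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(same_speaker, different_speaker))
```

Evaluation toolkits usually interpolate between thresholds rather than picking the closest observed one, so values from this sketch can differ slightly from published figures.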