mechanicalsea committed on
Commit 3259f6e
1 Parent(s): 675a69a

update README with inference

Files changed (1)
  1. README.md +46 -6
README.md CHANGED
@@ -22,7 +22,47 @@ metrics:

  # EfficientTDNN

- Model Version are listed as follows.
+ This repository provides all the necessary tools to perform speaker verification with EfficientTDNN, a model obtained through neural architecture search (NAS).
+ The system can be used to extract speaker embeddings with different model sizes.
+ It is trained on the VoxCeleb2 training data using data augmentation.
+ Model performance on the cleaned VoxCeleb1 test set (Vox1-O) is reported as follows.
+
+ | Supernet Stage | Subnet | MACs (3-second input) | Params | EER (%) w/ AS-Norm | EER (%) w/o AS-Norm | minDCF w/ AS-Norm | minDCF w/o AS-Norm |
+ |:-------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
+ | depth | Base | 1.45G | 5.79M | 0.94 | 1.14 | 0.089 | 0.106 |
+ | width 1 | Mobile | 570.98M | 2.42M | 1.41 | 1.61 | 0.124 | 0.152 |
+ | width 2 | Small | 204.07M | 899.20K | 2.20 | 2.33 | 0.219 | 0.241 |
+
+ The configurations of the three subnets are:
+
+ - Base: (3, [512, 512, 512, 512], [5, 3, 3, 3], 1536)
+ - Mobile: (3, [384, 256, 256, 256], [5, 3, 3, 3], 768)
+ - Small: (2, [256, 256, 256], [3, 3, 3], 400)
+
+ ## Compute your speaker embeddings
+
+ ```python
+ import torchaudio
+ from sugar.models import WrappedModel
+ wav_file = f"{vox1_root}/id10270/x6uYqmx31kE/00001.wav"  # set vox1_root to your local VoxCeleb1 directory first
+ signal, fs = torchaudio.load(wav_file)
+
+ repo_id = "mechanicalsea/efficient-tdnn"
+ supernet_filename = "depth/depth.torchparams"
+ subnet_filename = "depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar"
+ subnet, info = WrappedModel.from_pretrained(
+     repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
+
+ embedding = subnet(signal)
+ ```
+
+ ## Inference on GPU
+
+ To perform inference on the GPU, add `subnet = subnet.to(device)` after calling the `from_pretrained` method, and move the input with `signal = signal.to(device)` before calling `subnet(signal)`.
+
+ ## Model Description
+
+ Models are listed as follows.

  - **Dynamic Kernel**: The model supports kernel sizes in {1, 3, 5} (`kernel/kernel.torchparams`).
  - **Dynamic Depth**: Building on the **Dynamic Kernel** version, the model additionally supports depths in {2, 3, 4} (`depth/depth.torchparams`).
@@ -59,10 +99,10 @@ Furthermore, some subnets are given in the form of the weights of batchnorm corr

  The tags are described as follows.

- - max: `(4, [512, 512, 512, 512, 512], [5, 5, 5, 5, 5], 1536)`
- - Kmin: `(4, [512, 512, 512, 512, 512], [1, 1, 1, 1, 1], 1536)`
- - Dmin: `(2, [512, 512, 512], [1, 1, 1], 1536)`
- - C1min: `(2, [256, 256, 256], [1, 1, 1], 768)`
- - C2min: `(2, [128, 128, 128], [1, 1, 1], 384)`
+ - max: (4, [512, 512, 512, 512, 512], [5, 5, 5, 5, 5], 1536)
+ - Kmin: (4, [512, 512, 512, 512, 512], [1, 1, 1, 1, 1], 1536)
+ - Dmin: (2, [512, 512, 512], [1, 1, 1], 1536)
+ - C1min: (2, [256, 256, 256], [1, 1, 1], 768)
+ - C2min: (2, [128, 128, 128], [1, 1, 1], 384)

  More details about EfficientTDNN can be found in the paper [EfficientTDNN](https://arxiv.org/abs/2103.13581).
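The subnet tuples in this README and the checkpoint name used in the embedding example appear to share one pattern: the Base tuple `(3, [512, 512, 512, 512], [5, 3, 3, 3], 1536)` flattens to `depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar`. The sketch below rebuilds that name from a tuple and checks another regularity visible in every listed config: the two lists each hold depth + 1 entries. This is an inferred convention, not a documented one. The helpers `check_config` and `subnet_filename` and the field names `depth`, `widths`, `kernels`, and `final_channels` are hypothetical labels of ours, and only the `depth`-stage file name is confirmed by the README.

```python
from typing import Dict, List, Tuple

# Assumed reading of a README tuple: (depth, widths, kernels, final_channels).
# These field names are hypothetical; the README does not name the fields.
SubnetConfig = Tuple[int, List[int], List[int], int]

# Subnet configs copied from the README.
SUBNETS: Dict[str, SubnetConfig] = {
    "Base": (3, [512, 512, 512, 512], [5, 3, 3, 3], 1536),
    "Mobile": (3, [384, 256, 256, 256], [5, 3, 3, 3], 768),
    "Small": (2, [256, 256, 256], [3, 3, 3], 400),
}


def check_config(config: SubnetConfig) -> None:
    """Hypothetical check: every listed config has depth + 1 widths and kernels."""
    depth, widths, kernels, _ = config
    if len(widths) != depth + 1 or len(kernels) != depth + 1:
        raise ValueError(
            f"expected {depth + 1} widths and kernels, "
            f"got {len(widths)} and {len(kernels)}"
        )


def subnet_filename(stage: str, config: SubnetConfig) -> str:
    """Hypothetical helper: flatten a config into a checkpoint name
    following the inferred pattern '<stage>/<stage>.ecapa-tdnn.<fields>.bn.tar'."""
    check_config(config)
    depth, widths, kernels, final_channels = config
    fields = [depth, *widths, *kernels, final_channels]
    return f"{stage}/{stage}.ecapa-tdnn." + ".".join(str(f) for f in fields) + ".bn.tar"


if __name__ == "__main__":
    for cfg in SUBNETS.values():
        check_config(cfg)  # raises ValueError if the pattern does not hold
    # Only the depth-stage name for Base is confirmed by the README.
    print(subnet_filename("depth", SUBNETS["Base"]))
```

The README does not show checkpoint names for the width-stage subnets (Mobile, Small), so the helper should not be assumed to produce valid names for them.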