Update README.md
README.md
CHANGED
@@ -29,7 +29,9 @@ ECAPA2 is a hybrid neural network architecture and training strategy for speaker
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]

-##
+## Usage Guide
+
+### Speaker Embedding Extraction

 Extracting speaker embeddings is easy and only requires a few lines of code:
 ```
@@ -41,12 +43,20 @@ ecapa2_model = torch.load('model.pt')
 embedding = ecapa2_model.extract_embedding(audio)
 ```

+### Hierarchical Feature Extraction
+
 For the extraction of other hierarchical features, a separate model function is provided:
 ```
-feature = ecapa2_model.extract_feature(label='gfe1')
+feature = ecapa2_model.extract_feature(label='gfe1', type='mean')
 ```

-The
+The following table describes the available features:
+
+| Feature Type | Description | Usage | Labels |
+| ----------- | ----------- | ----------- | ----------- |
+| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, likely useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
+| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings. E.g. speaker profiling, emotion recognition. | gfe1, gfe2, gfe3, pool |
+| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks directly depending on the speaker identity (as opposed to speaker characteristics). E.g. speaker verification, speaker diarization. | embedding |


 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
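The labels in the feature table lend themselves to a small lookup that validates a label before it is passed to `extract_feature`. The sketch below is illustrative only: `FEATURE_LABELS`, `feature_type_of`, `extract_features`, and `StubModel` are hypothetical names that do not come from the README, and the stub merely stands in for the object loaded from `model.pt`. Only the `extract_feature(label=..., type=...)` call and the label names are taken from the change above.

```python
# Labels copied from the feature table in the README change.
FEATURE_LABELS = {
    "Local Feature": ("lfe1", "lfe2", "lfe3", "lfe4"),
    "Global Feature": ("gfe1", "gfe2", "gfe3", "pool"),
    "Speaker Embedding": ("embedding",),
}


def feature_type_of(label):
    """Return the table's feature type for a label, or raise ValueError."""
    for feature_type, labels in FEATURE_LABELS.items():
        if label in labels:
            return feature_type
    raise ValueError(f"unknown feature label: {label!r}")


def extract_features(model, labels, pooling="mean"):
    """Call model.extract_feature once per label and collect the results.

    `pooling` is forwarded as the `type` keyword shown in the README.
    'mean' is the only value the README shows; other values are assumptions.
    """
    for label in labels:
        feature_type_of(label)  # validate every label before calling the model
    return {
        label: model.extract_feature(label=label, type=pooling)
        for label in labels
    }


class StubModel:
    """Hypothetical stand-in for the object returned by torch.load('model.pt')."""

    def extract_feature(self, label, type):
        # A real model would return a feature tensor; the stub echoes its inputs.
        return (label, type)


features = extract_features(StubModel(), ["lfe1", "gfe1", "pool"])
```

With the real model, `StubModel()` would be replaced by `ecapa2_model`. Validating all labels up front means a mistyped label fails before any feature is extracted.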