Jenthe committed on
Commit 0e15756
1 Parent(s): 487a014

Update README.md

Files changed (1)
  1. README.md +13 -3
README.md CHANGED
@@ -29,7 +29,9 @@ ECAPA2 is a hybrid neural network architecture and training strategy for speaker
  - **Paper [optional]:** [More Information Needed]
  - **Demo [optional]:** [More Information Needed]
 
- ## How-to-use
 
  Extracting speaker embeddings is easy and only requires a few lines of code:
  ```
@@ -41,12 +43,20 @@ ecapa2_model = torch.load('model.pt')
  embedding = ecapa2_model.extract_embedding(audio)
  ```
 
  For the extraction of other hierarchical features, a separate model function is provided:
  ```
- feature = ecapa2_model.extract_feature(label='gfe1')
  ```
 
- The list of available labels consists of: 'lfe1', 'lfe2', 'lfe3', 'lfe4', 'gfe1', 'gfe2', 'pool' and 'embedding' (equal to model.extract_embedding()).
 
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
  - **Paper [optional]:** [More Information Needed]
  - **Demo [optional]:** [More Information Needed]
 
+ ## Usage Guide
+
+ ### Speaker Embedding Extraction
 
  Extracting speaker embeddings is easy and only requires a few lines of code:
  ```
 
  embedding = ecapa2_model.extract_embedding(audio)
  ```
 
+ ### Hierarchical Feature Extraction
+
  For the extraction of other hierarchical features, a separate model function is provided:
  ```
+ feature = ecapa2_model.extract_feature(label='gfe1', type='mean')
  ```
 
+ The following table describes the available features:
+
+ | Feature Type | Description | Usage | Labels |
+ | ----------- | ----------- | ----------- | ----------- |
+ | Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
+ | Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings, e.g. for speaker profiling or emotion recognition. | gfe1, gfe2, gfe3, pool |
+ | Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks that depend directly on speaker identity (as opposed to speaker characteristics), e.g. speaker verification or speaker diarization. | embedding |
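
The table describes the `lfe*` and `gfe*` outputs as frame-level features, while the new call passes `type='mean'`. The diff does not document what `type` does. One plausible reading, assumed here, is that it selects how frame-level features are pooled over time into a single vector. The sketch below illustrates mean pooling and mean-plus-standard-deviation (statistics) pooling on a plain nested list of shape (channels, frames). The helper `pool_frames`, its `mode` values and the data layout are hypothetical and are not part of the ECAPA2 API.

```python
import math
from typing import List


def pool_frames(features: List[List[float]], mode: str = "mean") -> List[float]:
    """Pool a (channels x frames) feature map over the time axis.

    Hypothetical helper (not the ECAPA2 API):
    - 'mean'     returns the per-channel mean over frames;
    - 'mean_std' appends the per-channel (population) standard deviation,
                 i.e. simple statistics pooling.
    """
    if mode not in ("mean", "mean_std"):
        raise ValueError(f"unknown pooling mode: {mode!r}")
    if not features or any(len(ch) == 0 for ch in features):
        raise ValueError("every channel needs at least one frame")
    means = [sum(ch) / len(ch) for ch in features]
    if mode == "mean":
        return means
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in ch) / len(ch))
        for ch, m in zip(features, means)
    ]
    return means + stds


# Toy frame-level feature: 2 channels, 4 frames.
frames = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
print(pool_frames(frames))  # [2.5, 1.0]
print(pool_frames(frames, "mean_std"))
```

Mean pooling discards how a feature varies over time. Appending the standard deviation keeps a coarse summary of that variation, at the cost of doubling the vector length.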
 
 
  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
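
The table recommends the speaker embedding for speaker verification. A common way to compare two speaker embeddings is to compute their cosine similarity and accept the trial if the score exceeds a decision threshold. The sketch below uses plain Python lists as stand-ins for the output of `ecapa2_model.extract_embedding(audio)`. The helper names `cosine_similarity` and `same_speaker`, the toy vectors and the threshold of 0.5 are illustrative assumptions, not values from ECAPA2. A real threshold has to be calibrated on held-out trials.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot score an all-zero embedding")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.5) -> bool:
    """Hypothetical verification decision: accept if the score reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional stand-ins; real speaker embeddings are much higher-dimensional.
enroll = [1.0, 0.0, 1.0]
test_same = [0.9, 0.1, 1.1]
test_other = [0.0, 1.0, 0.0]
print(same_speaker(enroll, test_same))   # True
print(same_speaker(enroll, test_other))  # False
```

Cosine scoring ignores the embedding's magnitude and compares only its direction. This is why it is a common default back-end for comparing speaker embeddings.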