Commit 391404d by mechanicalsea (1 parent: 049a0c7)

Update README.md

Files changed (1): README.md (+15 -16)
README.md CHANGED
@@ -42,18 +42,18 @@ The details of three subnets are:
 ## Compute your speaker embeddings
 
 ```python
-import torchaudio
+import torch
 from sugar.models import WrappedModel
-wav_file = f"{vox1_root}/id10270/x6uYqmx31kE/00001.wav"
-signal, fs =torchaudio.load(wav_file)
+wav_input_16khz = torch.randn(1,10000).cuda()
 
 repo_id = "mechanicalsea/efficient-tdnn"
 supernet_filename = "depth/depth.torchparams"
 subnet_filename = "depth/depth.ecapa-tdnn.3.512.512.512.512.5.3.3.3.1536.bn.tar"
-subnet, info = WrappedModel.from_pretrained(
-    repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
+subnet, info = WrappedModel.from_pretrained(repo_id=repo_id, supernet_filename=supernet_filename, subnet_filename=subnet_filename)
+subnet = subnet.cuda()
+subnet = subnet.eval()
 
-embedding = subnet(signal)
+embedding = subnet(wav_input_16khz)
 ```
 
 ## Inference on GPU
@@ -112,14 +112,13 @@ More details about EfficentTDNN can be found in the paper [EfficientTDNN](https:
 Please, cite EfficientTDNN if you use it for your research or business.
 
 ```bibtex
-@article{rwang-efficienttdnn-2021,
-title={{EfficientTDNN}: Efficient Architecture Search for Speaker Recognition},
-author={Rui Wang and Zhihua Wei and Haoran Duan and Shouling Ji and Yang Long and Zhen Hong},
-journal={arXiv preprint arXiv:2103.13581},
-year={2021},
-eprint={2103.13581},
-archivePrefix={arXiv},
-primaryClass={eess.AS},
-note={arXiv:2103.13581}
-}
+@article{wr-efficienttdnn-2022,
+author={Wang, Rui and Wei, Zhihua and Duan, Haoran and Ji, Shouling and Long, Yang and Hong, Zhen},
+journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
+title={EfficientTDNN: Efficient Architecture Search for Speaker Recognition},
+year={2022},
+volume={30},
+number={},
+pages={2267-2279},
+doi={10.1109/TASLP.2022.3182856}}
 ```