Rifat Mamayusupov committed on
Commit 47c5104
1 Parent(s): 71b7dc7

Update README.md

Files changed (1)
  1. README.md +1 -66
README.md CHANGED
@@ -8,60 +8,15 @@ tags:

  # Speaker-verification-v2

- <style>
- img {
- display: inline;
- }
- </style>
-
- [![Model architecture](https://img.shields.io/badge/Model_Arch-PUT-YOUR-ARCHITECTURE-HERE-lightgrey#model-badge)](#model-architecture)
- | [![Model size](https://img.shields.io/badge/Params-PUT-YOUR-MODEL-SIZE-HERE-lightgrey#model-badge)](#model-architecture)
- | [![Language](https://img.shields.io/badge/Language-PUT-YOUR-LANGUAGE-HERE-lightgrey#model-badge)](#datasets)
-
- **Put a short model description here.**
-
- See the [model architecture](#model-architecture) section and [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html) for complete architecture details.
-
-
- ## NVIDIA NeMo: Training
-
- To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you've installed latest Pytorch version.
- ```
- pip install nemo_toolkit['all']
- ```

  ## How to Use this Model

- The model is available for use in the NeMo toolkit [3], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
+ The model is available for use in the and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

  ### Automatically instantiate the model

  **NOTE**: Please update the model class below to match the class of the model being uploaded.

- ```python
- import nemo.core import ModelPT
- model = ModelPT.from_pretrained("ai-nightcoder/speaker-verification-v2")
- ```
-
- ### NOTE
-
- Add some information about how to use the model here. An example is provided for ASR inference below.
-
- ### Transcribing using Python
- First, let's get a sample
- ```
- wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
- ```
- Then simply do:
- ```
- asr_model.transcribe(['2086-149220-0033.wav'])
- ```
-
- ### Transcribing many audio files
-
- ```shell
- python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ai-nightcoder/speaker-verification-v2" audio_dir=""
- ```

  ### Input

@@ -79,14 +34,6 @@ model = ModelPT.from_pretrained("ai-nightcoder/speaker-verification-v2")

  **Add information here about how the model was trained. It should be as detailed as possible, potentially including the the link to the script used to train as well as the base config used to train the model. If extraneous scripts are used to prepare the components of the model, please include them here.**

- ### NOTE
-
- An example is provided below for ASR
-
- The NeMo toolkit [3] was used for training the models for over several hundred epochs. These model are trained with this [example script](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/asr_transducer/speech_to_text_rnnt_bpe.py) and this [base config](https://github.com/NVIDIA/NeMo/blob/main/examples/asr/conf/fastconformer/fast-conformer_transducer_bpe.yaml).
-
- The tokenizers for these models were built using the text transcripts of the train set with this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
-

  ### Datasets

@@ -113,8 +60,6 @@ model = ModelPT.from_pretrained("ai-nightcoder/speaker-verification-v2")

  The corresponding text in this section for those datasets is stated below -

- The model was trained on 64K hours of English speech collected and prepared by NVIDIA NeMo and Suno teams.
-
  The training dataset consists of private subset with 40K hours of English speech plus 24K hours from the following public datasets:

  - Librispeech 960 hours of English speech
@@ -171,7 +116,6 @@ model = ModelPT.from_pretrained("ai-nightcoder/speaker-verification-v2")

  Provide any caveats about the results presented in the top of the discussion so that nuance is not lost.

- It should ideally be in a tabular format (you can use the following website to make your tables in markdown format - https://www.tablesgenerator.com/markdown_tables)**

  ## Limitations

@@ -185,12 +129,3 @@ It should ideally be in a tabular format (you can use the following website to m
  Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.


- ## License
-
- License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
-
- ## References
-
- **Provide appropriate references in the markdown link format below. Please order them numerically.**
-
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)