Update README.md
README.md
CHANGED
@@ -13,6 +13,15 @@ tags:
 
 # NVIDIA NEST Large En
 
+<style>
+img {
+display: inline;
+}
+</style>
+
+[![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer-lightgrey#model-badge)](#model-architecture)
+| [![Model size](https://img.shields.io/badge/Params-115M-lightgrey#model-badge)](#model-architecture)
+
 The NEST framework is designed for speech self-supervised learning, which can be used as a frozen speech feature extractor or as weight initialization for downstream speech processing tasks. The NEST-L model has about 115M parameters and is trained on an English dataset of roughly 100K hours. <br>
 This model is ready for commercial/non-commercial use. <br>
 
@@ -29,9 +38,13 @@ License to use this model is covered by the [CC-BY-4.0](https://creativecommons.
 
 ## Model Architecture
 
-
+The [NEST](https://arxiv.org/abs/2408.13106) framework comprises several building blocks, as illustrated in the left part of the following figure. Once trained, the NEST encoder can be used as weight initialization or feature extractor for downstream speech processing tasks.
+
+<div align="center">
+<img src="nest-model.png" width="750" />
+</div>
 
-**
+**Architecture Details:**
 - Encoder: FastConformer (18 layers)
 - Decoder: Linear classifier
 - Masking: Random block masking
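
The "Random block masking" entry in the architecture list can be illustrated with a minimal sketch. This is not code from NEST or NeMo: the function name `random_block_mask` and the parameters `block_size` and `mask_prob` are hypothetical, their default values are arbitrary, and the sampling scheme NEST actually uses may differ. The sketch only shows the general idea of masking contiguous runs of frames at randomly chosen start positions.

```python
import random


def random_block_mask(num_frames, block_size=40, mask_prob=0.01, seed=None):
    """Return a list of booleans, True where a frame is masked.

    Every frame is picked as a block start with probability `mask_prob`;
    each picked start masks the next `block_size` frames, clipped at the
    end of the sequence. Blocks are allowed to overlap.
    All names and default values are illustrative assumptions.
    """
    if num_frames < 0 or block_size <= 0 or not 0.0 <= mask_prob <= 1.0:
        raise ValueError("invalid masking parameters")

    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            end = min(start + block_size, num_frames)
            for i in range(start, end):
                mask[i] = True
    return mask
```

A call such as `random_block_mask(1000, block_size=40, mask_prob=0.01, seed=0)` returns one boolean per frame. In a self-supervised setup of this kind, the masked positions would typically be the ones the model is asked to predict.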