steveheh committed on
Commit 0258c09
Parent: a969af3

Update README.md

Files changed (1)
  1. README.md +15 -2
README.md CHANGED
@@ -13,6 +13,15 @@ tags:
 
  # NVIDIA NEST Large En
 
  The NEST framework is designed for speech self-supervised learning, which can be used as a frozen speech feature extractor or as weight initialization for downstream speech processing tasks. The NEST-L model has about 115M parameters and is trained on an English dataset of roughly 100K hours. <br>
  This model is ready for commercial/non-commercial use. <br>
 
@@ -29,9 +38,13 @@ License to use this model is covered by the [CC-BY-4.0](https://creativecommons.
 
  ## Model Architecture
 
- **Architecture Type:** NEST [1] <br>
 
- **Network Architecture:**
  - Encoder: FastConformer (18 layers)
  - Decoder: Linear classifier
  - Masking: Random block masking
 
 
  # NVIDIA NEST Large En
 
+ <style>
+ img {
+ display: inline;
+ }
+ </style>
+
+ [![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer-lightgrey#model-badge)](#model-architecture)
+ | [![Model size](https://img.shields.io/badge/Params-115M-lightgrey#model-badge)](#model-architecture)
+
  The NEST framework is designed for speech self-supervised learning, which can be used as a frozen speech feature extractor or as weight initialization for downstream speech processing tasks. The NEST-L model has about 115M parameters and is trained on an English dataset of roughly 100K hours. <br>
  This model is ready for commercial/non-commercial use. <br>
 
 
 
  ## Model Architecture
 
+ The [NEST](https://arxiv.org/abs/2408.13106) framework comprises several building blocks, as illustrated in the left part of the following figure. Once trained, the NEST encoder can be used as weight initialization or feature extractor for downstream speech processing tasks.
+
+ <div align="center">
+ <img src="nest-model.png" width="750" />
+ </div>
 
+ **Architecture Details:**
  - Encoder: FastConformer (18 layers)
  - Decoder: Linear classifier
  - Masking: Random block masking
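
The architecture details name random block masking but do not say how blocks are chosen. As a rough illustration of the general idea only, the sketch below masks whole fixed-size blocks of frames picked at random. The function name `random_block_mask`, the parameters `block_size` and `mask_ratio`, their default values, and the choice of aligned, non-overlapping blocks are all assumptions made for this example; they are not taken from the README or confirmed as NEST's actual settings.

```python
import random


def random_block_mask(num_frames, block_size=40, mask_ratio=0.4, seed=None):
    """Return a list of booleans, True where a frame is masked.

    Hypothetical helper: frames are split into aligned blocks of
    `block_size`, and a `mask_ratio` fraction of those blocks is masked.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")

    rng = random.Random(seed)
    # Only full blocks are candidates; a trailing partial block stays unmasked.
    num_blocks = num_frames // block_size
    num_masked = int(round(num_blocks * mask_ratio))
    masked_blocks = rng.sample(range(num_blocks), num_masked)

    mask = [False] * num_frames
    for block_index in masked_blocks:
        start = block_index * block_size
        for i in range(start, start + block_size):
            mask[i] = True
    return mask


if __name__ == "__main__":
    mask = random_block_mask(num_frames=400, seed=0)
    print(f"{sum(mask)} of {len(mask)} frames masked")
```

In a self-supervised setup of this kind, the encoder would see the input with the masked frames hidden, and the linear classifier would be trained to predict targets for those frames. How NEST derives those targets is not described in this diff.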