okuchaiev committed
Commit d1b529c
Parent: 96ac9a1

Update README.md

Files changed (1): README.md (+6 -6)
@@ -24,7 +24,7 @@ img {
 
 ## Model Description
 
- Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3, while 1.3B refers to the total trainable parameter count (1.3 billion) [1, 2].
+ Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and 3, while 1.3B refers to the total trainable parameter count (1.3 billion) [1, 2]. It has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.
 
 This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
 
@@ -95,17 +95,15 @@ print(sentences)
 
 ## Training Data
 
- The model was trained on ["The Pile" dataset prepared by Eleuther.AI](https://pile.eleuther.ai/).
+ The model was trained on ["The Pile" dataset prepared by Eleuther.AI](https://pile.eleuther.ai/) [4].
 
 ## Evaluation results
 
- *Zero-shot performance.*
+ *Zero-shot performance.* Evaluated using the [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation).
 
 | ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
 | ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
- | 0.3012 | 0.4596 | 0.459 | 0.3811 | 0.5343 | 0.5451 | 0.5979 | 0.4442 | 0.6834 |
-
-
 
+ | 0.3012 | 0.4596 | 0.459 | 0.3797 | 0.5343 | 0.5451 | 0.5979 | 0.4443 | 0.6834 |
 
 ## References
 
@@ -115,6 +113,8 @@ The model was trained on ["The Piles" dataset prepared by Eleuther.AI](https://p
 
 [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
+ [4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
+
 ## Licence
 
 License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/). By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
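
For context, the generation snippet that precedes the second hunk (the `print(sentences)` context line) follows roughly the pattern below. This is a minimal sketch based on common NeMo usage of a TP=1/PP=1 checkpoint; the checkpoint file name, Trainer configuration, and `generate()` parameters are assumptions, not the card's exact code.

```python
# Minimal sketch: restore the .nemo checkpoint and generate text with NeMo.
# Assumptions (not from the card): checkpoint file name, Trainer/plugin setup,
# and the length_params keys passed to generate().
from pytorch_lightning import Trainer
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPPlugin

# TP=1 / PP=1, so a single GPU is enough to restore and run the model.
trainer = Trainer(plugins=NLPDDPPlugin(), devices=1, accelerator="gpu", precision=16)

model = MegatronGPTModel.restore_from(
    restore_path="nemo_gpt1.3B_fp16.nemo",  # assumed .nemo file name
    trainer=trainer,
)
model.freeze()

# Zero-shot generation from a few prompts; default sampling settings apply
# when sampling_params is omitted.
sentences = model.generate(
    inputs=["Deep learning is", "The capital of France is"],
    length_params={"max_length": 50, "min_length": 0},
)
print(sentences)
```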