---
language:
- en
library_name: nemo
datasets:
- the_pile
tags:
- text generation
- pytorch
- causal-lm
license: cc-by-4.0
---

<style>
img {
    display: inline;
}
</style>

[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture) | [![Model size](https://img.shields.io/badge/Params-1.3B-green)](#model-architecture) | [![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)

# Megatron-GPT 1.3B

## Model Description

Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 1.3B refers to the total number of trainable parameters (1.3 billion) [1, 2].

This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).

## Getting started

You will need to install NVIDIA Apex and NeMo.

```bash
# Install the NeMo-compatible Apex branch.
git clone https://github.com/ericharper/apex.git
cd apex
git checkout nm_v1.11.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
```

```bash
# Install the NeMo toolkit with the NLP collection.
pip install nemo_toolkit['nlp']==1.11.0
```
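
A quick way to verify the pip installation is to import the NLP collection and print the installed version, for example:

```python
# Sanity check: importing the NLP collection should succeed without errors,
# and the reported version should match the pinned release.
import nemo
import nemo.collections.nlp as nemo_nlp  # pulls in the NLP collection and its dependencies

print(nemo.__version__)  # expected: 1.11.0
```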

Alternatively, instead of installing the packages yourself, you can use the NeMo Megatron training Docker container, which comes with all dependencies pre-installed.
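
Once the dependencies are installed, the checkpoint can be loaded through the NeMo API. The snippet below is a minimal sketch rather than an official recipe: it assumes a single GPU, a locally downloaded `.nemo` checkpoint file (the path is a placeholder), and NeMo 1.11-era class names (the DDP strategy class was called `NLPDDPPlugin` in older releases); the exact generation API may differ between NeMo versions.

```python
# Minimal sketch: restore the checkpoint and generate a completion on one GPU.
# The .nemo path below is a placeholder; point it at your downloaded checkpoint.
from pytorch_lightning import Trainer

from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.nlp_overrides import NLPDDPStrategy  # NLPDDPPlugin in older releases

# Megatron models are restored through a Lightning trainer carrying NeMo's DDP strategy.
trainer = Trainer(strategy=NLPDDPStrategy(), devices=1, accelerator="gpu", precision=16)

model = MegatronGPTModel.restore_from(restore_path="megatron_gpt_1.3b.nemo", trainer=trainer)
model.freeze()

# Generate up to 50 new tokens for a single prompt with default sampling settings.
output = model.generate(
    inputs=["Deep learning is"],
    length_params={"max_length": 50, "min_length": 0},
)
print(output["sentences"][0])
```

For multi-GPU or server-style inference, the example scripts under `examples/nlp/language_modeling` in the [NeMo repository](https://github.com/NVIDIA/NeMo) are a better starting point.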

## Training Data

The model was trained on [The Pile dataset prepared by EleutherAI](https://pile.eleuther.ai/).

## Evaluation results

*Zero-shot performance.*

| ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
| ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
| 0.3012 | 0.4596 | 0.459 | 0.3811 | 0.5343 | 0.5451 | 0.5979 | 0.4442 | 0.6834 |

## References

[1] [Improving Language Understanding by Generative Pre-Training](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf)

[2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)

[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

## License

Use of this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. By downloading the public release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.