J38 committed on
Commit fea58cd
1 Parent(s): 2e9d8e5

Update README.md

Files changed (1)
  1. README.md +3 -24
README.md CHANGED
@@ -15,7 +15,6 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
   - [Model Details](#model-details)
   - [Model Description](#model-description)
   - [Uses](#uses)
- - [Direct Use](#direct-use)
   - [Downstream Use](#downstream-use)
   - [Out-of-Scope Use](#out-of-scope-use)
   - [Bias, Risks, and Limitations](#bias-risks-and-limitations)
@@ -24,23 +23,11 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
   - [Training Data](#training-data)
   - [Training Procedure](#training-procedure)
   - [Preprocessing](#preprocessing)
- - [Speeds, Sizes, Times](#speeds-sizes-times)
- - [Evaluation](#evaluation)
- - [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
- - [Testing Data](#testing-data)
- - [Factors](#factors)
- - [Metrics](#metrics)
- - [Results](#results)
- - [Model Examination](#model-examination)
   - [Environmental Impact](#environmental-impact)
   - [Technical Specifications](#technical-specifications)
   - [Model Architecture and Objective](#model-architecture-and-objective)
   - [Compute Infrastructure](#compute-infrastructure)
- - [Hardware](#hardware)
- - [Software](#software)
- - [Citation](#citation)
- - [Model Card Contact](#model-card-contact)
- - [How to Get Started with the Model](#how-to-get-started-with-the-model)
+


  # Model Details
@@ -61,6 +48,8 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
  - **Language(s) (NLP):** en
  - **License:** openrail

+ # Uses
+
  ## Direct Use

  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
@@ -83,8 +72,6 @@ The main way we have used this model is finetuning for downstream question answe
  We do not recommend using this model for natural language generation in a production environment, finetuned or otherwise.


-
-
  # Bias, Risks, and Limitations

  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
@@ -155,19 +142,12 @@ This allows the model to encode information about these concepts in their indivi

  Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** More information needed
- - **Hours used:** More information needed
- - **Cloud Provider:** More information needed
- - **Compute Region:** More information needed
- - **Carbon Emitted:** More information needed
-
  # Technical Specifications

  ## Model Architecture and Objective

  Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention) with the following hyperparameters:

-
  | | |
  | ----------- | ----- |
  | hidden size | 2560 |
@@ -176,7 +156,6 @@ Pubmed GPT 2.7B is a standard GPT-2 implementation (trained with Flash Attention
  | vocab size | 28896 |
  | sequence length| 1024 |

-
  ## Compute Infrastructure

  The model was trained on [MosaicML Cloud](https://www.mosaicml.com/cloud), a platform designed for large workloads like LLMs. Using the [Composer](https://github.com/mosaicml/composer) training library and [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html), it was easy to enable multi-node training across 128 A100-40GB GPUs, and the total run was completed in ~6.25 days.
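The compute description above names the Composer training library and PyTorch FSDP. As a rough sketch of what FSDP-sharded training of a GPT-2-style model looks like (the actual run used Composer's trainer across 128 A100-40GB GPUs), the snippet below wraps a Hugging Face GPT2LMHeadModel in plain PyTorch FSDP; the model size, optimizer settings, and batch are scaled-down placeholders, not the real training recipe.

```python
# Minimal sketch of multi-GPU training with PyTorch FSDP, in the spirit of the
# Composer + FSDP setup described above. Launch one process per GPU, e.g.:
#   torchrun --nproc_per_node=8 fsdp_sketch.py
# The config is a deliberately scaled-down stand-in, and the optimizer,
# batch, and learning rate are placeholders, not the model's actual recipe.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import GPT2Config, GPT2LMHeadModel

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

config = GPT2Config(n_embd=256, n_layer=4, n_head=4,
                    n_positions=1024, vocab_size=28896)  # toy width; the real model uses hidden size 2560
model = GPT2LMHeadModel(config).cuda()
model = FSDP(model)  # shards parameters, gradients, and optimizer state across ranks

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder settings

# One illustrative step on random token ids standing in for tokenized PubMed text.
batch = torch.randint(0, config.vocab_size, (2, config.n_positions), device="cuda")
loss = model(input_ids=batch, labels=batch).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()

dist.destroy_process_group()
```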
 
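For reference, the hyperparameters in the table shown in the diff (hidden size 2560, vocab size 28896, sequence length 1024) map directly onto a Hugging Face GPT2Config. The layer and head counts are not part of this excerpt, so the values below are placeholders rather than the model's published settings.

```python
# The diffed table's dimensions expressed as a Hugging Face GPT2Config.
# n_layer and n_head are not shown in this excerpt; the values here are
# placeholders for illustration (n_head must evenly divide n_embd).
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    n_embd=2560,       # hidden size (from the table)
    n_positions=1024,  # sequence length (from the table)
    vocab_size=28896,  # vocab size (from the table)
    n_layer=32,        # placeholder, not from this excerpt
    n_head=20,         # placeholder, not from this excerpt
)
model = GPT2LMHeadModel(config)  # note: a model this wide needs roughly 10 GB of CPU RAM in fp32
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e9:.2f}B parameters with these placeholder settings")
```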
 
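The Environmental Impact section points readers to the Machine Learning Impact calculator from Lacoste et al. (2019). Purely as an illustration of the arithmetic that calculator performs, the snippet below combines the 128 GPUs and ~6.25 days mentioned above with assumed per-GPU power draw, PUE, and grid carbon intensity; none of the assumed figures come from the model card, so the output is not the model's actual footprint.

```python
# Back-of-envelope estimate in the style of the ML Impact calculator
# (Lacoste et al., 2019): energy (kWh) x grid carbon intensity (kgCO2eq/kWh).
# GPU count and runtime come from the compute description above; power draw,
# PUE, and carbon intensity are assumptions for illustration only.
num_gpus = 128
hours = 6.25 * 24        # ~6.25 days of training
gpu_power_kw = 0.4       # assumed ~400 W average draw per A100-40GB
pue = 1.1                # assumed datacenter power usage effectiveness
carbon_intensity = 0.4   # assumed kgCO2eq per kWh for the local grid

energy_kwh = num_gpus * hours * gpu_power_kw * pue
emissions_t = energy_kwh * carbon_intensity / 1000
print(f"~{energy_kwh:,.0f} kWh, ~{emissions_t:.1f} tCO2eq (illustrative assumptions only)")
```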