J38 committed on
Commit
a3e5fb6
1 Parent(s): 4d18f91

Update README.md

Files changed (1)
  1. README.md +0 -7
README.md CHANGED
@@ -54,26 +54,21 @@ This model was a joint collaboration of [Stanford CRFM](https://crfm.stanford.ed
 
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 <!-- If the user enters content, print that. If not, but they enter a task in the list, use that. If neither, say "more info needed." -->
-
 It is possible to use this model to generate text, which is useful for experimentation and understanding its capabilities. It should not be directly used for production or work that may directly impact people.
 
 ## Downstream Use
 
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
 The main way we have used this model is finetuning for downstream question answering tasks, and we recommend using this model that way.
 
 ## Out-of-Scope Use
 
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
 We do not recommend using this model for natural language generation in a production environment, finetuned or otherwise.
 
 # Bias, Risks, and Limitations
 
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
-
 Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
 
 ## Recommendations
@@ -102,7 +97,6 @@ The model was trained on [MosaicML Cloud](https://www.mosaicml.com/cloud), a pla
 | betas | \[0.9, 0.95\] |
 | weight decay | 1.6e-5 |
 
-
 The training process was very smooth and did not suffer from any divergences.
 
 As we were preparing the training run, we were unsure of the benefits of training out to 300B tokens for language model perplexity and downstream task performance. While most models of this scale (e.g. GPT Neo 2.7B) are trained to 300-400B tokens, the datasets those models use are vastly larger than PubMed. For instance, The Pile is 8x the size of its PubMed subcorpora.
@@ -114,7 +108,6 @@ Fortunately, we did continue to see steady perplexity improvements on the valida
 The model uses a custom tokenizer trained on the PubMed Abstracts. When building domain specific models we have found it important to use a tokenizer trained on in-domain text to maximize performance on downstream tasks. A key benefit is that common biomedical terms are represented as entire tokens.
 
 For instance, all of these following terms are tokenized into single tokens by the biomedical tokenizer and multiple tokens by the standard GPT-2 tokenizer:
-
 
 | | |
 | --- | --- |
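
The single-token versus multi-token contrast described in the last hunk can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two vocabularies and the example terms are invented, and the greedy longest-match splitter is a simplified stand-in, not the BPE procedure used by the GPT-2 tokenizer or by the model's biomedical tokenizer. It only shows why a vocabulary that contains a whole domain term yields fewer tokens for that term.

```python
# Toy illustration only. Both vocabularies are hypothetical and are NOT the
# real GPT-2 or biomedical vocabularies; the splitter is greedy longest-match,
# not byte-pair encoding.

def greedy_tokenize(text, vocab):
    """Repeatedly take the longest vocabulary entry matching at the current
    position; fall back to a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate substring first, then shrink.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical general-purpose vocabulary: only sub-word pieces.
GENERAL_VOCAB = {"chrom", "at", "ography", "cyto", "toxic", "ity"}
# Hypothetical in-domain vocabulary: the same pieces plus whole terms.
DOMAIN_VOCAB = GENERAL_VOCAB | {"chromatography", "cytotoxicity"}

if __name__ == "__main__":
    for term in ("chromatography", "cytotoxicity"):
        general = greedy_tokenize(term, GENERAL_VOCAB)
        domain = greedy_tokenize(term, DOMAIN_VOCAB)
        print(f"{term}: general={general} domain={domain}")
```

With these invented vocabularies, each term splits into three pieces under the general vocabulary and stays whole under the in-domain one, which mirrors the shorter sequences the README attributes to in-domain tokenization.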
 