ybelkada committed on
Commit
929ec97
1 Parent(s): 60e8470

Update README.md (#3)


- Update README.md (efce5c0cf0fe042745d107b9747cd5869c265731)

Files changed (1)
  1. README.md +3 -11
README.md CHANGED
```diff
@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -182,7 +178,7 @@ More information needed.
 
 ## Sensitive Use:
 
-> SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
+More information needed.
 
 # Training Details
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Crawled Corpus (C4)
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 
 # Evaluation
```
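
For context, the `print(tokenizer.decode(outputs[0]))` line in the first hunk header comes from the README's usage snippet. A minimal sketch of how a SwitchTransformers checkpoint of this kind is typically loaded and run with `transformers` might look like the following; the checkpoint name `google/switch-base-8` and the example prompt are assumptions for illustration, not taken from this commit:

```python
# Illustrative sketch only; the checkpoint name below is an assumption, not part of this commit.
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

checkpoint = "google/switch-base-8"  # hypothetical choice of SwitchTransformers checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = SwitchTransformersForConditionalGeneration.from_pretrained(checkpoint)

# SwitchTransformers checkpoints are pretrained with T5-style span corruption,
# so sentinel tokens (<extra_id_N>) mark the spans the model should fill in.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with a pinch of <extra_id_2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```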