ybelkada committed on
Commit
929ec97
1 Parent(s): 60e8470

Update README.md (#3)


- Update README.md (efce5c0cf0fe042745d107b9747cd5869c265731)

Files changed (1)
  1. README.md +3 -11
README.md CHANGED
```diff
@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-The authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that:
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -182,7 +178,7 @@ More information needed.
 
 ## Sensitive Use:
 
-> SwitchTransformers should not be applied for any unacceptable use cases, e.g., generation of abusive speech.
+More information needed.
 
 # Training Details
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Crawled Corpus (C4)
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 
 # Evaluation
```
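
For context, the `print(tokenizer.decode(outputs[0]))` line in the first hunk header comes from the README's usage snippet. A minimal sketch of how a SwitchTransformers checkpoint of this kind is typically loaded and run with `transformers` might look like the following; the checkpoint name `google/switch-base-8` and the example prompt are assumptions for illustration, not taken from this commit:

```python
# Illustrative sketch only; the checkpoint name below is an assumption, not part of this commit.
from transformers import AutoTokenizer, SwitchTransformersForConditionalGeneration

checkpoint = "google/switch-base-8"  # hypothetical choice of SwitchTransformers checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = SwitchTransformersForConditionalGeneration.from_pretrained(checkpoint)

# SwitchTransformers checkpoints are pretrained with T5-style span corruption,
# so sentinel tokens (<extra_id_N>) mark the spans the model should fill in.
input_text = "A <extra_id_0> walks into a bar and orders a <extra_id_1> with a pinch of <extra_id_2>."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```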