sgugger Marissa committed on
Commit
0dcbcf2
1 Parent(s): 130fb28

Add model card (#1)

- Add model card (82f6ccf8f35b3001ba637eaa75a29b40f4b08ead)
- removing library name (814cc89dbd38891c3e3273cd4c0a22dbdbf4dd6a)


Co-authored-by: Marissa Gerchick <Marissa@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +167 -6
README.md CHANGED
@@ -1,13 +1,175 @@
1
  ---
 
 
2
  license: mit
3
- widget:
4
- - text: "I like you. </s></s> I love you."
5
  ---
6
 
 
7
 
8
- ## roberta-large-mnli
9
 
10
- Trained by Facebook, [original source](https://github.com/pytorch/fairseq/tree/master/examples/roberta)
11
 
12
  ```bibtex
13
  @article{liu2019roberta,
@@ -18,5 +180,4 @@ Trained by Facebook, [original source](https://github.com/pytorch/fairseq/tree/m
18
  journal={arXiv preprint arXiv:1907.11692},
19
  year = {2019},
20
  }
21
- ```
22
-
1
  ---
2
+ language:
3
+ - en
4
  license: mit
5
+ tags:
6
+ - autogenerated-modelcard
7
+ datasets:
8
+ - multi_nli
9
+ - wikipedia
10
+ - bookcorpus
11
  ---
12
 
13
+ # roberta-large-mnli
14
 
15
+ ## Table of Contents
16
+ - [Model Details](#model-details)
17
+ - [How To Get Started With the Model](#how-to-get-started-with-the-model)
18
+ - [Uses](#uses)
19
+ - [Risks, Limitations and Biases](#risks-limitations-and-biases)
20
+ - [Training](#training)
21
+ - [Evaluation](#evaluation)
22
+ - [Environmental Impact](#environmental-impact)
23
+ - [Technical Specifications](#technical-specifications)
24
+ - [Citation Information](#citation-information)
25
+ - [Model Card Authors](#model-card-author)
26
 
27
+ ## Model Details
28
+
29
+ **Model Description:** roberta-large-mnli is the [RoBERTa large model](https://huggingface.co/roberta-large) fine-tuned on the [Multi-Genre Natural Language Inference (MNLI)](https://huggingface.co/datasets/multi_nli) corpus. The underlying RoBERTa large model was pretrained on English-language text using a masked language modeling (MLM) objective.
30
+
31
+ - **Developed by:** See [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta) for model developers
32
+ - **Model Type:** Transformer-based language model
33
+ - **Language(s):** English
34
+ - **License:** MIT
35
+ - **Parent Model:** This model is a fine-tuned version of the RoBERTa large model. Users should see the [RoBERTa large model card](https://huggingface.co/roberta-large) for relevant information.
36
+ - **Resources for more information:**
37
+ - [Research Paper](https://arxiv.org/abs/1907.11692)
38
+ - [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta)
39
+
40
+ ## How to Get Started with the Model
41
+
42
+ Use the code below to get started with the model. The model can be loaded with the zero-shot-classification pipeline like so:
43
+
44
+ ```python
45
+ from transformers import pipeline
46
+ classifier = pipeline('zero-shot-classification', model='roberta-large-mnli')
47
+ ```
48
+
49
+ You can then use this pipeline to classify sequences into any of the class names you specify. For example:
50
+
51
+ ```python
52
+ sequence_to_classify = "one day I will see the world"
53
+ candidate_labels = ['travel', 'cooking', 'dancing']
54
+ classifier(sequence_to_classify, candidate_labels)
55
+ ```
56
+
57
+ ## Uses
58
+
59
+ #### Direct Use
60
+
61
+ This fine-tuned model can be used for zero-shot classification tasks, including zero-shot sentence-pair classification (see the [GitHub repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta) for examples) and zero-shot sequence classification.
62
+
63
+ #### Misuse and Out-of-scope Use
64
+
65
+ The model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to produce factual or true representations of people or events, so using it to generate such content is out of scope for this model.
66
+
67
+ ## Risks, Limitations and Biases
68
+
69
+ **CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**
70
+
71
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). The [RoBERTa large model card](https://huggingface.co/roberta-large) notes that: "The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral."
72
+
73
+ Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:
74
+
75
+ ```python
76
+ sequence_to_classify = "The CEO had a strong handshake."
77
+ candidate_labels = ['male', 'female']
78
+ hypothesis_template = "This text speaks about a {} profession."
79
+ classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)
80
+ ```
81
+
82
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
83
+
84
+ ## Training
85
+
86
+ #### Training Data
87
+
88
+ This model was fine-tuned on the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus. Also see the [MNLI data card](https://huggingface.co/datasets/multi_nli) for more information.
89
+
90
+ As described in the [RoBERTa large model card](https://huggingface.co/roberta-large):
91
+
92
+ > The RoBERTa model was pretrained on the reunion of five datasets:
93
+ >
94
+ > - [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books;
95
+ > - [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers) ;
96
+ > - [CC-News](https://commoncrawl.org/2016/10/news-dataset-available/), a dataset containing 63 millions English news articles crawled between September 2016 and February 2019.
97
+ > - [OpenWebText](https://github.com/jcpeterson/openwebtext), an opensource recreation of the WebText dataset used to train GPT-2,
98
+ > - [Stories](https://arxiv.org/abs/1806.02847), a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas.
99
+ >
100
+ > Together theses datasets weight 160GB of text.
101
+
102
+ Also see the [bookcorpus data card](https://huggingface.co/datasets/bookcorpus) and the [wikipedia data card](https://huggingface.co/datasets/wikipedia) for additional information.
103
+
104
+ #### Training Procedure
105
+
106
+ ##### Preprocessing
107
+
108
+ As described in the [RoBERTa large model card](https://huggingface.co/roberta-large):
109
+
110
+ > The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of
111
+ > the model take pieces of 512 contiguous token that may span over documents. The beginning of a new document is marked
112
+ > with `<s>` and the end of one by `</s>`
113
+ >
114
+ > The details of the masking procedure for each sentence are the following:
115
+ > - 15% of the tokens are masked.
116
+ > - In 80% of the cases, the masked tokens are replaced by `<mask>`.
117
+ > - In 10% of the cases, the masked tokens are replaced by a random token (different) from the one they replace.
118
+ > - In the 10% remaining cases, the masked tokens are left as is.
119
+ >
120
+ > Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed).
121
+
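The masking rules quoted above can be sketched in plain Python. This is a minimal illustration, not the fairseq implementation: the function name `dynamic_mask`, the toy vocabulary, and the choice to select each token independently with probability 0.15 are assumptions made for the example.

```python
import random

MASK_TOKEN = "<mask>"


def dynamic_mask(tokens, vocab, rng, select_prob=0.15):
    """Apply the 15% / 80-10-10 masking rule to a list of tokens.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    every position selected for prediction and None elsewhere.
    Assumption: each position is selected independently with `select_prob`.
    """
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected; token is left untouched
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            # 80% of selected tokens become the mask token
            masked[i] = MASK_TOKEN
        elif roll < 0.9:
            # 10% become a random token different from the original
            masked[i] = rng.choice([v for v in vocab if v != tok])
        # remaining 10%: the token is left as is
    return masked, labels


# Hypothetical toy input; a real pipeline would operate on BPE token ids.
toy_vocab = ["one", "day", "I", "will", "see", "the", "world", "sea"]
toy_tokens = ["one", "day", "I", "will", "see", "the", "world"]
print(dynamic_mask(toy_tokens, toy_vocab, random.Random()))
```

Calling the function again with a fresh random state on each pass over the data yields a different mask each time, which is the sense in which the masking is dynamic rather than fixed once during preprocessing.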
122
+ ##### Pretraining
123
+
124
+ Also as described in the [RoBERTa large model card](https://huggingface.co/roberta-large):
125
+
126
+ > The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The
127
+ > optimizer used is Adam with a learning rate of 4e-4, \\(\beta_{1} = 0.9\\), \\(\beta_{2} = 0.98\\) and
128
+ > \\(\epsilon = 1e-6\\), a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning
129
+ > rate after.
130
+
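The learning rate schedule described in the quote can be written as a small function. This is a sketch under two assumptions that the quote does not state: the warmup rises linearly from zero, and the linear decay reaches zero at the final step (500K). The names `PEAK_LR`, `WARMUP_STEPS`, `TOTAL_STEPS`, and `learning_rate` are illustrative.

```python
PEAK_LR = 4e-4          # peak learning rate from the quoted description
WARMUP_STEPS = 30_000   # warmup length from the quoted description
TOTAL_STEPS = 500_000   # total pretraining steps from the quoted description


def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay (assumed to end at zero)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(TOTAL_STEPS - step, 0)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 15_000, 30_000, 265_000, 500_000):
    print(step, learning_rate(step))
```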
131
+ ## Evaluation
132
+
133
+ The following evaluation information is extracted from the associated [GitHub repo for RoBERTa](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta).
134
+
135
+ #### Testing Data, Factors and Metrics
136
+
137
+ The model developers report that the model was evaluated on the following tasks and datasets using the listed metrics:
138
+
139
+ - **Dataset:** Part of [GLUE (Wang et al., 2019)](https://arxiv.org/pdf/1804.07461.pdf), the General Language Understanding Evaluation benchmark, a collection of 9 datasets for evaluating natural language understanding systems. Specifically, the model was evaluated on the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus. See the [GLUE data card](https://huggingface.co/datasets/glue) or [Wang et al. (2019)](https://arxiv.org/pdf/1804.07461.pdf) for further information.
140
+ - **Tasks:** NLI. [Wang et al. (2019)](https://arxiv.org/pdf/1804.07461.pdf) describe the inference task for MNLI as:
141
+ > The Multi-Genre Natural Language Inference Corpus [(Williams et al., 2018)](https://arxiv.org/abs/1704.05426) is a crowd-sourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus [(Bowman et al., 2015)](https://arxiv.org/abs/1508.05326) as 550k examples of auxiliary training data.
142
+ - **Metrics:** Accuracy
143
+
144
+ - **Dataset:** [XNLI (Conneau et al., 2018)](https://arxiv.org/pdf/1809.05053.pdf), the extension of the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus to 15 languages: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. See the [XNLI data card](https://huggingface.co/datasets/xnli) or [Conneau et al. (2018)](https://arxiv.org/pdf/1809.05053.pdf) for further information.
145
+ - **Tasks:** Translate-test (i.e., input sentences in other languages are machine-translated into the training language, English, before the model classifies them)
146
+ - **Metrics:** Accuracy
147
+
148
+ #### Results
149
+
150
+ GLUE results (dev set, single model, single-task fine-tuning): 90.2 on MNLI
151
+
152
+ XNLI test results:
153
+
154
+ | Task | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur |
155
+ |:----:|:--:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
156
+ | |91.3|82.91|84.27|81.24|81.74|83.13|78.28|76.79|76.64|74.17|74.05| 77.5| 70.9|66.65|66.81|
157
+
158
+ ## Environmental Impact
159
+
160
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1907.11692.pdf).
161
+
162
+ - **Hardware Type:** 1024 V100 GPUs
163
+ - **Hours used:** 24 hours (one day)
164
+ - **Cloud Provider:** Unknown
165
+ - **Compute Region:** Unknown
166
+ - **Carbon Emitted:** Unknown
167
+
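The hardware figures above translate directly into GPU-hours, which is the main input an emissions calculator needs. The sketch below shows that arithmetic and a simplified energy-times-carbon-intensity formula. The function names are hypothetical, the formula ignores datacenter overhead and offsets, and the per-GPU power draw and grid carbon intensity are left as parameters because the card lists the provider and region as unknown.

```python
NUM_GPUS = 1024  # from the hardware type listed above
HOURS = 24       # from the hours used listed above


def gpu_hours(num_gpus, hours):
    """Total accelerator time in GPU-hours."""
    return num_gpus * hours


def estimate_emissions_kg(num_gpus, hours, watts_per_gpu, kg_co2_per_kwh):
    """Simplified estimate: energy (kWh) times grid carbon intensity.

    watts_per_gpu and kg_co2_per_kwh must be supplied by the caller;
    this card does not report them.
    """
    energy_kwh = gpu_hours(num_gpus, hours) * watts_per_gpu / 1000.0
    return energy_kwh * kg_co2_per_kwh


print(gpu_hours(NUM_GPUS, HOURS))  # 24576
```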
168
+ ## Technical Specifications
169
+
170
+ See the [associated paper](https://arxiv.org/pdf/1907.11692.pdf) for details on the modeling architecture, objective, compute infrastructure, and training details.
171
+
172
+ ## Citation Information
173
 
174
  ```bibtex
175
  @article{liu2019roberta,
180
  journal={arXiv preprint arXiv:1907.11692},
181
  year = {2019},
182
  }
183
+ ```