Building out the model card!

#4
by Marissa - opened
Files changed (1)
  1. README.md +76 -25
README.md CHANGED
@@ -3,6 +3,7 @@ language: en
3
  datasets:
4
  - squad_v2
5
  license: cc-by-4.0
 
6
  ---
7
 
8
  # roberta-base for QA
@@ -10,33 +11,18 @@ license: cc-by-4.0
10
  This is the [roberta-base](https://huggingface.co/roberta-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.
11
 
12
 
13
- ## Overview
14
- **Language model:** roberta-base
 
15
  **Language:** English
16
- **Downstream-task:** Extractive QA
17
  **Training data:** SQuAD 2.0
18
- **Eval data:** SQuAD 2.0
19
  **Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system)
20
- **Infrastructure**: 4x Tesla v100
 
21
 
22
- ## Hyperparameters
23
-
24
- ```
25
- batch_size = 96
26
- n_epochs = 2
27
- base_LM_model = "roberta-base"
28
- max_seq_len = 386
29
- learning_rate = 3e-5
30
- lr_schedule = LinearWarmup
31
- warmup_proportion = 0.2
32
- doc_stride=128
33
- max_query_length=64
34
- ```
35
-
36
- ## Using a distilled model instead
37
- Please note that we have also released a distilled version of this model called [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2). The distilled model has a comparable prediction quality and runs at twice the speed of the base model.
38
-
39
- ## Usage
40
 
41
  ### In Haystack
42
  Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in [Haystack](https://github.com/deepset-ai/haystack/):
@@ -66,8 +52,64 @@ model = AutoModelForQuestionAnswering.from_pretrained(model_name)
66
  tokenizer = AutoTokenizer.from_pretrained(model_name)
67
  ```
68
 
69
- ## Performance
70
- Evaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).
71
 
72
  ```
73
  "exact": 79.87029394424324,
@@ -82,6 +124,15 @@ Evaluated on the SQuAD 2.0 dev set with the [official eval script](https://works
82
  "NoAns_total": 5945
83
  ```
84
 
85
  ## Authors
86
  **Branden Chan:** branden.chan@deepset.ai
87
  **Timo Möller:** timo.moeller@deepset.ai
 
3
  datasets:
4
  - squad_v2
5
  license: cc-by-4.0
6
+ co2_eq_emissions: 360
7
  ---
8
 
9
  # roberta-base for QA
 
11
  This is the [roberta-base](https://huggingface.co/roberta-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering.
12
 
13
 
14
+ ## Model Details
15
+ **Model developers:** [Branden Chan](mailto:branden.chan@deepset.ai), [Timo Möller](mailto:timo.moeller@deepset.ai), [Malte Pietsch](mailto:malte.pietsch@deepset.ai), [Tanay Soni](mailto:tanay.soni@deepset.ai)
16
+ **Model type:** Transformer-based language model
17
  **Language:** English
18
+ **Downstream task:** Extractive QA
19
  **Training data:** SQuAD 2.0
20
+ **Evaluation data:** SQuAD 2.0
21
  **Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system)
22
+ **Infrastructure:** 4x Tesla V100
23
+ **Related Models:** See the [roberta-base model card](https://huggingface.co/roberta-base) for information about the roberta-base model. deepset has also released a distilled version of this model called [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2). The distilled model has comparable prediction quality and runs at twice the speed of the base model.
24
 
25
+ ## How to Use the Model
26
 
27
  ### In Haystack
28
  Haystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in [Haystack](https://github.com/deepset-ai/haystack/):
 
52
  tokenizer = AutoTokenizer.from_pretrained(model_name)
53
  ```
54
 
55
+ ### Using a distilled model instead
56
+ Please note that we have also released a distilled version of this model called [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2). The distilled model has comparable prediction quality and runs at twice the speed of the base model.
57
+
58
+ ## Uses and Limitations
59
+
60
+ ### Uses
61
+
62
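+ This model can be used for extractive question answering: given a question and a context passage, it returns the span of the passage that answers the question, or no answer when the passage does not contain one.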
63
+
64
+ ### Limitations
65
+
66
+ Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). The [roberta-base model card](https://huggingface.co/roberta-base#training-data) notes that:
67
+
68
+ > The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral. Therefore, the model can have biased predictions...This bias will also affect all fine-tuned versions of this model.
69
+
70
+ See the [roberta-base model card](https://huggingface.co/roberta-base) for demonstrative examples. Note that those examples are not a comprehensive stress test of the model. Anyone planning to use the model should consider whether a more rigorous evaluation is appropriate for their use case and context. For discussion of bias in QA systems, see, e.g., [Mao et al. (2021)](https://aclanthology.org/2021.mrqa-1.9.pdf).
71
+
72
+ ## Training
73
+
74
+ ### Training Data
75
+
76
+ This model is the [roberta-base](https://huggingface.co/roberta-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. See the [SQuAD2.0 dataset card](https://huggingface.co/datasets/squad_v2) to learn more about SQuAD2.0. From the [roberta-base model card](https://huggingface.co/roberta-base#training-data) training data section:
77
+
78
+ > The RoBERTa model was pretrained on the reunion of five datasets:
79
+ > - [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books;
80
+ > - [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers) ;
81
+ > - [CC-News](https://commoncrawl.org/2016/10/news-dataset-available/), a dataset containing 63 millions English news
82
+ articles crawled between September 2016 and February 2019.
83
+ > - [OpenWebText](https://github.com/jcpeterson/openwebtext), an opensource recreation of the WebText dataset used to
84
+ train GPT-2,
85
+ > - [Stories](https://arxiv.org/abs/1806.02847) a dataset containing a subset of CommonCrawl data filtered to match the
86
+ story-like style of Winograd schemas.
87
+ >
88
+ > Together theses datasets weight 160GB of text.
89
+
90
+ To learn more about these datasets, see some of the associated dataset cards: [BookCorpus](https://huggingface.co/datasets/bookcorpus), [CC-News](https://huggingface.co/datasets/cc_news).
91
+
92
+ ### Training Procedure
93
+
94
+ The hyperparameters were:
95
+
96
+ ```
97
+ batch_size = 96
98
+ n_epochs = 2
99
+ base_LM_model = "roberta-base"
100
+ max_seq_len = 386
101
+ learning_rate = 3e-5
102
+ lr_schedule = LinearWarmup
103
+ warmup_proportion = 0.2
104
+ doc_stride=128
105
+ max_query_length=64
106
+ ```
107
+
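As a rough illustration of how `learning_rate`, `lr_schedule = LinearWarmup` and `warmup_proportion` from the block above fit together, here is a minimal sketch. It assumes the schedule ramps up linearly from zero and then decays linearly back to zero, which the card does not state explicitly; the function name and the `total_steps` value are hypothetical.

```python
def linear_warmup_lr(step, total_steps, peak_lr=3e-5, warmup_proportion=0.2):
    """Learning rate at a given optimizer step.

    Assumption: linear ramp from 0 to peak_lr over the warmup steps,
    then linear decay from peak_lr back to 0 at total_steps.
    """
    warmup_steps = int(total_steps * warmup_proportion)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale the peak rate by the fraction of warmup done.
        return peak_lr * step / warmup_steps
    # Decay phase: scale by the fraction of post-warmup steps remaining.
    decay_steps = max(total_steps - warmup_steps, 1)
    return peak_lr * max(total_steps - step, 0) / decay_steps


# Hypothetical run of 1000 optimizer steps: warmup covers the first 200.
total_steps = 1000
schedule = [linear_warmup_lr(s, total_steps) for s in (0, 100, 200, 600, 1000)]
```

With these assumptions the rate reaches its peak of 3e-5 at step 200 (20% of the run) and falls to half of that at step 600.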
108
+ ## Evaluation Results
109
+
110
+ The model was evaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).
111
+
112
+ Evaluation results include:
113
 
114
  ```
115
  "exact": 79.87029394424324,
 
124
  "NoAns_total": 5945
125
  ```
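For readers unfamiliar with the `exact` and `f1` fields, the following is a simplified sketch of how SQuAD-style scoring compares one prediction with one gold answer. It is not the official script: it scores a single gold answer only (the official script takes the best score over all gold answers for a question), and the function names are chosen here for illustration.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_score(gold, pred):
    """1 if the normalized strings match, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold, pred):
    """Token-level F1 between normalized gold and predicted answers."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    if not gold_tokens or not pred_tokens:
        # An empty string stands for "no answer": score 1 only if both are empty.
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

An unanswerable question is represented by an empty gold answer, so a model that correctly abstains gets full credit and one that answers anyway gets zero.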
126
 
127
+ ## Environmental Impacts
128
+
129
+ *Carbon emissions associated with training the model (fine-tuning the [roberta-base model](https://huggingface.co/roberta-base)) were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). The hardware, runtime, cloud provider, and compute region were used to estimate the carbon impact.*
130
+ - **Hardware Type:** 4x V100 GPU (p3.8xlarge)
131
+ - **Hours used:** 0.5 (30 minutes)
132
+ - **Cloud Provider:** AWS
133
+ - **Compute Region:** EU-Ireland
134
+ - **Carbon Emitted** *(Power consumption x Time x Carbon produced based on location of power grid)*: 0.36 kg CO2 eq.
135
+
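The formula in the last bullet can be checked with a short worked example. This sketch assumes a power draw of 300 W per V100 (a nominal figure, not stated in the card) and back-computes the grid carbon intensity implied by the reported 0.36 kg; that intensity is derived here for illustration and is not a value quoted from the calculator.

```python
def carbon_emitted_kg(power_kw, hours, intensity_kg_per_kwh):
    """Power consumption x time x carbon intensity of the local grid."""
    return power_kw * hours * intensity_kg_per_kwh


gpu_count = 4          # from the card: 4x V100
gpu_power_kw = 0.3     # assumption: 300 W nominal draw per GPU
hours = 0.5            # from the card: 30 minutes
reported_kg = 0.36     # from the card

# Energy used by the GPUs over the run, in kWh.
energy_kwh = gpu_count * gpu_power_kw * hours

# Carbon intensity (kg CO2 eq. per kWh) that reproduces the reported figure.
implied_intensity = reported_kg / energy_kwh

# The metadata field co2_eq_emissions is given in grams.
reported_grams = reported_kg * 1000
```

Under these assumptions the run uses about 0.6 kWh, and the reported 0.36 kg corresponds to the `co2_eq_emissions: 360` value in the metadata header.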
136
  ## Authors
137
  **Branden Chan:** branden.chan@deepset.ai
138
  **Timo Möller:** timo.moeller@deepset.ai