model documentation

#3
by nazneen - opened
Files changed (1)
  1. README.md +183 -0
README.md ADDED
@@ -0,0 +1,183 @@
---
tags:
- bert

---
# Model Card for ernie-gram


# Model Details

## Model Description

ERNIE-Gram is a pre-trained language model described in the paper "ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding". Instead of masking single subword tokens, it masks and predicts whole n-grams during pre-training.

- **Developed by:** Dongling Xiao, Yukun Li, Han Zhang, Yu Sun, Hao Tian, Hua Wu and Haifeng Wang
- **Shared by [Optional]:** Peterchou
- **Model type:** More information needed
- **Language(s) (NLP):** Chinese, English (more information needed)
- **License:** More information needed
- **Related Models:**
  - **Parent Model:** BERT
- **Resources for more information:**
  - [GitHub Repo](https://github.com/PaddlePaddle/ERNIE)
  - [Associated Paper](https://arxiv.org/abs/2010.12148)

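To make the pre-training objective concrete, the sketch below contrasts BERT-style token masking with the contiguous n-gram masking that ERNIE-Gram uses. This is an illustration only, not the authors' implementation: the whitespace tokenizer, 15% mask budget, and n-gram sampling scheme are simplified assumptions.

```python
import random

MASK = "[MASK]"

def token_masking(tokens, ratio=0.15, seed=0):
    """BERT-style: mask individual tokens independently."""
    rng = random.Random(seed)
    return [MASK if rng.random() < ratio else t for t in tokens]

def ngram_masking(tokens, ratio=0.15, max_n=3, seed=0):
    """ERNIE-Gram-style (simplified): mask contiguous n-grams until the budget is spent."""
    rng = random.Random(seed)
    out = list(tokens)
    budget = max(1, int(len(tokens) * ratio))
    while budget > 0:
        n = min(rng.randint(1, max_n), budget)
        start = rng.randrange(0, len(tokens) - n + 1)
        out[start:start + n] = [MASK] * n
        budget -= n
    return out

tokens = "explicitly masking whole n-grams gives the model phrase-level prediction targets".split()
print(token_masking(tokens))
print(ngram_masking(tokens))
```
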
# Uses


## Direct Use

More information needed

## Downstream Use [Optional]

This model could also be used for downstream tasks such as question answering and text classification, for example as sketched below.

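A minimal sketch of the text-classification case, using the checkpoint named elsewhere in this card; the tokenizer availability, label count, and example sentence are assumptions, and the classification head is randomly initialized until fine-tuned.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "peterchou/ernie-gram"  # checkpoint name taken from this card

# Assumptions: the checkpoint ships a compatible tokenizer and a 2-label task is wanted.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("这家餐厅的服务很好。", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# The new classification head must be fine-tuned before these scores mean anything.
print(logits.softmax(dim=-1))
```
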
## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.


## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.


# Training Details

## Training Data

The model creators note in the [associated paper](https://arxiv.org/abs/2010.12148):

> English Pre-training Data. We use two common text corpora for English pre-training:
> - Base-scale corpora: 16GB uncompressed text from WIKIPEDIA and BOOKSCORPUS (Zhu et al., 2015), which is the original data for BERT.
> - Large-scale corpora: 160GB uncompressed text from WIKIPEDIA, BOOKSCORPUS, OPENWEBTEXT, CC-NEWS (Liu et al., 2019) and STORIES (Trinh and Le, 2018), which is the original data used in RoBERTa.

> Chinese Pre-training Data. We adopt the same Chinese text corpora used in ERNIE 2.0 (Sun et al., 2020) to pre-train ERNIE-Gram.

## Training Procedure


### Preprocessing

The model authors note in the [associated paper](https://arxiv.org/abs/2010.12148):

> For pre-training on base-scale English corpora, the batch size is set to 256 sequences, the peak learning rate is 1e-4 for 1M training steps, which are the same settings as BERT-Base. As for large-scale English corpora, the batch size is 5112 sequences, the peak learning rate is 4e-4 for 500K training steps. For pre-training on Chinese corpora, the batch size is 256 sequences, the peak learning rate is 1e-4 for 3M training steps.

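For quick reference, the quoted settings can be restated as a small summary; the snippet below only reorganizes the paper's numbers (optimizer, warmup, and sequence length are not restated) and is not a training recipe.

```python
# Pre-training settings quoted above, reorganized for quick comparison.
pretraining_configs = {
    "english_base_scale":  {"batch_size": 256,  "peak_lr": 1e-4, "train_steps": 1_000_000},
    "english_large_scale": {"batch_size": 5112, "peak_lr": 4e-4, "train_steps": 500_000},
    "chinese":             {"batch_size": 256,  "peak_lr": 1e-4, "train_steps": 3_000_000},
}

for name, cfg in pretraining_configs.items():
    # Rough volume of training data seen = batch size x number of steps (in sequences).
    print(f"{name}: ~{cfg['batch_size'] * cfg['train_steps']:,} sequences processed")
```
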
### Speeds, Sizes, Times

More information needed

# Evaluation


## Testing Data, Factors & Metrics

### Testing Data

More information needed

### Factors

More information needed

### Metrics

More information needed

## Results

Classification and matching tasks are evaluated on the CLUE benchmark.
CLUE evaluation results (as reported for the ERNIE 3.0 model family):

| Configuration | Model            | CLUEWSC2020 | IFLYTEK | TNEWS | AFQMC | CMNLI | CSL   | OCNLI | Avg.  |
|---------------|------------------|-------------|---------|-------|-------|-------|-------|-------|-------|
| 20L1024H      | ERNIE 3.0-XBase  | 91.12       | 62.22   | 60.34 | 76.95 | 84.98 | 84.27 | 82.07 | 77.42 |
| 12L768H       | ERNIE 3.0-Base   | 88.18       | 60.72   | 58.73 | 76.53 | 83.65 | 83.30 | 80.31 | 75.63 |
| 6L768H        | ERNIE 3.0-Medium | 79.93       | 60.14   | 57.16 | 74.56 | 80.87 | 81.23 | 77.02 | 72.99 |

# Model Examination

More information needed

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** More information needed
- **Hours used:** More information needed
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed

# Technical Specifications [optional]

## Model Architecture and Objective

More information needed

## Compute Infrastructure

More information needed

### Hardware

More information needed

### Software

More information needed

# Citation

**BibTeX:**
```
@article{xiao2020ernie,
  title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
  author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2010.12148},
  year={2020}
}
```

# Glossary [optional]

More information needed

# More Information [optional]

More information needed

# Model Card Authors [optional]

Peterchou, in collaboration with Ezi Ozoani and the Hugging Face team

# Model Card Contact

More information needed

# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("peterchou/ernie-gram")
```

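A slightly fuller usage sketch, assuming the hosted checkpoint also ships a compatible tokenizer (the example sentence is arbitrary):

```python
from transformers import AutoModel, AutoTokenizer

# Assumption: the checkpoint includes a tokenizer; the sentence is just an example input.
tokenizer = AutoTokenizer.from_pretrained("peterchou/ernie-gram")
model = AutoModel.from_pretrained("peterchou/ernie-gram")

inputs = tokenizer("ERNIE-Gram masks whole n-grams during pre-training.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```
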
</details>