---
tags:
- bert
---

# Model Card for ernie-gram

# Model Details

## Model Description

ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding.

- **Developed by:** Dongling Xiao, Yukun Li, Han Zhang, Yu Sun, Hao Tian, Hua Wu and Haifeng Wang
- **Shared by [Optional]:** Peterchou
- **Model type:** More information needed
- **Language(s) (NLP):** Chinese, English (more information needed)
- **License:** More information needed
- **Related Models:**
  - **Parent Model:** BERT
- **Resources for more information:**
  - [GitHub Repo](https://github.com/PaddlePaddle/ERNIE)
  - [Associated Paper](https://arxiv.org/abs/2010.12148)

# Uses

## Direct Use

More information needed

## Downstream Use [Optional]

This model can also be used for question answering and text classification.

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information needed for further recommendations.

# Training Details

## Training Data

The model creators note in the [associated paper](https://arxiv.org/abs/2010.12148):

> English Pre-training Data. We use two common text corpora for English pre-training:
> - Base-scale corpora: 16GB uncompressed text from WIKIPEDIA and BOOKSCORPUS (Zhu et al., 2015), which is the original data for BERT.
> - Large-scale corpora: 160GB uncompressed text from WIKIPEDIA, BOOKSCORPUS, OPENWEBTEXT, CC-NEWS (Liu et al., 2019) and STORIES (Trinh and Le, 2018), which is the original data used in RoBERTa.

> Chinese Pre-training Data. We adopt the same Chinese text corpora used in ERNIE 2.0 (Sun et al., 2020) to pre-train ERNIE-Gram.

## Training Procedure

### Preprocessing

The model authors note in the [associated paper](https://arxiv.org/abs/2010.12148):

> For pre-training on base-scale English corpora, the batch size is set to 256 sequences, the peak learning rate is 1e-4 for 1M training steps, which are the same settings as BERT-Base. As for large-scale English corpora, the batch size is 5112 sequences, the peak learning rate is 4e-4 for 500K training steps. For pre-training on Chinese corpora, the batch size is 256 sequences, the peak learning rate is 1e-4 for 3M training steps.

### Speeds, Sizes, Times

More information needed

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

More information needed

### Factors

More information needed

### Metrics

More information needed

## Results

Classification and matching tasks are evaluated on the CLUE benchmark.
CLUE evaluation results:

| Configuration | Model            | CLUEWSC2020 | IFLYTEK | TNEWS | AFQMC | CMNLI | CSL   | OCNLI | Average |
|---------------|------------------|-------------|---------|-------|-------|-------|-------|-------|---------|
| 20L1024H      | ERNIE 3.0-XBase  | 91.12       | 62.22   | 60.34 | 76.95 | 84.98 | 84.27 | 82.07 | 77.42   |
| 12L768H       | ERNIE 3.0-Base   | 88.18       | 60.72   | 58.73 | 76.53 | 83.65 | 83.30 | 80.31 | 75.63   |
| 6L768H        | ERNIE 3.0-Medium | 79.93       | 60.14   | 57.16 | 74.56 | 80.87 | 81.23 | 77.02 | 72.99   |

# Model Examination

More information needed

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** More information needed
- **Hours used:** More information needed
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed

# Technical Specifications [optional]

## Model Architecture and Objective

More information needed

## Compute Infrastructure

More information needed

### Hardware

More information needed

### Software

More information needed

# Citation

**BibTeX:**

```
@article{xiao2020ernie,
  title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
  author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2010.12148},
  year={2020}
}
```

# Glossary [optional]

More information needed

# More Information [optional]

More information needed

# Model Card Authors [optional]

Peterchou in collaboration with Ezi Ozoani and the Hugging Face team

# Model Card Contact

More information needed

# How to Get Started with the Model

Use the code below to get started with the model.
<details>
<summary>Click to expand</summary>

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("peterchou/ernie-gram")
```

</details>
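A slightly fuller sketch of running a forward pass with this checkpoint to obtain contextual representations is shown below. It assumes the `peterchou/ernie-gram` repository ships a tokenizer configuration loadable via `AutoTokenizer`; the example sentence and the choice of the first token's hidden state as a sentence-level vector are illustrative, not part of the original card.

```python
# Minimal feature-extraction sketch for the checkpoint referenced above.
# Assumption: the repo provides a tokenizer usable through AutoTokenizer.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("peterchou/ernie-gram")
model = AutoModel.from_pretrained("peterchou/ernie-gram")
model.eval()

# Tokenize an illustrative Chinese sentence ("Baidu is a high-tech company")
# and run it through the encoder without gradient tracking.
inputs = tokenizer("百度是一家高科技公司", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size);
# the first token's vector is commonly taken as a sentence representation.
sentence_vector = outputs.last_hidden_state[:, 0, :]
print(sentence_vector.shape)
```

For the downstream uses mentioned above (text classification, question answering), the same checkpoint could in principle be loaded with `AutoModelForSequenceClassification` or `AutoModelForQuestionAnswering` and fine-tuned, though that setup is not described in the original card.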