
Model Card for ernie-gram

Model Details

Model Description

ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding.

  • Developed by: Dongling Xiao, Yukun Li, Han Zhang, Yu Sun, Hao Tian, Hua Wu and Haifeng Wang
  • Shared by [Optional]: Peterchou
  • Model type: More information needed
  • Language(s) (NLP): Chinese, English (more information needed)
  • License: More information needed
  • Related Models:
    • Parent Model: BERT
  • Resources for more information:
    • Associated Paper: https://arxiv.org/abs/2010.12148
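
The pre-training objective named above masks and predicts whole contiguous n-grams instead of individual tokens. The sketch below is only a rough, simplified illustration of that idea, not the authors' algorithm; the function and variable names are hypothetical.

import random

def ngram_mask(tokens, mask_token="[MASK]", mask_prob=0.15, max_n=3):
    """Toy illustration of explicit n-gram masking: a contiguous n-gram is
    replaced by a single mask symbol and predicted as one unit, rather than
    masking and predicting each token independently."""
    masked, targets = [], []
    i = 0
    while i < len(tokens):
        if random.random() < mask_prob:
            n = random.randint(1, max_n)      # length of the n-gram to mask
            targets.append(tokens[i:i + n])   # the whole n-gram is one prediction target
            masked.append(mask_token)         # a single mask symbol stands in for it
            i += n
        else:
            masked.append(tokens[i])
            i += 1
    return masked, targets

print(ngram_mask("ernie gram masks whole n grams during pre training".split()))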

Uses

Direct Use

More information needed

Downstream Use [Optional]

This model can also be used for the tasks of question answering and text classification.
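
As a minimal sketch of the text-classification case, the snippet below loads the checkpoint with a freshly initialised sequence-classification head. It assumes the checkpoint is compatible with the standard transformers classification head and that a tokenizer is available in the repository; the label count and example sentence are placeholders.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "peterchou/ernie-gram"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Attach an untrained 2-label classification head; fine-tuning is still required.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

inputs = tokenizer("This is a placeholder sentence.", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (1, 2)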

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Details

Training Data

The model creators note in the associated paper:

English Pre-training Data. We use two common text corpora for English pre-training:

  • Base-scale corpora: 16GB uncompressed text from WIKIPEDIA and BOOKSCORPUS (Zhu et al., 2015), which is the original data for BERT.
  • Large-scale corpora: 160GB uncompressed text from WIKIPEDIA, BOOKSCORPUS, OPENWEBTEXT, CC-NEWS (Liu et al., 2019) and STORIES (Trinh and Le, 2018), which is the original data used in RoBERTa.

Chinese Pre-training Data. We adopt the same Chinese text corpora used in ERNIE 2.0 (Sun et al., 2020) to pre-train ERNIE-Gram.

Training Procedure

Preprocessing

The model authors note in the associated paper:

For pre-training on base-scale English corpora, the batch size is set to 256 sequences, the peak learning rate is 1e-4 for 1M training steps, which are the same settings as BERT-Base. As for large-scale English corpora, the batch size is 5112 sequences, the peak learning rate is 4e-4 for 500K training steps. For pre-training on Chinese corpora, the batch size is 256 sequences, the peak learning rate is 1e-4 for 3M training steps.
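
For quick reference, the quoted settings are collected below as a plain configuration mapping; the key names are illustrative and not tied to any particular training framework.

# Pre-training settings quoted above, gathered in one place for reference.
pretraining_settings = {
    "english_base_scale":  {"batch_size": 256,  "peak_learning_rate": 1e-4, "training_steps": 1_000_000},
    "english_large_scale": {"batch_size": 5112, "peak_learning_rate": 4e-4, "training_steps": 500_000},
    "chinese":             {"batch_size": 256,  "peak_learning_rate": 1e-4, "training_steps": 3_000_000},
}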

Speeds, Sizes, Times

More information needed

Evaluation

Testing Data, Factors & Metrics

Testing Data

More information needed

Factors

More information needed

Metrics

More information needed

Results

Classification and matching tasks are evaluated on the CLUE benchmark. CLUE evaluation results:

Configuration | Model            | CLUEWSC2020 | IFLYTEK | TNEWS | AFQMC | CMNLI | CSL   | OCNLI | Average
20L1024H      | ERNIE 3.0-XBase  | 91.12       | 62.22   | 60.34 | 76.95 | 84.98 | 84.27 | 82.07 | 77.42
12L768H       | ERNIE 3.0-Base   | 88.18       | 60.72   | 58.73 | 76.53 | 83.65 | 83.30 | 80.31 | 75.63
6L768H        | ERNIE 3.0-Medium | 79.93       | 60.14   | 57.16 | 74.56 | 80.87 | 81.23 | 77.02 | 72.99

Model Examination

More information needed

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: More information needed
  • Hours used: More information needed
  • Cloud Provider: More information needed
  • Compute Region: More information needed
  • Carbon Emitted: More information needed

Technical Specifications [optional]

Model Architecture and Objective

More information needed

Compute Infrastructure

More information needed

Hardware

More information needed

Software

More information needed

Citation

BibTeX:

@article{xiao2020ernie,
 title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
 author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
 journal={arXiv preprint arXiv:2010.12148},
 year={2020}
}

Glossary [optional]

More information needed

More Information [optional]

More information needed

Model Card Authors [optional]

Peterchou in collaboration with Ezi Ozoani and the Hugging Face team

Model Card Contact

More information needed

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModel
 
model = AutoModel.from_pretrained("peterchou/ernie-gram")
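
A brief usage sketch follows. It assumes the repository also provides a tokenizer loadable with AutoTokenizer (an assumption, not something stated in this card) and simply extracts contextual embeddings:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("peterchou/ernie-gram")
model = AutoModel.from_pretrained("peterchou/ernie-gram")

# Encode a sentence and run a forward pass to obtain contextual token embeddings.
inputs = tokenizer("ERNIE-Gram masks whole n-grams during pre-training.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)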