Upload 6 files

Files changed (6)

en_tokenizer/README.md ADDED Viewed

+# ERNIE-2.0-large
+## Introduction
+ERNIE 2.0 is a continual pre-training framework proposed by Baidu in 2019,
+which incrementally builds pre-training tasks and learns them through continual multi-task learning.
+Experimental results show that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including the English tasks of the GLUE benchmark and several common Chinese tasks.
+More details: https://arxiv.org/abs/1907.12412
+## Released Model Info
+This PyTorch model is converted from the officially released PaddlePaddle ERNIE model.
+A series of experiments has been conducted to verify the accuracy of the conversion.
+- Official PaddlePaddle ERNIE repo: https://github.com/PaddlePaddle/ERNIE
+- PyTorch conversion repo: https://github.com/nghuyong/ERNIE-Pytorch
+## How to use
+```python
+from transformers import AutoTokenizer, AutoModel
+tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-2.0-large-en")
+model = AutoModel.from_pretrained("nghuyong/ernie-2.0-large-en")
+```
+## Citation
+```bibtex
+@article{sun2019ernie20,
+  title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
+  author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
+  journal={arXiv preprint arXiv:1907.12412},
+  year={2019}
+}
+```

en_tokenizer/config.json ADDED Viewed

+{
+    "attention_probs_dropout_prob": 0.1,
+    "intermediate_size": 4096,
+    "hidden_act": "gelu",
+    "hidden_dropout_prob": 0.1,
+    "hidden_size": 1024,
+    "initializer_range": 0.02,
+    "max_position_embeddings": 512,
+    "num_attention_heads": 16,
+    "num_hidden_layers": 24,
+    "type_vocab_size": 4,
+    "vocab_size": 30522,
+    "pad_token_id": 0,
+    "layer_norm_eps": 1e-05,
+    "model_type": "ernie",
+    "architectures": [
+        "ErnieModel"
+    ]
+}
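As a rough sanity check on this config, the sketch below counts parameters under the assumption of a standard BERT-style encoder layout: word, position, and token-type embeddings, then per-layer self-attention and feed-forward blocks with biases and LayerNorm, plus a pooler. That layout and the float32 storage are assumptions, not something this diff states, and `estimate_params` is a hypothetical helper name. Under those assumptions the estimate lands within about 0.01% of the 1,340,701,491-byte size recorded for `pytorch_model.bin`.

```python
import json

# Only the size-related fields from config.json are needed for the count.
CONFIG_JSON = """
{
    "hidden_size": 1024,
    "intermediate_size": 4096,
    "max_position_embeddings": 512,
    "num_hidden_layers": 24,
    "type_vocab_size": 4,
    "vocab_size": 30522
}
"""


def estimate_params(cfg: dict) -> int:
    """Count weights and biases for an assumed BERT-style encoder with a pooler."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # scale + bias

    # Word, position, and token-type embedding tables, followed by a LayerNorm.
    embeddings = (
        cfg["vocab_size"] + cfg["max_position_embeddings"] + cfg["type_vocab_size"]
    ) * h + layer_norm

    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (h * h + h) + layer_norm

    # Two dense layers (h -> i -> h) with biases, then LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm

    # Dense layer applied to the [CLS] position.
    pooler = h * h + h

    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler


cfg = json.loads(CONFIG_JSON)
n_params = estimate_params(cfg)
approx_bytes = n_params * 4  # assumes float32 weights

print(n_params)      # 335143936
print(approx_bytes)  # 1340575744
```

The remaining gap of roughly 126 KB would be consistent with serialization overhead and non-parameter buffers, though that explanation is a guess rather than something verified against the checkpoint.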

en_tokenizer/pytorch_model.bin ADDED Viewed

+version https://git-lfs.github.com/spec/v1
+oid sha256:89ef99415c7bd5e585abdd19252a5e95ff42c361f4e6f8da823e989fd396395c
+size 1340701491
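The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of checking a downloaded file against such a pointer follows; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the local path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:89ef99415c7bd5e585abdd19252a5e95ff42c361f4e6f8da823e989fd396395c
size 1340701491
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and digest the pointer records."""
    path = Path(path)
    # Cheap check first: a size mismatch means there is no need to hash.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so a file of over 1 GB is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])  # sha256 1340701491
```

With the weights downloaded locally, `matches_pointer("en_tokenizer/pytorch_model.bin", pointer)` would report whether the copy is intact.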

en_tokenizer/special_tokens_map.json ADDED Viewed

+{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}

en_tokenizer/tokenizer_config.json ADDED Viewed

+{"special_tokens_map_file": null, "full_tokenizer_file": null}

en_tokenizer/vocab.txt ADDED Viewed

The diff for this file is too large to render. See raw diff