Delete en_tokenizer
- en_tokenizer/README.md +0 -35
- en_tokenizer/config.json +0 -19
- en_tokenizer/pytorch_model.bin +0 -3
- en_tokenizer/special_tokens_map.json +0 -1
- en_tokenizer/tokenizer_config.json +0 -1
- en_tokenizer/vocab.txt +0 -0
en_tokenizer/README.md
DELETED
@@ -1,35 +0,0 @@
-# ERNIE-2.0-large
-
-## Introduction
-
-ERNIE 2.0 is a continual pre-training framework proposed by Baidu in 2019,
-which incrementally builds and learns pre-training tasks through continual multi-task learning.
-Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including the English tasks of the GLUE benchmark and several common Chinese tasks.
-
-More detail: https://arxiv.org/abs/1907.12412
-
-## Released Model Info
-
-This released PyTorch model is converted from the officially released PaddlePaddle ERNIE model, and
-a series of experiments have been conducted to check the accuracy of the conversion.
-
-- Official PaddlePaddle ERNIE repo: https://github.com/PaddlePaddle/ERNIE
-- PyTorch conversion repo: https://github.com/nghuyong/ERNIE-Pytorch
-
-## How to use
-```Python
-from transformers import AutoTokenizer, AutoModel
-tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-2.0-large-en")
-model = AutoModel.from_pretrained("nghuyong/ernie-2.0-large-en")
-```
-
-## Citation
-
-```bibtex
-@article{sun2019ernie20,
-  title={ERNIE 2.0: A Continual Pre-training Framework for Language Understanding},
-  author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
-  journal={arXiv preprint arXiv:1907.12412},
-  year={2019}
-}
-```
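The usage snippet in the deleted README stops at loading. As a quick sanity check of the converted checkpoint, here is a minimal sketch (assuming a transformers version with ERNIE support and network access to the Hub; the input sentence is arbitrary) that runs a forward pass and inspects the output shape:

```Python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the converted checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-2.0-large-en")
model = AutoModel.from_pretrained("nghuyong/ernie-2.0-large-en")

# Encode one sentence and run a forward pass without gradients.
inputs = tokenizer("ERNIE 2.0 is a continual pre-training framework.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The last dimension should match hidden_size=1024 from config.json below.
print(outputs.last_hidden_state.shape)  # (1, seq_len, 1024)
```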
en_tokenizer/config.json
DELETED
@@ -1,19 +0,0 @@
-{
-  "attention_probs_dropout_prob": 0.1,
-  "intermediate_size": 4096,
-  "hidden_act": "gelu",
-  "hidden_dropout_prob": 0.1,
-  "hidden_size": 1024,
-  "initializer_range": 0.02,
-  "max_position_embeddings": 512,
-  "num_attention_heads": 16,
-  "num_hidden_layers": 24,
-  "type_vocab_size": 4,
-  "vocab_size": 30522,
-  "pad_token_id": 0,
-  "layer_norm_eps": 1e-05,
-  "model_type": "ernie",
-  "architectures": [
-    "ErnieModel"
-  ]
-}
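The deleted config.json above fully determines the architecture: a 24-layer, 16-head model with hidden size 1024, i.e. ERNIE-large. A minimal sketch of rebuilding that configuration programmatically (assuming a transformers version that includes the `ernie` model type with `ErnieConfig`/`ErnieModel`):

```Python
from transformers import ErnieConfig, ErnieModel

# Mirror the deleted config.json by hand; dropout and initializer values
# match the file above.
config = ErnieConfig(
    vocab_size=30522,
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    intermediate_size=4096,
    hidden_act="gelu",
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
    max_position_embeddings=512,
    type_vocab_size=4,
    initializer_range=0.02,
    layer_norm_eps=1e-05,
    pad_token_id=0,
)

# This yields a randomly initialized ERNIE-large; the trained weights
# would come from pytorch_model.bin (below).
model = ErnieModel(config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```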
en_tokenizer/pytorch_model.bin
DELETED
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:89ef99415c7bd5e585abdd19252a5e95ff42c361f4e6f8da823e989fd396395c
-size 1340701491
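The deleted pytorch_model.bin is a Git LFS pointer rather than the weights themselves: the three lines above record the LFS spec version, the sha256 of the real ~1.3 GB payload, and its size in bytes. A sketch of verifying a locally fetched copy against that pointer (the local path is hypothetical):

```Python
import hashlib
import os

# Values copied from the LFS pointer above.
EXPECTED_SHA256 = "89ef99415c7bd5e585abdd19252a5e95ff42c361f4e6f8da823e989fd396395c"
EXPECTED_SIZE = 1340701491

path = "pytorch_model.bin"  # hypothetical local download
assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"

# Hash in 1 MiB chunks to avoid loading 1.3 GB into memory at once.
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)

assert h.hexdigest() == EXPECTED_SHA256, "checksum mismatch"
print("pointer verified:", h.hexdigest())
```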
en_tokenizer/special_tokens_map.json
DELETED
@@ -1 +0,0 @@
-{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}
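These are the standard BERT-style special tokens, consistent with the BERT-sized vocab_size of 30522 in config.json. A sketch of how they surface at encode time (assuming the Hub checkpoint is reachable):

```Python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-2.0-large-en")

# Single sentences are wrapped as [CLS] ... [SEP] automatically.
ids = tokenizer("hello world")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# expected: ['[CLS]', 'hello', 'world', '[SEP]']
```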
en_tokenizer/tokenizer_config.json
DELETED
@@ -1 +0,0 @@
-{"special_tokens_map_file": null, "full_tokenizer_file": null}
en_tokenizer/vocab.txt
DELETED
The diff for this file is too large to render.