junnyu committed on
Commit
072e3cd
1 Parent(s): c163157
Files changed (6)
  1. README.md +38 -0
  2. config.json +21 -0
  3. pytorch_model.bin +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +4 -0
  6. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ language: "en"
+ thumbnail: "https://github.com/junnyu"
+ tags:
+ - pytorch
+ - electra
+ license: "MIT"
+ datasets:
+ - openwebtext
+
+ ---
+ # 1. An ELECTRA-small model I trained on the openwebtext dataset
+
+ # 2. Reproduced results (dev set)
+ |Model|CoLA|SST|MRPC|STS|QQP|MNLI|QNLI|RTE|Avg.|
+ |---|---|---|---|---|---|---|---|---|---|
+ |ELECTRA-Small-OWT (original)|56.8|88.3|87.4|86.8|88.3|78.9|87.9|68.5|80.36|
+ |**ELECTRA-Small-OWT (this)**|55.82|89.67|87.0|86.96|89.28|80.08|87.50|66.07|80.30|
+
+ # 3. Training details
+ - Dataset: openwebtext
+ - Batch size: 256
+ - Learning rate: 5e-4
+ - Max sequence length (max_seqlen): 128
+ - Total training steps: 625k
+ - GPU: RTX 3090
+ - Total training time: about 2.5 days
+
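The schedule above pins down a rough token budget; a quick back-of-the-envelope check, assuming every position in every batch is a real token (i.e. sequences packed to max_seqlen, with no padding):

```python
# Rough token budget implied by the training schedule above.
# Upper bound: assumes no padding, every sequence filled to max_seqlen.
batch_size = 256
max_seqlen = 128
total_steps = 625_000

tokens_seen = batch_size * max_seqlen * total_steps
print(f"{tokens_seen:,} tokens ≈ {tokens_seen / 1e9:.2f}B")
```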
+ # 4. Usage
+ ```python
+ import torch
+ from transformers import ElectraForMaskedLM, ElectraTokenizer
+
+ # This checkpoint is the MLM generator (config lists ElectraForMaskedLM),
+ # so load it with the masked-LM head to fill in [MASK].
+ tokenizer = ElectraTokenizer.from_pretrained("junnyu/electra_small_generator")
+ model = ElectraForMaskedLM.from_pretrained("junnyu/electra_small_generator")
+ inputs = tokenizer("Beijing is the capital of [MASK].", return_tensors="pt")
+ with torch.no_grad():
+     logits = model(**inputs).logits
+
+ # Decode the most likely token at the [MASK] position.
+ mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
+ print(tokenizer.decode(logits[0, mask_pos].argmax(-1)))
+ ```
config.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "architectures": [
+     "ElectraForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "embedding_size": 128,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 64,
+   "initializer_range": 0.02,
+   "intermediate_size": 256,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "electra",
+   "num_attention_heads": 4,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "type_vocab_size": 2,
+   "vocab_size": 30522
+ }
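As a sanity check, the hyperparameters above pin down a rough weight count. The sketch below assumes the standard BERT-style encoder layout plus ELECTRA's factorized embedding projection, and ignores biases, LayerNorms, and the LM head, so it undercounts the full checkpoint:

```python
import json

# Subset of the config above that determines the weight-matrix sizes.
config = json.loads("""{
  "embedding_size": 128, "hidden_size": 64, "intermediate_size": 256,
  "num_hidden_layers": 12, "max_position_embeddings": 512,
  "type_vocab_size": 2, "vocab_size": 30522
}""")

E, H, I = config["embedding_size"], config["hidden_size"], config["intermediate_size"]

# Token, position, and segment embeddings (sized by embedding_size, not hidden_size).
embeddings = (config["vocab_size"] + config["max_position_embeddings"]
              + config["type_vocab_size"]) * E
# ELECTRA factorizes embeddings: a projection maps embedding_size -> hidden_size.
projection = E * H
# Per layer: Q/K/V/output projections (4*H*H) plus the two feed-forward matrices.
per_layer = 4 * H * H + 2 * H * I
encoder = config["num_hidden_layers"] * per_layer

total = embeddings + projection + encoder
print(f"~{total / 1e6:.1f}M weights (biases, LayerNorms, LM head excluded)")
```

Note how the 30522-entry vocab dominates: almost all of the small generator's weights live in the embedding table.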
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18acdae4d103a64f0b70d551f0638fdeb846f15519bff584f898508bd4acb532
+ size 54273423
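What is committed here is a Git LFS pointer, not the weights themselves: a tiny text file of `<key> <value>` lines per LFS spec v1. A minimal parser over the pointer shown above:

```python
# The Git LFS pointer from this diff: each line is "<key> <value>".
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:18acdae4d103a64f0b70d551f0638fdeb846f15519bff584f898508bd4acb532
size 54273423
"""

# Split each line on the first space to recover the key/value pairs.
fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)
size_mib = int(fields["size"]) / 2**20
print(f"{algo} digest, {size_mib:.1f} MiB")
```

The `oid` lets a client verify the downloaded blob; the `size` lets it preallocate or show progress.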
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "do_lower_case": true
+ }
+
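`do_lower_case: true` tells the WordPiece tokenizer to lowercase input before matching it against the uncased vocab. A toy sketch of the effect (plain `str.lower` stands in for the real BasicTokenizer, which additionally strips accents and leaves special tokens like `[MASK]` untouched):

```python
# Approximates the do_lower_case step; the real tokenizer also normalizes
# accents and exempts special tokens such as [MASK] from lowercasing.
def normalize(text: str, do_lower_case: bool = True) -> str:
    return text.lower() if do_lower_case else text

print(normalize("Beijing Is The Capital"))
```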
vocab.txt ADDED
The diff for this file is too large to render. See raw diff