---
language: zh
tags:
- structbert
- pytorch
- tf2.0
inference: False
---

# StructBERT: Unofficial Copy

Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT

**Disclaimer**
* This model card was not produced by the [AliceMind Team](https://github.com/alibaba/AliceMind/).

## Reproducing the HF Hub models
Download the config, tokenizer vocabulary, and pre-trained weights:
```bash
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin
```

```python
from transformers import BertConfig, BertModel, BertTokenizer

# Load the files downloaded above from the current directory
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)
tokenizer = BertTokenizer.from_pretrained("./")

# Upload the converted model and tokenizer to the Hugging Face Hub
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")
```
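
Once pushed, the checkpoint can be loaded straight from the Hub. A minimal sketch, where `your-username/structbert-large-zh` is a placeholder for whichever account ran `push_to_hub` above:

```python
from transformers import BertModel, BertTokenizer

# Hypothetical repository id; replace "your-username" with the account
# the model was pushed to in the snippet above.
repo_id = "your-username/structbert-large-zh"

tokenizer = BertTokenizer.from_pretrained(repo_id)
model = BertModel.from_pretrained(repo_id)

# Encode a short Chinese sentence and inspect the hidden states
inputs = tokenizer("今天天气很好", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 1024]) for the large model
```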

Paper: [https://arxiv.org/abs/1908.04577](https://arxiv.org/abs/1908.04577)

# StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
## Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that leverage language structures at the word and sentence levels, respectively, making the most of the sequential order of words and sentences.
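
As a rough illustration only (the official pre-training code is not part of this repository), the word structural objective can be pictured as shuffling a few random trigrams of the input and training the model to reconstruct their original order, while the sentence structural objective classifies whether a sentence pair appears in order, reversed, or comes from different documents. The function below is a hypothetical sketch of the trigram-shuffling step, not the AliceMind implementation:

```python
import random

def shuffle_trigrams(tokens, num_trigrams=1, rng=random):
    """Hypothetical sketch of the word structural objective: permute a few
    trigrams and record the spans whose original order the model would be
    trained to restore. Illustrative only, not the official implementation."""
    tokens = list(tokens)
    targets = []
    for _ in range(num_trigrams):
        if len(tokens) < 3:
            break
        start = rng.randrange(len(tokens) - 2)
        original = tokens[start:start + 3]
        permuted = original[:]
        rng.shuffle(permuted)
        tokens[start:start + 3] = permuted
        targets.append((start, original))  # supervision: restore this order
    return tokens, targets

# Example: corrupt a tokenized sentence and keep the reconstruction targets
corrupted, targets = shuffle_trigrams("研究 人员 提出 了 一 种 新 的 模型".split(), num_trigrams=2)
```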

## Pre-trained models
|Model | Description | #params | Download |
|------------------------|-------------------------------------------|------|------|
|structbert.en.large | StructBERT using the BERT-large architecture | 340M | [structbert.en.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model) |
|structroberta.en.large | StructRoBERTa continued training from RoBERTa | 355M | Coming soon |
|structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | [structbert.ch.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model) |

## Results
The results on the GLUE and CLUE tasks can be reproduced with the hyperparameters listed in the "Example usage" section below.

#### structbert.en.large
[GLUE benchmark](https://gluebenchmark.com/leaderboard)

|Model| MNLI | QNLIv2 | QQP | SST-2 | MRPC |
|--------------------|-------|-------|-------|-------|-------|
|structbert.en.large |86.86% |93.04% |91.67% |93.23% |86.51% |

#### structbert.ch.large
[CLUE benchmark](https://www.cluebenchmarks.com/)

|Model | CMNLI | OCNLI | TNEWS | AFQMC |
|--------------------|-------|-------|-------|-------|
|structbert.ch.large |84.47% |81.28% |68.67% |76.11% |

## Example usage
#### Requirements and Installation
* [PyTorch](https://pytorch.org/) version >= 1.0.1

* Install other libraries via
```
pip install -r requirements.txt
```

* For faster training, install NVIDIA's [apex](https://github.com/NVIDIA/apex) library

#### Finetune MNLI

```bash
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir
```
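
The command above is the official AliceMind fine-tuning entry point and writes its checkpoints to `path_to_output_dir`. As a hedged sketch only, assuming the saved checkpoint keeps standard BERT parameter names and a `pytorch_model.bin` file name (both assumptions, since the exact output format is defined by that script), the fine-tuned classifier could be inspected in `transformers` roughly like this:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumptions: the checkpoint path below is hypothetical, and the saved state
# dict uses standard BERT parameter names; if the AliceMind script prefixes
# its keys differently, they would need to be remapped before loading.
config = BertConfig.from_pretrained("config/large_bert_config.json", num_labels=3)  # MNLI has 3 labels
model = BertForSequenceClassification(config)

state_dict = torch.load("path_to_output_dir/pytorch_model.bin", map_location="cpu")
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
model.eval()
```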

## Citation
If you use our work, please cite:
```
@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}
```