bayartsogt committed on
Commit
9c1295f
1 Parent(s): 07e4d96

Create README.md

Files changed (1)
  1. README.md +81 -0
README.md ADDED
# StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT

Paper: [https://arxiv.org/abs/1908.04577](https://arxiv.org/abs/1908.04577)

## Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. Specifically, we pre-train StructBERT with two auxiliary tasks that make the most of the sequential order of words and sentences, leveraging language structures at the word and sentence levels, respectively.
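
For intuition, the sketch below illustrates how training targets for the two structural objectives described in the paper can be built: a word-level objective that shuffles a random trigram and asks the model to recover the original order, and a sentence-level objective that labels a sentence pair as next / previous / random. This is an illustrative sketch of the idea only, not the released pre-training code; the helper names are hypothetical.

```
import random

def word_structural_example(tokens, k=3, seed=None):
    """Shuffle one random run of k tokens; the original order is the target.

    Illustrative only: StructBERT's word-level objective trains the model
    to reconstruct the right order of shuffled trigrams.
    """
    rng = random.Random(seed)
    if len(tokens) < k:
        return tokens, []
    start = rng.randrange(len(tokens) - k + 1)
    span = tokens[start:start + k]
    shuffled = span[:]
    rng.shuffle(shuffled)
    corrupted = tokens[:start] + shuffled + tokens[start + k:]
    # Target: for each position in the shuffled span, the token it should be restored to.
    target = [(start + i, span[i]) for i in range(k)]
    return corrupted, target

def sentence_structural_label(pair_kind):
    """3-way sentence-level objective: is the second segment the next sentence,
    the previous sentence, or a random sentence from another document?"""
    return {"next": 0, "previous": 1, "random": 2}[pair_kind]

corrupted, target = word_structural_example(
    ["the", "quick", "brown", "fox", "jumps"], k=3, seed=0)
print(corrupted, target)
print(sentence_structural_label("previous"))
```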
## Pre-trained models

|Model | Description | #params | Download |
|------------------------|-------------------------------------------|------|------|
|structbert.en.large | StructBERT using the BERT-large architecture | 340M | [structbert.en.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model) |
|structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon |
|structbert.ch.large | Chinese StructBERT using the BERT-large architecture | 330M | [structbert.ch.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model) |
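
Since structbert.en.large follows the BERT-large architecture, a converted checkpoint can be used as a drop-in BERT encoder. The sketch below shows one way to load such a checkpoint with the Hugging Face `transformers` library; it assumes the weights have already been converted to the `transformers` format, and the model path used here is a placeholder, not an official release name.

```
# Minimal sketch: loading a StructBERT checkpoint as a BERT encoder,
# assuming it has been converted to the Hugging Face `transformers` format.
# `path_or_model_id` is a placeholder (a local directory or hub id).
import torch
from transformers import BertModel, BertTokenizer

path_or_model_id = "path/to/converted/structbert-large"  # placeholder
tokenizer = BertTokenizer.from_pretrained(path_or_model_id)
model = BertModel.from_pretrained(path_or_model_id)
model.eval()

inputs = tokenizer("StructBERT incorporates language structures.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 1024) for the large model
```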
## Results
The results on the GLUE and CLUE tasks can be reproduced using the hyperparameters listed in the "Example usage" section below.

#### structbert.en.large
[GLUE benchmark](https://gluebenchmark.com/leaderboard)

|Model| MNLI | QNLIv2 | QQP | SST-2 | MRPC |
|--------------------|-------|-------|-------|-------|-------|
|structbert.en.large |86.86% |93.04% |91.67% |93.23% |86.51% |

#### structbert.ch.large
[CLUE benchmark](https://www.cluebenchmarks.com/)

|Model | CMNLI | OCNLI | TNEWS | AFQMC |
|--------------------|-------|-------|-------|-------|
|structbert.ch.large |84.47% |81.28% |68.67% |76.11% |
## Example usage
#### Requirements and Installation
* [PyTorch](https://pytorch.org/) version >= 1.0.1

* Install the other required libraries via
```
pip install -r requirements.txt
```

* For faster training, install NVIDIA's [apex](https://github.com/NVIDIA/apex) library (a minimal mixed-precision sketch follows this list)
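
The `--amp_type O1` flag in the fine-tuning command below corresponds to apex's O1 mixed-precision mode. As a rough illustration only (not the training script's actual integration), apex AMP is typically wired into a PyTorch training loop like this; it requires apex and a CUDA GPU:

```
# Rough illustration of apex O1 mixed precision in a generic PyTorch loop;
# the actual integration inside run_classifier_multi_task.py may differ.
import torch
from apex import amp  # requires NVIDIA apex to be installed

model = torch.nn.Linear(1024, 3).cuda()          # stand-in for the classifier head
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# opt_level="O1" patches selected ops to run in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(32, 1024).cuda()
labels = torch.randint(0, 3, (32,)).cuda()
loss = torch.nn.functional.cross_entropy(model(inputs), labels)

# Scale the loss so fp16 gradients do not underflow, then step as usual.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```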
#### Finetune MNLI

```
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir
```

## Citation
If you use our work, please cite:
```
@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}
```