---
language: zh
tags:
- structbert
- pytorch
- tf2.0
inference: False
---

# StructBERT: Unofficial Copy

Official Repository Link: https://github.com/alibaba/AliceMind/tree/main/StructBERT

**Disclaimer**
* This model card was not produced by the [AliceMind Team](https://github.com/alibaba/AliceMind/).

## Reproducing the Hugging Face Hub models
Download the model config, tokenizer vocabulary, and checkpoint:
```bash
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin
```
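
Before loading the files with `transformers`, it can help to confirm that the downloads landed under the names the library looks for. The following is a minimal sketch, not part of the official repository: the helper `missing_files` is hypothetical, and the default file names are assumptions based on what `transformers` conventionally expects in a local directory (`config.json`, `vocab.txt`, `pytorch_model.bin`).

```python
from pathlib import Path

# Assumed names: what transformers conventionally looks for in a local directory.
EXPECTED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(model_dir=".", expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in model_dir."""
    root = Path(model_dir)
    missing = []
    for name in expected:
        path = root / name
        # A zero-byte file usually means the download failed part-way.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_files(".")
    print("ready to load" if not gaps else f"missing or empty: {gaps}")
```

If the vocabulary is still named `ch_vocab.txt`, the check reports `vocab.txt` as missing, which is a hint to rename it before calling `BertTokenizer.from_pretrained("./")`.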

```python
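from transformers import BertConfig, BertModel, BertTokenizer

# Load the downloaded config and checkpoint from the current directory.
config = BertConfig.from_pretrained("./config.json")
model = BertModel.from_pretrained("./", config=config)

# Loading from a directory expects the vocabulary file to be named vocab.txt.
tokenizer = BertTokenizer.from_pretrained("./")

# Upload both to the Hub (requires being logged in to a Hugging Face account).
model.push_to_hub("structbert-large-zh")
tokenizer.push_to_hub("structbert-large-zh")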
```

Paper: [https://arxiv.org/abs/1908.04577](https://arxiv.org/abs/1908.04577)

# StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
## Introduction
We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training. 
Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential 
order of words and sentences, which leverage language structures at the word and sentence levels, 
respectively.
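
As a rough illustration of the word-level task: the paper describes shuffling a short span of tokens (a trigram) and training the model to recover the original order. The sketch below shows only that data-corruption step under this reading of the paper. The function name, arguments, and return format are hypothetical and are not taken from the AliceMind code.

```python
import random


def shuffle_trigram(tokens, start, rng=None):
    """Shuffle tokens[start:start+3] and return (corrupted, positions, targets).

    positions are the indices that were shuffled; targets are the original
    tokens at those indices, i.e. what a model would be trained to predict.
    """
    rng = rng or random.Random()
    span = 3
    if start < 0 or start + span > len(tokens):
        raise ValueError("trigram does not fit inside the sequence")
    corrupted = list(tokens)  # copy so the input is left untouched
    window = corrupted[start:start + span]
    rng.shuffle(window)
    corrupted[start:start + span] = window
    positions = list(range(start, start + span))
    targets = list(tokens[start:start + span])
    return corrupted, positions, targets


if __name__ == "__main__":
    sentence = ["the", "quick", "brown", "fox", "jumps"]
    corrupted, positions, targets = shuffle_trigram(sentence, 1, random.Random(0))
    print(corrupted, positions, targets)
```

Tokens outside the chosen span are left in place, so the model sees the rest of the sentence in its original order.
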
## Pre-trained models
|Model | Description | #params | Download |
|------------------------|-------------------------------------------|------|------|
|structbert.en.large | StructBERT using the BERT-large architecture | 340M | [structbert.en.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model) |
|structroberta.en.large | StructRoBERTa; continued training from RoBERTa | 355M | Coming soon |
|structbert.ch.large | Chinese StructBERT; BERT-large architecture | 330M | [structbert.ch.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model) |

## Results
The GLUE and CLUE results below can be reproduced using the hyperparameters listed in the "Example usage" section.
#### structbert.en.large
[GLUE benchmark](https://gluebenchmark.com/leaderboard)

|Model| MNLI | QNLIv2 | QQP | SST-2 | MRPC | 
|--------------------|-------|-------|-------|-------|-------|
|structbert.en.large |86.86% |93.04% |91.67% |93.23% |86.51% |
#### structbert.ch.large
[CLUE benchmark](https://www.cluebenchmarks.com/)

|Model | CMNLI | OCNLI | TNEWS | AFQMC |
|--------------------|-------|-------|-------|-------|
|structbert.ch.large |84.47% |81.28% |68.67% |76.11% | 

## Example usage
#### Requirements and Installation
* [PyTorch](https://pytorch.org/) version >= 1.0.1

* Install other libraries via
```
pip install -r requirements.txt
```

* For faster training, install NVIDIA's [apex](https://github.com/NVIDIA/apex) library.

#### Fine-tune on MNLI

```
python run_classifier_multi_task.py \
  --task_name MNLI \
  --do_train \
  --do_eval \
  --do_test \
  --amp_type O1 \
  --lr_decay_factor 1 \
  --dropout 0.1 \
  --do_lower_case \
  --detach_index -1 \
  --core_encoder bert \
  --data_dir path_to_glue_data \
  --vocab_file config/vocab.txt \
  --bert_config_file config/large_bert_config.json \
  --init_checkpoint path_to_pretrained_model \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --fast_train \
  --gradient_accumulation_steps 1 \
  --output_dir path_to_output_dir 
```
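
To get a feel for how long this run is, the number of optimizer updates can be estimated from the batch size and epoch count in the command above. This is a back-of-the-envelope sketch, not something the training script prints: the helper is hypothetical, it assumes `--train_batch_size` is the number of examples per update (the distinction only matters when gradient accumulation is above 1), and the MNLI training-set size of 392,702 examples is an assumption not stated in this card.

```python
import math


def total_optimizer_steps(num_examples, train_batch_size, num_train_epochs):
    """Estimate optimizer updates, counting a final partial batch per epoch."""
    steps_per_epoch = math.ceil(num_examples / train_batch_size)
    return steps_per_epoch * num_train_epochs


if __name__ == "__main__":
    # 392,702 is the assumed size of the MNLI training set.
    print(total_optimizer_steps(392_702, train_batch_size=32, num_train_epochs=3))
    # prints 36816
```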

## Citation
If you use our work, please cite:
```
@article{wang2019structbert,
  title={Structbert: Incorporating language structures into pre-training for deep language understanding},
  author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
  journal={arXiv preprint arXiv:1908.04577},
  year={2019}
}
```