	---
	language: zh
	tags:
	- structbert
	- pytorch
	- tf2.0
	inference: False
	---

	# StructBERT: Unofficial Copy

	Official repository: https://github.com/alibaba/AliceMind/tree/main/StructBERT

	Disclaimer
	* This model card was not produced by the [AliceMind Team](https://github.com/alibaba/AliceMind/).

	## Reproduce the HF Hub model
	Download the model config, tokenizer vocabulary, and weights, renaming each file to the name `transformers` expects:
	```bash
	wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_large_bert_config.json && mv ch_large_bert_config.json config.json
	wget https://raw.githubusercontent.com/alibaba/AliceMind/main/StructBERT/config/ch_vocab.txt && mv ch_vocab.txt vocab.txt
	wget https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model && mv ch_model pytorch_model.bin
	```

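	Load the local files and push them to the Hub:
	```python
	from transformers import BertConfig, BertModel, BertTokenizer

	# Read the config, weights, and vocabulary from the current directory.
	config = BertConfig.from_pretrained("./config.json")
	model = BertModel.from_pretrained("./", config=config)
	tokenizer = BertTokenizer.from_pretrained("./")

	# Upload both to the same Hub repository (requires a logged-in Hub account).
	model.push_to_hub("structbert-large-zh")
	tokenizer.push_to_hub("structbert-large-zh")
	```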
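
Before pushing, it can help to confirm that the downloads completed. The following is a minimal sketch, not part of the official instructions. It assumes the three files sit in one directory and that the vocabulary is saved as `vocab.txt`, the file name `BertTokenizer` looks for. The `missing_files` helper is hypothetical.

```python
from pathlib import Path
from typing import List

# Assumed file names: config and weights as renamed in the download step,
# plus the vocabulary saved as vocab.txt (the name BertTokenizer looks for).
REQUIRED_FILES = ("config.json", "vocab.txt", "pytorch_model.bin")


def missing_files(model_dir: str = ".") -> List[str]:
    """Return the required files that are absent or empty in model_dir."""
    root = Path(model_dir)
    return [
        name
        for name in REQUIRED_FILES
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All files present.")
```

An empty file is treated as missing because an interrupted `wget` can leave a zero-byte file behind.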

	Paper: [https://arxiv.org/abs/1908.04577](https://arxiv.org/abs/1908.04577)

	# StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
	## Introduction
	We extend BERT to a new model, StructBERT, by incorporating language structures into pre-training.
	Specifically, we pre-train StructBERT with two auxiliary tasks to make the most of the sequential
	order of words and sentences, which leverage language structures at the word and sentence levels,
	respectively.
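
To make the two auxiliary tasks concrete, the sketch below builds one training example for each in plain Python. It assumes the formulation described in the StructBERT paper. The word-level task shuffles a short span of tokens (a trigram) and asks the model to restore the original order. The sentence-level task labels whether the second sentence follows, precedes, or is unrelated to the first. The function names, label ids, and toy data are hypothetical and are not taken from the official code.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical label ids for the sentence-level task.
NEXT, PREVIOUS, RANDOM = 0, 1, 2


def shuffle_span(tokens: Sequence[str], start: int, span: int = 3,
                 rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Word-level example: shuffle tokens[start:start + span].

    Returns (corrupted_tokens, target_span). The target is the original span,
    which the model would be trained to reconstruct. The shuffle may
    occasionally leave the span in its original order.
    """
    rng = rng or random.Random()
    original = list(tokens[start:start + span])
    shuffled = original[:]
    rng.shuffle(shuffled)
    corrupted = list(tokens[:start]) + shuffled + list(tokens[start + span:])
    return corrupted, original


def sentence_pair(doc: Sequence[str], index: int, other_doc: Sequence[str],
                  rng: random.Random) -> Tuple[str, str, int]:
    """Sentence-level example: pair doc[index] with its next sentence,
    its previous sentence, or a sentence from another document,
    each chosen with equal probability."""
    if not 1 <= index <= len(doc) - 2:
        raise ValueError("index needs both a previous and a next sentence")
    label = rng.choice([NEXT, PREVIOUS, RANDOM])
    if label == NEXT:
        second = doc[index + 1]
    elif label == PREVIOUS:
        second = doc[index - 1]
    else:
        second = rng.choice(list(other_doc))
    return doc[index], second, label


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = "the cat sat on the mat".split()
    print(shuffle_span(tokens, start=1, rng=rng))
    doc = ["First sentence.", "Second sentence.", "Third sentence."]
    print(sentence_pair(doc, 1, ["Unrelated sentence."], rng))
```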
	## Pre-trained models
	| Model | Description | #params | Download |
	|------------------------|-------------------------------------------|------|------|
	| structbert.en.large | StructBERT using the BERT-large architecture | 340M | [structbert.en.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/en_model) |
	| structroberta.en.large | StructRoBERTa, continued training from RoBERTa | 355M | Coming soon |
	| structbert.ch.large | Chinese StructBERT using the BERT-large architecture | 330M | [structbert.ch.large](https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/StructBERT/ch_model) |

	## Results
	The results of GLUE & CLUE tasks can be reproduced using the hyperparameters listed in the following "Example usage" section.
	#### structbert.en.large
	[GLUE benchmark](https://gluebenchmark.com/leaderboard)

	| Model | MNLI | QNLIv2 | QQP | SST-2 | MRPC |
	|--------------------|-------|-------|-------|-------|-------|
	| structbert.en.large | 86.86% | 93.04% | 91.67% | 93.23% | 86.51% |
	#### structbert.ch.large
	[CLUE benchmark](https://www.cluebenchmarks.com/)

	| Model | CMNLI | OCNLI | TNEWS | AFQMC |
	|--------------------|-------|-------|-------|-------|
	| structbert.ch.large | 84.47% | 81.28% | 68.67% | 76.11% |

	## Example usage
	#### Requirements and Installation
	* [PyTorch](https://pytorch.org/) version >= 1.0.1

	* Install the other dependencies with
	```bash
	pip install -r requirements.txt
	```

	* For faster training, install NVIDIA's [apex](https://github.com/NVIDIA/apex) library.

	#### Fine-tune MNLI

	```bash
	python run_classifier_multi_task.py \
	--task_name MNLI \
	--do_train \
	--do_eval \
	--do_test \
	--amp_type O1 \
	--lr_decay_factor 1 \
	--dropout 0.1 \
	--do_lower_case \
	--detach_index -1 \
	--core_encoder bert \
	--data_dir path_to_glue_data \
	--vocab_file config/vocab.txt \
	--bert_config_file config/large_bert_config.json \
	--init_checkpoint path_to_pretrained_model \
	--max_seq_length 128 \
	--train_batch_size 32 \
	--learning_rate 2e-5 \
	--num_train_epochs 3 \
	--fast_train \
	--gradient_accumulation_steps 1 \
	--output_dir path_to_output_dir
	```
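
When the same recipe is run repeatedly, it can be convenient to assemble the command in Python. The following is a minimal sketch that uses only the flags from the command above. The `build_command` helper and the example paths are hypothetical. Whether the script accepts other `--task_name` values should be checked against the official repository.

```python
import shlex
from typing import Dict, List, Union

# Boolean switches and valued options copied from the MNLI command above.
SWITCHES = ["--do_train", "--do_eval", "--do_test", "--do_lower_case", "--fast_train"]
OPTIONS: Dict[str, Union[str, int, float]] = {
    "--amp_type": "O1",
    "--lr_decay_factor": 1,
    "--dropout": 0.1,
    "--detach_index": -1,
    "--core_encoder": "bert",
    "--vocab_file": "config/vocab.txt",
    "--bert_config_file": "config/large_bert_config.json",
    "--max_seq_length": 128,
    "--train_batch_size": 32,
    "--learning_rate": "2e-5",
    "--num_train_epochs": 3,
    "--gradient_accumulation_steps": 1,
}


def build_command(task_name: str, data_dir: str, init_checkpoint: str,
                  output_dir: str) -> List[str]:
    """Return the fine-tuning command as an argument list for subprocess.run."""
    cmd = ["python", "run_classifier_multi_task.py", "--task_name", task_name]
    cmd += SWITCHES
    for flag, value in OPTIONS.items():
        cmd += [flag, str(value)]
    cmd += ["--data_dir", data_dir,
            "--init_checkpoint", init_checkpoint,
            "--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, matching the placeholders in the shell command.
    cmd = build_command("MNLI", "path_to_glue_data",
                        "path_to_pretrained_model", "path_to_output_dir")
    print(" ".join(shlex.quote(part) for part in cmd))
    # To launch the run: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.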

	## Citation
	If you use our work, please cite:
	```
	@article{wang2019structbert,
	title={Structbert: Incorporating language structures into pre-training for deep language understanding},
	author={Wang, Wei and Bi, Bin and Yan, Ming and Wu, Chen and Bao, Zuyi and Xia, Jiangnan and Peng, Liwei and Si, Luo},
	journal={arXiv preprint arXiv:1908.04577},
	year={2019}
	}
	```