arxiv:1906.08101

Pre-Training with Whole Word Masking for Chinese BERT

Published on Jun 19, 2019

Abstract

Bidirectional Encoder Representations from Transformers (BERT) has shown marvelous improvements across various NLP tasks, and consecutive variants have been proposed to further improve the performance of pre-trained language models. In this paper, we first introduce the whole word masking (wwm) strategy for Chinese BERT, along with a series of Chinese pre-trained language models. We then propose a simple but effective model called MacBERT, which improves upon RoBERTa in several ways; in particular, we propose a new masking strategy called MLM as correction (Mac). To demonstrate the effectiveness of these models, we create a series of Chinese pre-trained language models as our baselines, including BERT, RoBERTa, ELECTRA, RBT, etc. We carried out extensive experiments on ten Chinese NLP tasks to evaluate the created Chinese pre-trained language models as well as the proposed MacBERT. Experimental results show that MacBERT achieves state-of-the-art performance on many NLP tasks, and we also provide ablation studies with several findings that may help future research. We open-source our pre-trained language models to further facilitate research in our community. Resources are available at: https://github.com/ymcui/Chinese-BERT-wwm
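
To make the whole word masking (wwm) strategy concrete: Chinese BERT tokenizes text into individual characters, so vanilla MLM can mask only part of a multi-character word; wwm first segments the text into words with an off-the-shelf Chinese word segmenter and then masks all characters of a chosen word together. The sketch below illustrates this idea in plain Python; the function name, sampling details, and example sentence are illustrative assumptions, not the authors' released implementation.

```python
import random

# Illustrative sketch of whole word masking (wwm) for Chinese, assuming the
# input has already been word-segmented (the paper relies on an external
# Chinese word segmenter). The [MASK] token and the ~15% masking rate follow
# the standard BERT setup; everything else is a simplification for clarity.

MASK_TOKEN = "[MASK]"
MASK_RATE = 0.15

def whole_word_mask(words, mask_rate=MASK_RATE, seed=0):
    """words: list of segmented words, each a string of one or more Chinese
    characters. Character-level tokens are masked word by word: if a word is
    chosen, every character of that word is replaced by [MASK]."""
    rng = random.Random(seed)
    tokens, word_ids = [], []
    for wid, word in enumerate(words):
        for ch in word:               # Chinese BERT tokenizes to characters
            tokens.append(ch)
            word_ids.append(wid)

    # Pick whole words until roughly mask_rate of the character tokens are covered.
    num_to_mask = max(1, int(round(len(tokens) * mask_rate)))
    candidate_words = list(range(len(words)))
    rng.shuffle(candidate_words)

    masked, covered = set(), 0
    for wid in candidate_words:
        if covered >= num_to_mask:
            break
        masked.add(wid)
        covered += len(words[wid])

    return [MASK_TOKEN if word_ids[i] in masked else tok
            for i, tok in enumerate(tokens)]

if __name__ == "__main__":
    # Example sentence, pre-segmented into words for illustration.
    segmented = ["使用", "语言", "模型", "来", "预测", "下", "一个", "词", "的", "概率"]
    print(whole_word_mask(segmented))
```

MacBERT's MLM-as-correction (Mac) strategy goes a step further and replaces the selected tokens with similar words rather than the artificial [MASK] token; that refinement is not reproduced in this sketch.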


Models citing this paper: 11

Datasets citing this paper: 0

Spaces citing this paper: 253

Collections including this paper: 0