---
language:
- zh
thumbnail: https://ckip.iis.sinica.edu.tw/files/ckip_logo.png
tags:
- pytorch
- lm-head
- bert
- zh
license: gpl-3.0
---

# CKIP Oldhan BERT Base Chinese

A BERT base model pretrained on ancient (Old Han) Chinese texts using a masked language modeling (MLM) objective.

## Homepage

* [ckiplab/han-transformers](https://github.com/ckiplab/han-transformers)

## Training Datasets

The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.

* [中央研究院上古漢語標記語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh?ukey=-406192123&qtype=-1) (Academia Sinica Old Chinese Tagged Corpus)
* [中央研究院中古漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh?ukey=852967425&qtype=-1) (Academia Sinica Middle Chinese Corpus)
* [中央研究院近代漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh?ukey=-299696128&qtype=-1) (Academia Sinica Early Modern Chinese Corpus)
* [中央研究院現代漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/mkiwi/kiwi.sh) (Academia Sinica Modern Chinese Corpus)

## Contributors

* Chin-Tung Lin at [CKIP](https://ckip.iis.sinica.edu.tw/)

## Usage

* Using our model in your script

```python
from transformers import (
    AutoTokenizer,
    AutoModel,
)

tokenizer = AutoTokenizer.from_pretrained("ckiplab/oldhan-bert-base-chinese")
model = AutoModel.from_pretrained("ckiplab/oldhan-bert-base-chinese")
```

* Using our model for inference

```python
>>> from transformers import pipeline
>>> unmasker = pipeline('fill-mask', model='ckiplab/oldhan-bert-base-chinese')
>>> unmasker("黎民[MASK]變時雍")
```
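The `fill-mask` pipeline wraps the model's masked-LM head. A minimal sketch of the same prediction done directly with `AutoModelForMaskedLM`, for cases where you want the raw logits (assumes `torch` and `transformers` are installed and the checkpoint can be downloaded; the variable names are illustrative):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked-LM variant of the model.
tokenizer = AutoTokenizer.from_pretrained("ckiplab/oldhan-bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("ckiplab/oldhan-bert-base-chinese")
model.eval()

# Tokenize a sentence containing one [MASK] token and run a forward pass.
inputs = tokenizer("黎民[MASK]變時雍", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the five highest-scoring vocabulary tokens.
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```

This prints the model's five most likely fillers for the masked character, mirroring what the pipeline returns as its `token_str` fields.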