---
language:
  - zh
thumbnail: https://ckip.iis.sinica.edu.tw/files/ckip_logo.png
tags:
  - pytorch
  - lm-head
  - bert
  - zh
license: gpl-3.0
---

# CKIP Oldhan BERT Base Chinese

A BERT base model pretrained on oldhan Chinese text with a masked language modeling (MLM) objective.

## Homepage
* [ckiplab/han-transformers](https://github.com/ckiplab/han-transformers)

## Training Datasets
The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.
* [中央研究院上古漢語標記語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh?ukey=-406192123&qtype=-1) (Academia Sinica Tagged Corpus of Old Chinese)
* [中央研究院中古漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh?ukey=852967425&qtype=-1) (Academia Sinica Corpus of Middle Chinese)
* [中央研究院近代漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh?ukey=-299696128&qtype=-1) (Academia Sinica Corpus of Early Modern Chinese)
* [中央研究院現代漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/mkiwi/kiwi.sh) (Academia Sinica Corpus of Modern Chinese)

## Contributors
* Chin-Tung Lin at [CKIP](https://ckip.iis.sinica.edu.tw/)

## Usage

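* Using our model in your script
    ```python
    from transformers import (
      AutoTokenizer,
      AutoModel,
    )

    # Load the tokenizer that matches the pretrained vocabulary.
    tokenizer = AutoTokenizer.from_pretrained("ckiplab/oldhan-bert-base-chinese")
    # AutoModel loads the bare encoder (hidden states only, no MLM head).
    model = AutoModel.from_pretrained("ckiplab/oldhan-bert-base-chinese")
    ```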

* Using our model for inference
    ```python
    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='ckiplab/oldhan-bert-base-chinese')
    >>> unmasker("黎民[MASK]變時雍")
    ```
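
The `fill-mask` pipeline returns a list of dictionaries, one per candidate token, with `score`, `token`, `token_str`, and `sequence` keys. The sketch below is one possible way to turn that list into a short ranked summary. The names `summarize_predictions`, `unmask`, and `MODEL_NAME` are hypothetical helpers introduced for illustration and are not part of the model or of `transformers`; the sketch also assumes the input contains exactly one `[MASK]` token, since the pipeline nests its output differently for several masks.

```python
from typing import Dict, List

# Model identifier taken from the usage examples above.
MODEL_NAME = "ckiplab/oldhan-bert-base-chinese"


def summarize_predictions(predictions: List[Dict]) -> List[str]:
    """Format fill-mask results as 'token<TAB>score' lines, best score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['token_str']}\t{p['score']:.4f}" for p in ranked]


def unmask(text: str, top_k: int = 5) -> List[str]:
    """Run the fill-mask pipeline on `text` (assumed to hold one [MASK])."""
    # Imported lazily so summarize_predictions works without transformers loaded.
    from transformers import pipeline

    # Downloads the model weights on first use.
    unmasker = pipeline("fill-mask", model=MODEL_NAME)
    return summarize_predictions(unmasker(text, top_k=top_k))
```

Calling `unmask("黎民[MASK]變時雍")` would then yield up to `top_k` lines, one per candidate character, ordered by model score.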