---
language:
  - zh
thumbnail: >-
  https://raw.githubusercontent.com/SIKU-BERT/SikuBERT/main/appendix/sikubert.png
tags:
  - chinese
  - classical chinese
  - literary chinese
  - ancient chinese
  - bert
  - roberta
  - pytorch
inference: false
license: apache-2.0
---

# SikuBERT

## Model description

Digital humanities research requires large-scale corpora and high-performance natural language processing tools for ancient Chinese. Pre-trained language models have greatly improved the accuracy of text mining in English and modern Chinese, but a pre-trained model dedicated to the automatic processing of ancient Chinese texts is still urgently needed. Using the verified, high-quality full-text corpus of the *Siku Quanshu* as the training set, and building on the BERT deep language model architecture, we constructed the SikuBERT and SikuRoBERTa pre-trained language models for intelligent processing tasks on ancient Chinese.

## How to use

```python
from transformers import AutoTokenizer, AutoModel

# Load the SikuBERT tokenizer and encoder weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("SIKU-BERT/sikubert")
model = AutoModel.from_pretrained("SIKU-BERT/sikubert")
```
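
As a minimal sketch of what the loaded model can do, the snippet below scores candidate characters for a masked position in a classical Chinese sentence via the `fill-mask` pipeline. It assumes SikuBERT exposes its masked-language-modeling head on the Hub; the example sentence (the opening of the *Daodejing*) is our own illustration, not from the model card.

```python
from transformers import pipeline

# Assumption: SikuBERT was pre-trained with a masked-language-modeling
# objective, so it can be queried through the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="SIKU-BERT/sikubert")

# Mask the first character of "道可道，非常道。"
# ("The Way that can be spoken of is not the constant Way") and
# print the model's top candidates with their probabilities.
for candidate in fill_mask("[MASK]可道，非常道。"):
    print(candidate["token_str"], round(candidate["score"], 4))
```

Alternatively, the `model` loaded above returns hidden states that can serve as character-level embeddings of ancient Chinese text for downstream tasks.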

## About Us

We are from Nanjing Agricultural University.

Created with ❤️ by [SIKU-BERT](https://github.com/SIKU-BERT/SikuBERT).