File size: 1,595 Bytes
7f8dca3 a24591e 7f8dca3 a11c756 7f8dca3 a11c756 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
---
language:
- "zh"
thumbnail: "https://raw.githubusercontent.com/SIKU-BERT/SikuBERT/main/appendix/sikubert.png"
tags:
- "chinese"
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "bert"
- "roberta"
- "pytorch"
inference: false
license: "apache-2.0"
---
# SikuBERT
## Model description
![SikuBERT](https://raw.githubusercontent.com/SIKU-BERT/SikuBERT-for-digital-humanities-and-classical-Chinese-information-processing/main/appendix/sikubert.png)
Digital humanities research needs the support of large-scale corpus and high-performance ancient Chinese natural language processing tools. The pre-training language model has greatly improved the accuracy of text mining in English and modern Chinese texts. At present, there is an urgent need for a pre-training model specifically for the automatic processing of ancient texts. We used the verified high-quality “Siku Quanshu” full-text corpus as the training set, based on the BERT deep language model architecture, we constructed the SikuBERT and SikuRoBERTa pre-training language models for intelligent processing tasks of ancient Chinese.
## How to use
```python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("SIKU-BERT/sikubert")
model = AutoModel.from_pretrained("SIKU-BERT/sikubert")
```
## About Us
We are from Nanjing Agricultural University.
> Created with by SIKU-BERT [![Github icon](https://cdn0.iconfinder.com/data/icons/octicons/1024/mark-github-32.png)](https://github.com/SIKU-BERT/SikuBERT-for-digital-humanities-and-classical-Chinese-information-processing) |