Commit 467e9af by KoichiYasuoka (parent: 57e450a)

initial release

Files changed (1): README.md (added, +32 -0)

---
language:
- "lzh"
tags:
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "sentence segmentation"
license: "apache-2.0"
pipeline_tag: "token-classification"
widget:
- text: "子曰學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
---

# roberta-classical-chinese-base-sentence-segmentation

## Model Description

This is a RoBERTa model pre-trained on Classical Chinese texts for sentence segmentation, derived from [roberta-classical-chinese-base-char](https://huggingface.co/KoichiYasuoka/roberta-classical-chinese-base-char).
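The usage code under "How to Use" ends a sentence after every character labeled "E" or "S". The following is a minimal sketch of that same rule, returning a list of sentences rather than a string with "。" inserted. It assumes, as the usage code does, that each character receives exactly one label. The helper name `split_by_labels` is hypothetical, and the label sequences below are made up for illustration, not real model output.

```py
def split_by_labels(text, labels):
    """Split text into sentences, closing one after each character labeled "E" or "S".

    Hypothetical helper: assumes one label per character, as in the usage code.
    """
    sentences, current = [], []
    for char, label in zip(text, labels):
        current.append(char)
        if label in ("E", "S"):  # sentence-final character
            sentences.append("".join(current))
            current = []
    if current:  # trailing characters that never received a sentence-final label
        sentences.append("".join(current))
    return sentences
```

For example, `split_by_labels("子曰學而時習之", list("BEBBBBE"))` returns `["子曰", "學而時習之"]`.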

## How to Use

```py
import torch
from transformers import AutoTokenizer,AutoModelForTokenClassification
tokenizer=AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-sentence-segmentation")
model=AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-sentence-segmentation")
s="子曰學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
# predict one label per token (one token per character); [1:-1] drops the special tokens at both ends
with torch.no_grad():
  logits=model(tokenizer.encode(s,return_tensors="pt")).logits
p=[model.config.id2label[q] for q in torch.argmax(logits,dim=2)[0].tolist()[1:-1]]
# end a sentence with "。" after every character labeled "E" or "S"
print("".join(c+"。" if q=="E" or q=="S" else c for c,q in zip(s,p)))
```
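The snippet above passes the whole string to the model in one pass. That only works while the text fits within the model's maximum input length, which the tokenizer reports as `tokenizer.model_max_length`. For RoBERTa-base models this is typically 512 tokens, but that figure is an assumption here and is not stated in this card. The following is a minimal sketch for longer texts that labels consecutive windows separately. The names `label_in_windows` and `predict` are hypothetical. `predict` is assumed to return one label per character, for example by wrapping the `torch.argmax(...)` line of the snippet.

```py
from typing import Callable, List


def label_in_windows(text: str, predict: Callable[[str], List[str]], window: int = 500) -> List[str]:
    """Label a long text by calling `predict` on consecutive, non-overlapping windows.

    Hypothetical helper: `window` should stay below the model's maximum length
    minus the special tokens; 500 is an assumed default, not a documented value.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    labels: List[str] = []
    for start in range(0, len(text), window):
        chunk = text[start:start + window]
        chunk_labels = predict(chunk)
        if len(chunk_labels) != len(chunk):
            raise ValueError("predict must return one label per character")
        labels.extend(chunk_labels)
    return labels
```

Non-overlapping windows are the simplest choice. However, a sentence that straddles a window boundary is labeled without its full context, so overlapping windows, or cutting at boundaries already found, would likely work better near the seams.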