straybird committed on
Commit
97aa6e7
1 Parent(s): 1dbd65f

Update README.md

Files changed (1)
  1. README.md +22 -0
README.md CHANGED
@@ -1,3 +1,25 @@
  ---
  license: apache-2.0
  ---
+ ## Model Description
+
+ Erya is a pretrained model designed for translating Ancient Chinese into Modern Chinese. It uses an encoder-decoder architecture and was trained with a combination of DMLM (Dual Masked Language Model) and DAS (Disyllabic Aligned Substitution) on datasets of both Ancient Chinese and Modern Chinese text.
+
+ Erya has not been fine-tuned for the machine translation task, so its translation quality can be improved further by fine-tuning on a smaller translation dataset. More information about the Ancient Chinese and Modern Chinese data can be found here: [RUCAIBox/Erya-dataset · Datasets at Hugging Face](https://huggingface.co/datasets/RUCAIBox/Erya-dataset)
+
+
+
+ ## Example
+
+ ```python
+ from transformers import BertTokenizer
+ from modeling_cpt import CPTForConditionalGeneration  # assumption: modeling_cpt.py from the fnlp/CPT repository is importable; upstream transformers does not ship CPT
+
+ tokenizer = BertTokenizer.from_pretrained("RUCAIBox/Erya")
+ model = CPTForConditionalGeneration.from_pretrained("RUCAIBox/Erya")
+
+ inputs = tokenizer("安世字子孺,少以父任为郎。", return_tensors="pt")
+ inputs.pop("token_type_ids")  # the tokenizer also emits token_type_ids, which are not passed to generate()
+ pred_ids = model.generate(max_new_tokens=256, **inputs)
+ print(tokenizer.batch_decode(pred_ids, skip_special_tokens=True))
+ ```
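
The example above translates a single sentence. For longer passages, one possible approach (a minimal sketch, not part of the Erya release) is to split the text at sentence-final punctuation and translate it in small batches. The helper names `split_sentences` and `translate_passage`, the punctuation set, and the `translate_batch` callable are all assumptions introduced for illustration; `translate_batch` stands in for a wrapper around the tokenizer and `model.generate` calls shown above.

```python
import re
from typing import Callable, List

# Assumption: the input is already punctuated, and these full-width marks end a sentence.
_SENTENCE_END = re.compile(r"(?<=[。!?;])")


def split_sentences(text: str) -> List[str]:
    """Split text right after each sentence-final mark, dropping empty pieces."""
    return [piece for piece in _SENTENCE_END.split(text) if piece.strip()]


def translate_passage(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> str:
    """Translate a passage sentence by sentence, in batches of `batch_size`.

    `translate_batch` is a hypothetical callable that maps a list of source
    sentences to a list of translations of the same length.
    """
    sentences = split_sentences(text)
    outputs: List[str] = []
    for start in range(0, len(sentences), batch_size):
        outputs.extend(translate_batch(sentences[start:start + batch_size]))
    return "".join(outputs)


if __name__ == "__main__":
    # Stand-in translator so the sketch runs without downloading the model.
    def echo(batch: List[str]) -> List[str]:
        return list(batch)

    print(split_sentences("学而时习之。不亦说乎?"))
    print(translate_passage("学而时习之。不亦说乎?", echo, batch_size=1))
```

In practice, `translate_batch` would tokenize the batch with `padding=True`, drop `token_type_ids`, call `model.generate`, and decode the result, mirroring the single-sentence example. Keeping the model call behind a callable makes the splitting and batching logic easy to check on its own.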