RUCAIBox/Erya

 ---
 license: apache-2.0
 ---
+## Model Description
+Erya is a pretrained model designed for translating Ancient Chinese into Modern Chinese. It uses an encoder-decoder architecture and was trained with a combination of DMLM (Dual Masked Language Model) and DAS (Disyllabic Aligned Substitution) on datasets containing both Ancient Chinese and Modern Chinese text.
+Erya has not been fine-tuned for the machine translation task, so its translation quality can be improved further by fine-tuning on a smaller translation dataset. More information about the Ancient Chinese and Modern Chinese datasets can be found here: [RUCAIBox/Erya-dataset · Datasets at Hugging Face](https://huggingface.co/datasets/RUCAIBox/Erya-dataset)
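As a rough illustration of preparing data for such fine-tuning, the sketch below turns (Ancient Chinese, Modern Chinese) sentence pairs into sequence-to-sequence features. The helper name `build_features`, the `max_length` default, and the use of `-100` to mask padded label positions (the usual convention for Hugging Face seq2seq losses) are assumptions for illustration, not part of the Erya release.

```python
def build_features(pairs, tokenizer, max_length=128):
    """Tokenize (ancient, modern) sentence pairs into seq2seq training features.

    `build_features` is a hypothetical helper; `tokenizer` is expected to behave
    like a Hugging Face tokenizer (callable on a list of strings, exposing
    `pad_token_id`).
    """
    sources = [ancient for ancient, _ in pairs]
    targets = [modern for _, modern in pairs]

    # Encode both sides to a fixed length so examples can be batched directly.
    enc = tokenizer(sources, max_length=max_length, truncation=True, padding="max_length")
    tgt = tokenizer(targets, max_length=max_length, truncation=True, padding="max_length")

    # Assumption: the loss ignores label positions set to -100, so padding
    # tokens on the target side do not contribute to training.
    pad_id = tokenizer.pad_token_id
    labels = [[tok if tok != pad_id else -100 for tok in seq] for seq in tgt["input_ids"]]

    return {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
        "labels": labels,
    }
```

The returned dictionary holds plain Python lists; it could be wrapped in a dataset object and passed to whichever training loop is used.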
+## Example
+```python
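+# Assumption: CPTForConditionalGeneration is importable in your environment; stock transformers releases may not include it.
+from transformers import BertTokenizer, CPTForConditionalGeneration
+# Load the tokenizer and the pretrained weights from the Hugging Face Hub.
+tokenizer = BertTokenizer.from_pretrained("RUCAIBox/Erya")
+model = CPTForConditionalGeneration.from_pretrained("RUCAIBox/Erya")
+# Tokenize an Ancient Chinese sentence into PyTorch tensors.
+inputs = tokenizer("安世字子孺，少以父任为郎。", return_tensors="pt")
+# Drop token_type_ids so that only the remaining inputs are passed to generate().
+inputs.pop("token_type_ids")
+# Generate the Modern Chinese translation and decode it back to text.
+pred_ids = model.generate(max_new_tokens=256, **inputs)
+print(tokenizer.batch_decode(pred_ids, skip_special_tokens=True))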
+```
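To translate several sentences at once, the same calls can be wrapped in a small helper. This is a minimal sketch: the name `translate`, the `batch_size` default, and the removal of spaces from the decoded text (BERT-style tokenizers typically join tokens with spaces) are assumptions for illustration. The tokenizer and model are passed in as arguments and are expected to be loaded as in the example above.

```python
def translate(sentences, tokenizer, model, batch_size=8, max_new_tokens=256):
    """Translate a list of Ancient Chinese sentences in mini-batches.

    `translate` is a hypothetical helper, not part of the Erya release.
    """
    outputs = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]

        # Pad to the longest sentence in the batch so the tensors are rectangular.
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        # As in the single-sentence example, token_type_ids is not passed on.
        inputs.pop("token_type_ids", None)

        pred_ids = model.generate(max_new_tokens=max_new_tokens, **inputs)
        decoded = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)

        # Assumption: the decoded text has spaces between tokens, which are
        # not wanted in Chinese output, so they are stripped here.
        outputs.extend(text.replace(" ", "") for text in decoded)
    return outputs
```

A call such as `translate(["安世字子孺，少以父任为郎。"], tokenizer, model)` would return a list with one translated string per input sentence.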