TITO (Classical Chinese Office Title Translation)

TITO (Classical Chinese Office Title Translation) is a sequence-to-sequence model that translates Classical Chinese office titles into English. It builds on MarianMTModel and is fine-tuned on 6,208 high-quality translation pairs collected by the CBDB (China Biographical Database) group.

How to use

Here is how to use this model in PyTorch to translate Classical Chinese office titles into English:

1. Import model and packages

import torch
from transformers import MarianMTModel, MarianTokenizer

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_name = 'cbdb/ClassicalChineseOfficeTitleTranslation'
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name).to(device)

2. Load Data

# Load your data here
tobe_translated = ['講筵官','判司簿尉','散騎常侍','殿中省尚輦奉御']
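If your titles live in a file rather than in a hard-coded list, something like the following may help. This is only a sketch: the helper name `load_titles` and the one-title-per-line UTF-8 file layout are assumptions for illustration, not part of the model's API.

```python
from pathlib import Path
from typing import List


def load_titles(path: str) -> List[str]:
    """Read office titles from a UTF-8 text file, one title per line.

    Hypothetical helper: blank lines are skipped and repeated titles
    are dropped, keeping the first occurrence and the original order.
    """
    seen = set()
    titles = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        title = line.strip()
        if title and title not in seen:
            seen.add(title)
            titles.append(title)
    return titles
```

The returned list can then be assigned to `tobe_translated` in place of the example list above.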

3. Make a prediction

inputs = tokenizer(tobe_translated, return_tensors="pt", padding=True).to(device)
translated = model.generate(**inputs, max_length=128)
tran = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
for c, t in zip(tobe_translated, tran):
    print(f'{c}: {t}')

講筵官: Lecturer
判司簿尉: Supervisor of the Commandant of Records
散騎常侍: Policy Advisor
殿中省尚輦奉御: Chief Steward of the Palace Administration
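For longer lists, passing every title to the model in a single call may use more memory than is available. One possible approach is to translate in fixed-size batches. The sketch below is model-agnostic: `batched`, `translate_in_batches`, the `translate_batch` callable, and the default batch size of 32 are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(
    titles: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 32,  # assumed default; tune to your hardware
) -> List[str]:
    """Apply `translate_batch` to each batch and concatenate the results in order."""
    results: List[str] = []
    for batch in batched(titles, batch_size):
        results.extend(translate_batch(batch))
    return results
```

Here `translate_batch` would be a small function of your own that wraps the tokenizer, `model.generate`, and decoding calls from step 3 and returns one English string per input title.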

Authors

Queenie Luo (queenieluo[at]g.harvard.edu)
Hongsu Wang
Peter Bol
CBDB Group

License

Copyright (c) 2023 CBDB

Except where otherwise noted, content on this repository is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
