MoxhiMT 60 zh-vi
Chinese → Vietnamese Marian-style machine translation model, tuned for xianxia / web-novel text.
Intended Use
- Chinese → Vietnamese web novel / fiction translation
- Local or server inference
- Experimental release; review the output before any high-stakes or publication use
Model Details
- Architecture: Marian seq2seq (8 encoder + 2 decoder layers)
- Parameters: ~57M (d_model 576, ffn 2304)
- Tokenizer: SentencePiece source/target, joint ZH+VI, vocab 24k
- Suggested decoding: num_beams=4, max_length=512
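As a sanity check on the figures above, the parameter count can be roughly reproduced from the listed dimensions. This is only a back-of-the-envelope sketch, not the author's accounting: it assumes a vocabulary of exactly 24,000 entries (the card says "24k"), a single embedding matrix shared by encoder, decoder, and output layer, parameter-free sinusoidal position embeddings, and biases on every linear layer. The function name `marian_param_estimate` is hypothetical.

```python
def marian_param_estimate(vocab=24_000, d_model=576, ffn=2304,
                          enc_layers=8, dec_layers=2):
    """Rough parameter count for a Marian-style encoder-decoder (assumptions above)."""
    # One attention block: Q, K, V and output projections, each with a bias.
    attn = 4 * (d_model * d_model + d_model)
    # Feed-forward block: two linear layers with biases.
    ff = (d_model * ffn + ffn) + (ffn * d_model + d_model)
    # One LayerNorm: weight and bias vectors.
    ln = 2 * d_model
    # Encoder layer: self-attention + feed-forward, each followed by a LayerNorm.
    enc_layer = attn + ff + 2 * ln
    # Decoder layer: self-attention + cross-attention + feed-forward, three LayerNorms.
    dec_layer = 2 * attn + ff + 3 * ln
    # Assumed: one embedding matrix shared across encoder, decoder, and output.
    embed = vocab * d_model
    return embed + enc_layers * enc_layer + dec_layers * dec_layer


total = marian_param_estimate()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate comes out at about 56.4M, close to the stated ~57M. The gap is small enough to be explained by the exact vocabulary size or by details this sketch ignores.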
Quick Start
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_id = "DanVP/MoxhiMT-60"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
text = "他抬头看向远处的山门。"
inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
out = model.generate(**inputs, max_length=512, num_beams=4)
print(tok.decode(out[0], skip_special_tokens=True))
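The Quick Start call uses `truncation=True, max_length=512`, so anything past 512 tokens is silently dropped. For chapter-length input, one option is to split the text at sentence boundaries and translate it piece by piece. The sketch below is an assumption, not part of the model's own tooling: `split_sentences` and `chunk_text` are hypothetical helpers, the punctuation set is a guess at what web-novel text uses, and character count serves only as a crude stand-in for token count.

```python
import re

# Sentence = run of text ending in 。！？ (plus any closing quote), or a trailing
# fragment with no final punctuation. The punctuation set is an assumption.
_SENTENCE = re.compile(r"[^。！？]*[。！？]+[”」』]*|[^。！？]+$")


def split_sentences(text):
    """Split Chinese text into sentences, keeping the punctuation attached."""
    return _SENTENCE.findall(text)


def chunk_text(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars still becomes its own chunk,
    so the tokenizer's truncation remains the final safeguard.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


sample = "他抬头看向远处的山门。“走吧！”他说"
for chunk in chunk_text(sample, max_chars=12):
    print(chunk)
```

Each chunk can then go through `tok(...)` and `model.generate(...)` as in the Quick Start, with the outputs joined afterwards. Translating sentence by sentence loses cross-sentence context, so larger chunks may read better where they fit.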
CTranslate2 (INT8)
A CTranslate2 INT8 build is included under ct2-int8/ for faster CPU inference.
import ctranslate2
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("DanVP/MoxhiMT-60")
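# "ct2-int8" is a local path: download or clone the model repository first so that the ct2-int8/ directory exists.
translator = ctranslate2.Translator("ct2-int8", device="cpu", compute_type="int8")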
text = "他抬头看向远处的山门。"
src = tok.convert_ids_to_tokens(tok(text, truncation=True, max_length=512).input_ids)
results = translator.translate_batch([src], beam_size=4, max_decoding_length=512)
print(tok.decode(tok.convert_tokens_to_ids(results[0].hypotheses[0]), skip_special_tokens=True))
Notes
- Prioritizes translation quality on xianxia / cultivation terminology.
- Trained from scratch with a custom SentencePiece-BPE 24k joint ZH+VI tokenizer.