Model Card for Model ID

Model Details

  • Base Model: Pre-training
  • Model Description: Translates English text into Korean.
  • Developed by: Platform Develop Div. at 2Bytescorp Korea.
  • Model Type: Translation
  • Language(s):
    • Source Language: English
    • Target Language: Korean

Training Info

  • Training Step/epoch: 400,000 steps

Dataset

  • Train Dataset: 12,000,000
  • Test Dataset: 1,000,000
  • Valid Dataset: 1,000,000

Training Data

  • Dataset: Our own Korean/English dataset.

How to Get Started With the Model (Inference)

import sys

import ctranslate2
import pyonmttok

# Use the first command-line argument as the input sentence, or fall back to a default.
if len(sys.argv) < 2:
    sentence = "I sincerely apologize for not providing the best taste and quality."
else:
    sentence = sys.argv[1]

# Tokenize the source sentence with the OpenNMT tokenizer.
tokenizer = pyonmttok.Tokenizer("conservative", joiner_annotate=True)
tokens = tokenizer(sentence)

# Path to the converted CTranslate2 model directory.
model = "/home/techops/data/nmt_data/clean_data_files_v1/ctranslate2/model_4m"
translator = ctranslate2.Translator(model_path=model, device="cpu")

# Run beam search and request the two best hypotheses.
outputs = translator.translate_batch(
    [tokens],
    beam_size=5,
    num_hypotheses=2,
    sampling_temperature=0.8,
    replace_unknowns=True,
)

# Take the top hypothesis and turn its tokens back into text.
translated = outputs[0].hypotheses[0]
t_s = tokenizer.detokenize(translated)

# Strip any leftover "@@" subword markers before printing.
print(t_s.replace("@@", ""))

Output:

$ python ctran_translate.py
최고의 맛과 품질을 제공하지 못한 점에 대해 진심으로 사과드립니다.
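
To translate several sentences in one call, the script above can be wrapped in a small helper. The following is a minimal sketch, not the developers' own code: `translate_sentences` is a hypothetical name, and the function assumes it receives a `tokenizer` and a `translator` built the same way as in the script above. It returns every hypothesis for each sentence instead of only the first one.

```python
from typing import Any, List, Sequence


def translate_sentences(
    sentences: Sequence[str],
    tokenizer: Any,
    translator: Any,
    beam_size: int = 5,
    num_hypotheses: int = 1,
) -> List[List[str]]:
    """Translate several sentences with a single translate_batch call.

    `tokenizer` is assumed to behave like pyonmttok.Tokenizer and
    `translator` like ctranslate2.Translator (hypothetical helper,
    not part of either library).
    """
    # Tokenize every sentence the same way the single-sentence script does.
    batch = [tokenizer(sentence) for sentence in sentences]

    results = translator.translate_batch(
        batch,
        beam_size=beam_size,
        num_hypotheses=num_hypotheses,
        replace_unknowns=True,
    )

    translations = []
    for result in results:
        # Detokenize each hypothesis and strip leftover "@@" subword markers.
        texts = [
            tokenizer.detokenize(tokens).replace("@@", "")
            for tokens in result.hypotheses
        ]
        translations.append(texts)
    return translations
```

With the objects from the script above, a call such as `translate_sentences(["Hello.", "Thank you."], tokenizer, translator, num_hypotheses=2)` would return one list of two candidate translations per input sentence. `num_hypotheses` should not exceed `beam_size`.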