
Overview

This is a Lithuanian-English translation model (Seq2Seq). For English-Lithuanian translation, see the companion model scoris/scoris-mt-en-lt

Original model: Helsinki-NLP/opus-mt-tc-big-lt-en

Fine-tuned on a large merged dataset: scoris/en-lt-merged-data (5.4 million sentence pairs)

Trained for 3 epochs.

Made by the Scoris team

Evaluation:

Model                                BLEU (LT-EN)
scoris/scoris-mt-lt-en               43.8
Helsinki-NLP/opus-mt-tc-big-en-lt    36.8
Google Translate                     31.9
DeepL                                36.1

Evaluated on the scoris/en-lt-merged-data validation set. Google Translate and DeepL were evaluated on a random sample of 1000 sentence pairs.

According to Google, BLEU scores can be interpreted as follows:
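
The card does not describe how the 1000-pair sample was drawn. A reproducible draw could be sketched as below; the function name sample_pairs, the seed value, and the (source, reference) tuple layout are assumptions made for illustration, not the procedure the Scoris team used.

```python
import random


def sample_pairs(pairs, k=1000, seed=0):
    """Draw k sentence pairs without replacement, reproducibly.

    `pairs` is assumed to be a list of (source, reference) tuples.
    """
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    if k >= len(pairs):
        return list(pairs)  # fewer pairs than requested: use them all
    return rng.sample(pairs, k)


# Toy stand-in for a validation set (hypothetical data)
validation = [(f"lt sentence {i}", f"en sentence {i}") for i in range(5000)]
subset = sample_pairs(validation, k=1000, seed=0)
```

Fixing the seed means the same subset can be sent to every system being compared, so the scores are computed on identical sentences.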

BLEU Score    Interpretation
< 10          Almost useless
10 - 19       Hard to get the gist
20 - 29       The gist is clear, but has significant grammatical errors
30 - 40       Understandable to good translations
40 - 50       High quality translations
50 - 60       Very high quality, adequate, and fluent translations
> 60          Quality often better than human
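
The interpretation bands above can be turned into a small lookup for labelling scores programmatically. This is only a sketch: the function name interpret_bleu is hypothetical, and because the published bands overlap at 40 and 50 and leave small gaps (for example between 19 and 20), it assumes each band runs from its lower bound up to, but not including, the next band's lower bound, with exactly 60 still counted in the 50 - 60 band.

```python
def interpret_bleu(score):
    """Map a BLEU score (0-100 scale) to the interpretation band above."""
    if not 0 <= score <= 100:
        raise ValueError("BLEU score is expected on a 0-100 scale")
    # (exclusive upper bound, label); boundary handling is an assumption
    bands = [
        (10, "Almost useless"),
        (20, "Hard to get the gist"),
        (30, "The gist is clear, but has significant grammatical errors"),
        (40, "Understandable to good translations"),
        (50, "High quality translations"),
    ]
    for upper, label in bands:
        if score < upper:
            return label
    if score <= 60:
        return "Very high quality, adequate, and fluent translations"
    return "Quality often better than human"


print(interpret_bleu(43.8))  # High quality translations
print(interpret_bleu(36.8))  # Understandable to good translations
```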

Usage

You can use the model in the following way:

from transformers import MarianMTModel, MarianTokenizer

# Specify the model identifier on Hugging Face Model Hub
model_name = "scoris/scoris-mt-lt-en"

# Load the model and tokenizer from Hugging Face
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

src_text = [
    "Kartą, senų senovėje, buvo viena mergaitė ir gyveno ji su savo mama mažoje jaukioje trobelėje prie miško. ",
    "Mergaitę žmonės vadino Raudonkepuraite, nes ji dažnai dėvėdavo raudoną apsiaustėlį su kapišonu. ",
    "Mergaitė mielai gobdavosi šiuo apsiaustėliu, nes jį buvo gavusi iš savo močiutės, kuri gyveno namelyje už miško ir labai mylėjo Raudonkepuraitę. ",
    "Vieną dieną mama priruošė Raudonkepuraitei pilną krepšelį įvairiausių gėrybių.",
    "Pridėjo obuoliukų, kriaušaičių, braškių, taip pat skanių pyragėlių, kuriuos pati buvo iškepusi, sūrio ir gabalėlį mėsos bei didelį išdabintą tortą."
]

# Tokenize the text and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Print out the translations
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# Output:
# Once upon a time there was a girl, and she lived with her mother in a small cozy hut by the forest.
# The girl was called the Red cape because she often wore a red cape.
# The girl would gladly wear this coat, because she had it from her grandmother, who lived in a house outside the forest and loved Redcape very much.
# One day my mother prepared a basket full of all kinds of good things for the Red cape.
# He added apples, pears, strawberries, as well as delicious cakes that he had baked, cheese and a piece of meat, and a large cake.

Model size: 236M params (Safetensors, tensor type F32)