en-tr translation missing sentences, sometimes cut off

#7
by MelihDumm - opened

When I translate an English text to Turkish, the output is sometimes cut off in the middle of a sentence, or the last sentences are missing altogether. Sometimes it translates only the first sentence and doesn't bother with the rest. Approximately 5% of the translations are like this. Is there anything I am missing? The error is there even when I copy/paste the sample code, and also when I use the Space on the model page.

Here is an example English text (I know this is a fancy example, but it happens with 'normal' language too):
"Ahoy there, me hearties! Ye be askin' about rainbows, I presume? Well, let me tell ye a thing or two 'bout them. Rainbows are caused by the refraction and dispersion of light in water droplets, usually in the atmosphere during or after rainfall. The way it works is that when the sun shines on raindrops, the light is split into its component colors - red, orange, yellow, green, blue, and violet - which then spread out and form a rainbow arch in the sky. Now ye might be wondering why rainbows always appear with their back to the sun. That's because our eyes perceive the light reflected by the raindrops as originating from behind us, even though it's actually coming from above. So there ya have it, me buckos! Rainbows ain't nothin' but a bunch of pretty colors in the sky, but they can sure brighten up a dreary day at sea. Happy sailin', and may yer next rainbow be as colorful as the parrots on me shoulder!"

When I translate it, the output is cut off after "yellow, green, blue, and violet - ". Everything after these words is missing:
"Ahoy orada, kalplerim! Gökkuşağı hakkında soru soruyorsunuz, tahmin ediyorum? Şey, size bir iki şey söyleyeyim 'onlarla ilgili. Gökkuşağının nedeni, suyun damlacıklarında ışığın kırılması ve dağılmasıdır, genellikle yağmur sırasında veya sonrasında atmosferde olur. Bu şekilde, güneş yağmur damlalarında parladığında, ışık neden bileşen renklerine bölünür - kırmızı, turuncu, sarı, yeşil, mavi ve mor -"

The code is:

from transformers import MarianMTModel, MarianTokenizer

# curdir is defined elsewhere in the script (base directory of the local model copy)
model_nameEn2Tr = curdir + "/models/opus-mt-tc-big-en-tr"
tokenizerEn2Tr = MarianTokenizer.from_pretrained(model_nameEn2Tr)
modelEn2Tr = MarianMTModel.from_pretrained(model_nameEn2Tr)

def translateEn2Tr(src_text):
    # Tokenize the whole input as one sequence and generate the translation
    translated = modelEn2Tr.generate(**tokenizerEn2Tr(src_text, return_tensors="pt", padding=True))
    translated_text = ""  # Initialize an empty string
    for t in translated:
        translated_text += tokenizerEn2Tr.decode(t, skip_special_tokens=True)
    return translated_text
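
A possible workaround to try (an assumption, not a confirmed diagnosis): the code above passes the whole paragraph to the model as a single sequence, and translation models of this kind are often more reliable on sentence-sized inputs. The sketch below splits the text into sentences with a naive regex and translates them as a batch. The helper names `split_sentences`, `translate_by_sentence` and `make_marian_translator` are hypothetical and only for illustration; `modelEn2Tr` and `tokenizerEn2Tr` are the objects loaded above.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    This mis-splits abbreviations such as 'e.g. foo'; a real sentence
    segmenter would be more robust.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def translate_by_sentence(text, translate_batch):
    """Translate each sentence separately and rejoin with single spaces.

    translate_batch is a hypothetical callable: list[str] -> list[str].
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    return " ".join(translate_batch(sentences))

def make_marian_translator(model, tokenizer):
    """Wrap an already loaded model/tokenizer pair as a translate_batch callable."""
    def translate_batch(sentences):
        # One row per sentence; truncation guards against over-long rows
        inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return translate_batch
```

With the objects from the code above, this could be called as `translate_by_sentence(src_text, make_marian_translator(modelEn2Tr, tokenizerEn2Tr))`. Whether this removes the cut-offs would still need to be checked against the failing examples.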

Here is another example:
English:
"Lorem ipsum is a type of placeholder text that is commonly used in graphic design, web development, and typesetting. It is essentially a series of nonsensical Latin words and phrases that have been used since the 16th century to fill up space where real content would eventually go. The meaning of lorem ipsum is not important because it is simply a filler text that does not convey any actual meaning, but rather serves as a visual guide for designers and developers to ensure that the layout, spacing, and typography are correct before the real content is added. In short, lorem ipsum is a widely-used tool in design and development that allows creators to create convincing mockups without having to worry about meaningful content until later stages of the project."

Turkish translation (the last sentence, "In short, lorem ipsum is a widely-used tool in design and development that allows creators to create convincing mockups without having to worry about meaningful content until later stages of the project.", is missing entirely, and the sentence before it is only partially translated):
"Lorem ipsum, grafik tasarım, web geliştirme ve dizgide yaygın olarak kullanılan bir yer tutucu metin türüdür. Esasen 16. yüzyıldan beri kullanılan anlamsız Latince kelimeler ve deyimler dizisidir. Gerçek içeriğin sonunda gideceği alanı doldurmak için kullanılır. Lorem ipsum'un anlamı önemli değildir, çünkü sadece gerçek bir anlam ifade etmeyen bir dolgu metnidir, daha ziyade tasarımcılar ve geliştiriciler için görsel bir aralık görevi görür."
