translated text is not complete

#21
by Nathan-Geo - opened

Hello,

I ran into an issue where long text gets truncated when running the T2TT (text-to-text translation) task; the code is shown below. I tried setting max_new_tokens or max_length to 4096, but neither worked.

The text is taken from a BBC News article:

text = """
Britain's most famous steam locomotive turned 100 this year, and along the way it has inspired poets, Hitchcock, Harry Potter and Royalty. What makes it such an enduring icon?

Some grow misty-eyed with nostalgia at the mere mention of them, waiting for hours on a windy platform just to get a glimpse or a photo of these stars of a bygone age.

Others find them smelly, dirty, and their hooting and screeching too much to bear.

We're talking about steam engines, and although travelling by locomotive may be a rare treat for most, the golden age of steam is being kept alive at the many heritage railways around the world, with more than 30 still running in the UK alone.
"""

# process input
text_inputs = processor(text=text, src_lang="eng", return_tensors="pt").to(device)

gen_kwargs = {
    # "max_length": 4096
    "max_new_tokens": 4096
}
output_tokens = model.generate(**text_inputs, **gen_kwargs, tgt_lang="cmn", generate_speech=False)
translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
print(f"Translation from text: {translated_text_from_text}")


Translation from text: 英国最著名的蒸汽机车今年年满100岁,并激发了诗人,希奇科克,哈利波特和皇家的灵感它是什么让它成为如此持久的象征?有些人仅仅在提及它们时就感到怀念,在风的平台上等了几个小时,只是为了瞥见或拍摄这些过去的明星的照片.
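The output above stops after the second paragraph of the input. One possible workaround, offered only as a sketch, is to split the text into paragraphs and translate each one separately. This assumes the model copes better with short inputs, which is not confirmed in this thread. The helper names `split_paragraphs`, `translate_long_text`, and `translate_one` are hypothetical and not part of any library.

```python
import re
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def translate_long_text(text: str, translate_one: Callable[[str], str]) -> str:
    """Translate each paragraph on its own and rejoin the results.

    translate_one is a hypothetical callable that takes one string and
    returns its translation, for example a small wrapper around the
    processor / model.generate / processor.decode calls from this post.
    """
    return "\n".join(translate_one(p) for p in split_paragraphs(text))
```

Here `translate_one` would wrap the three calls already shown in the post (build `text_inputs` for a single paragraph, call `model.generate` with `tgt_lang="cmn"` and `generate_speech=False`, then decode). If individual paragraphs are still cut off, the same idea could be applied at sentence level.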
