Does anyone know how to translate longer text? I am new to this

#18

by JaimeLugo - opened Dec 8, 2023

Dec 8, 2023

I have the code below, and I am only interested in the T2T (text-to-text) format. I am new, so very likely I am making a newbie mistake, but I am not able to see the translated text if it is longer than 500 characters; I only see the first 400 characters. Does anyone know how to solve this?

Thanks!

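```python
from transformers import AutoProcessor, SeamlessM4Tv2Model

def translate_text(text, src_lang, tgt_lang):
    # Loading the processor and model on every call is slow;
    # in practice, load them once outside the function.
    processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

    # Tokenize the source text.
    text_inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    # Text-to-text only: skip speech generation.
    output_tokens = model.generate(**text_inputs, tgt_lang=tgt_lang, text_num_beams=5, generate_speech=False)
    # Decode the first (best) generated sequence.
    translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)

    return translated_text
```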

noobmaster29

Jan 14, 2024

It seems the default max_new_tokens is set to 256 for this model. You can probably increase it, but be mindful of your input token length and the model's context length (which I believe is 4096).

max_new_tokens (int, optional, defaults to 256) — The maximum numbers of text tokens to generate, ignoring the number of tokens in the prompt.
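A minimal sketch of where the parameter goes: it is passed as a keyword argument to `model.generate(...)`. This variant assumes the caller loads `processor` (an `AutoProcessor`) and `model` (a `SeamlessM4Tv2Model`) once and passes them in; the default of 1000 is simply the value reported to work later in this thread, not a recommended setting. The extra function parameters are illustrative additions, not part of the original code.

```python
def translate_text(text, src_lang, tgt_lang, processor, model, max_new_tokens=1000):
    """Text-to-text translation with a larger output budget.

    Assumes `processor` and `model` were loaded once by the caller,
    e.g. with AutoProcessor / SeamlessM4Tv2Model.from_pretrained(...).
    """
    text_inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")
    output_tokens = model.generate(
        **text_inputs,
        tgt_lang=tgt_lang,
        text_num_beams=5,
        # Raise the generation limit above the default of 256 tokens.
        # Unprefixed kwargs are forwarded to the sub-models; a "text_" prefix
        # (as in text_num_beams) should restrict it to the text decoder only.
        max_new_tokens=max_new_tokens,
        generate_speech=False,
    )
    # Decode the first (best) generated sequence.
    return processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
```

Loading the model once and reusing it across calls also avoids re-reading several gigabytes of weights for every translation.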

If you need to translate even longer text, it is probably best to split it into chunks at a sentence boundary (e.g., a period) once a chunk exceeds some length, then loop over the chunks to translate your entire text.
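The chunking approach above could look roughly like this. It is a sketch under a few assumptions: sentences end with `.`, `!` or `?` followed by whitespace (so it will not split on, e.g., the Chinese full stop `。`, and it may split after abbreviations like "e.g."), the 400-character limit is an arbitrary illustrative value, and `translate_fn` is a hypothetical callable that translates one chunk.

```python
import re

def split_into_chunks(text, max_chars=400):
    """Split text into chunks of at most max_chars, breaking only between
    sentences. A single sentence longer than max_chars becomes its own chunk."""
    # Split after '.', '!' or '?' when followed by whitespace (punctuation is kept).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = sentence  # start a new chunk with this sentence
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text, translate_fn, max_chars=400):
    """Translate each chunk with translate_fn and join the results with spaces."""
    return " ".join(translate_fn(chunk) for chunk in split_into_chunks(text, max_chars))
```

With the original function, usage might be `translate_long_text(long_text, lambda chunk: translate_text(chunk, "eng", "fra"))`. Joining with spaces suits languages like English or French; for Chinese or Japanese targets you would likely join with an empty string instead.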

JaimeLugo

Jan 14, 2024

Thanks, noobmaster29! I managed to increase the output length simply by passing max_new_tokens=1000. This works very well; however, the model in general tends to cut parts of sentences. Perhaps when it judges that a sentence has redundant words, it simply leaves them out.

noobmaster29

Jan 14, 2024

Hmm, which languages are you using? I'm finding some of the English-to-Chinese translations somewhat questionable. It seems the model does not handle long sentences or very short phrases well.

junelegend

Jan 30, 2024

Can anybody let me know where I have to add the max_new_tokens parameter? I am not able to figure it out.

pranavabani023

Jan 30, 2024

Can anyone please help me? Where should I start so that I can also use the models available on Hugging Face?

kk53

Aug 4, 2024

@junelegend I changed max_new_tokens with model.config.max_new_tokens = 4096, and I can confirm it changed when printing model.config again. But it still doesn't change the output audio length for S2ST (speech-to-speech translation). I'm not sure which other config parameter needs to be changed. Could someone please help?
