[QUESTION] model translates only a part of the text
Hello Community,
I was exploring NLLB, but unfortunately I ran into some issues during translation. The model description says the input should not exceed 512 tokens, because the model was not trained on longer sequences. However, when I translated a text whose tokenized length is well below 512 tokens, the model translated only part of it. Splitting the same text into sentences worked. Has anyone encountered something similar, and if so, how did you solve it?
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang="ces_Latn", tgt_lang='eng_Latn')
translator("Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše. Na kterém podlaží bydlí Anička?")
# [{'translation_text': 'Zuzka lives in a boarding house on the ninth floor, Anička lives three floors up.'}]
translator(["Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše.", "Na kterém podlaží bydlí Anička?"])
# [{'translation_text': 'Zuzka lives in a boarding house on the ninth floor, and Anička lives three floors up.'}, {'translation_text': 'What floor does Anicka live on?'}]
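For reference, here is a quick way to double-check the token count with the same tokenizer (a minimal sketch; src_lang here only affects the prepended language token):
from transformers import AutoTokenizer
# Count tokens for the full input; the result is well below the 512-token limit mentioned above.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="ces_Latn")
text = "Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše. Na kterém podlaží bydlí Anička?"
print(len(tokenizer(text)["input_ids"]))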
Hey @hkad98!
This isn't a bug: the generate method only outputs a certain number of tokens by default. You can specify the min_length and max_length parameters to get more or fewer tokens out.
Here, for example, I ask for a minimum of 30 generated tokens:
>>> translator("Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše. Na kterém podlaží bydlí Anička?", min_length=30)
Out[8]: [{'translation_text': 'Zuzka lives in a boarding house on the ninth floor, Anička lives three floors up. What floor does Anička live on?'}]
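The same call accepts max_length as well; a hedged sketch reusing the translator object from the question (512 is just an upper bound matching the stated training limit, not a required value):
>>> translator("Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše. Na kterém podlaží bydlí Anička?", min_length=30, max_length=512)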
Yes! Here is the documentation specific to text-generation: Text Generation.
You will be interested in the generate method in particular.
We're currently reworking this page, so any feedback is welcome! cc @patrickvonplaten
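For anyone who wants to skip the pipeline, here is a minimal sketch of calling generate directly with the same knobs (the forced_bos_token_id lookup follows the NLLB model card; on newer transformers versions you may need tokenizer.convert_tokens_to_ids("eng_Latn") instead):
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="ces_Latn")
inputs = tokenizer("Zuzka bydlí v paneláku na 9. podlaží. Anička bydlí o 3 podlaží výše. Na kterém podlaží bydlí Anička?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["eng_Latn"],  # force English as the target language
    min_length=30,   # same effect as the pipeline example above
    max_length=512,  # upper bound on generated tokens
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))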