m2m100-12B-last-ckpt not responding to "forced_bos_token_id" parameter to specify language

#3
by Jadin-Jackson-BSCI - opened

I'm deploying m2m100-12B-last-ckpt on SageMaker as an endpoint, and it does not respond to the "forced_bos_token_id" parameter for specifying the target language. Changing "forced_bos_token_id" for different languages (e.g. 128067 for 'nl', 128020 for 'de', etc.) has no effect on the language returned by the model, which seems to pick a target language arbitrarily depending on the input text.

For example:
source_txt = """Include your full and complete name. """

llm.predict({
    "inputs": source_txt,
    "parameters": {"forced_bos_token_id": tokenizer.get_lang_id("de")},
})

[{'generated_text': 'Veuillez inclure votre nom complet et complet.'}]
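
To make the comparison across languages easier to reproduce, here is a minimal sketch that builds one request payload per target language. The helper name `build_payload` and the `LANG_IDS` mapping are hypothetical, introduced only for illustration; the two IDs are the ones quoted above, and the commented-out `llm.predict` call assumes the same endpoint object as in my example.

```python
# Hypothetical helper (illustration only): builds the request payload
# for a single target language ID.
def build_payload(text, lang_id):
    return {"inputs": text, "parameters": {"forced_bos_token_id": lang_id}}

# Language IDs quoted earlier in this post.
LANG_IDS = {"nl": 128067, "de": 128020}

source_txt = "Include your full and complete name. "

# One payload per target language, keyed by language code.
payloads = {
    lang: build_payload(source_txt, lang_id)
    for lang, lang_id in LANG_IDS.items()
}

for lang, payload in payloads.items():
    print(lang, payload["parameters"])
    # With the deployed endpoint available, the response could be compared
    # per language with: print(lang, llm.predict(payload))
```

If the parameter were being honored, each payload should come back in its own target language rather than the same one for every request.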

I don't see this behavior with the m2m100_1.2B or m2m100_418M versions. Is there a reason the 12B checkpoint behaves differently, or are there any suggested fixes to try?
