Transformer Pipeline

#26 · opened by francescoyoubiquo

Loading the Gemma 2b-it model with this code:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_version = 2
model_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"
model_config_path = f"/kaggle/input/gemma/transformers/2b-it/{model_version}/config.json"

tokenizer_id = f"/kaggle/input/gemma/transformers/2b-it/{model_version}"

model_config = AutoConfig.from_pretrained(model_config_path)
model = AutoModelForCausalLM.from_pretrained(model_id, config=model_config, device_map="auto")

# AutoTokenizer reads tokenizer_config.json from the directory on its own;
# device_map and return_tensors are not from_pretrained arguments.
tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)

Executing the generation as follows:

input_text = "Write a python function to print all elements of a list."
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
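
(Aside, not from the original post: if the model was loaded with device_map="auto" onto a GPU, the encoded inputs may need to be moved to the model's device before calling generate(). A minimal sketch:)

# BatchEncoding supports .to() directly, so the whole batch can be moved at once.
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids, max_new_tokens=16)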

Some text is generated. But when creating a transformers.pipeline as follows, the only text in the output is the input text.

import transformers

query_pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    device_map="auto",
    framework="pt",
)

input_text = "Write a python function to print all elements of a list."
result = pipeline(
input_text ,
max_new_tokens=64,
do_sample=True,
num_return_sequences=1,
)

print(f"Result: {result}")

This is the output:
Result: [{'generated_text': 'Write a python function to print all elements of a list.'}]

Is this procedure correct, or are there some mistakes?

However, when the chat template is applied in this way before executing it, the pipeline generates some text:

chat = [
    {"role": "user", "content": input_text},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
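
For completeness, a sketch of how the rendered prompt is then fed to the same pipeline (the step this post describes):

# The rendered prompt already contains Gemma's control tokens
# (<start_of_turn>user ... <end_of_turn>), so the model generates new text.
result = query_pipeline(prompt, max_new_tokens=64, do_sample=True)
print(result[0]["generated_text"])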

But text is also generated by creating a pipeline of type "conversational" and passing a chat like this:

chat = [
    {"role": "user", "content": input_text},
]
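
A sketch of that conversational variant, assuming the Conversation helper that transformers shipped at the time:

from transformers import Conversation

conv_pipeline = transformers.pipeline(
    task="conversational",
    model=model,
    tokenizer=tokenizer,
)

# The conversational pipeline applies the tokenizer's chat template internally,
# which is why it produces text where the bare text-generation call did not.
conversation = conv_pipeline(Conversation(input_text), max_new_tokens=64)
print(conversation.messages[-1]["content"])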

Is there a problem with the TextGenerationPipeline?

I am struggling with this too.

Google org

Is this using the right chat template and control tokens under the hood?
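
One way to check from user code (a diagnostic sketch, not from the thread): print the Jinja template stored on the tokenizer and render it for a single turn to see which control tokens are inserted.

# The chat template shipped with the tokenizer, if any.
print(tokenizer.chat_template)

# Render it for one user turn to inspect the control tokens.
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi"}],
    tokenize=False,
    add_generation_prompt=True,
))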

I had the same issue: the generated_text was the same as the input. I found a way to fix it.

Modify the code:

result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
)

to:

result = query_pipeline(
    input_text,
    max_new_tokens=64,
    do_sample=True,
    num_return_sequences=1,
    add_special_tokens=True,
)
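
A plausible explanation (my reading, not stated in the thread): the text-generation pipeline tokenized the raw prompt without special tokens, so Gemma never saw the leading <bos> token it expects. The difference is easy to inspect:

# Without special tokens the prompt lacks the leading <bos>...
print(tokenizer.convert_ids_to_tokens(
    tokenizer(input_text, add_special_tokens=False)["input_ids"])[:5])

# ...with them, <bos> is prepended and generation behaves normally.
print(tokenizer.convert_ids_to_tokens(
    tokenizer(input_text, add_special_tokens=True)["input_ids"])[:5])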

Google org

ah good to know! cc @osanseviero in case we should specify this somewhere?

To use the pipeline, the chat template must be applied. Using the pipeline without the chat template does not generate any new tokens.

Google org

Interesting, cc @ArthurZ @Rocketknight1 do you think there is something we need to upstream in transformers pipeline?

But shouldn't the text-generation pipeline produce new tokens, as it does for all the other models?
Also, for gemma-7b-it it sometimes generates tokens for me.
