How to stop the prediction once the model has generated a sufficient solution for the given prompt?

#49
by MukeshSharma - opened

I have max_length set to 300, but the answer is already complete by around token 150, so how can I stop the model so that it doesn't generate anything further?
Any suggestion would help. Since I'm not sure what the right max length is for different prompts, setting it to a static value sometimes gives unwanted predictions after the actual answer is already done.

use this:

import time
import torch
from transformers import pipeline

start = time.time()
# Load the local checkpoint; device_map="auto" decides where to put each layer,
# either on the GPU or the CPU.
pipe = pipeline("text-generation", model="/home/ec2-user/starCoderCheckpointLocal",
                torch_dtype=torch.bfloat16, device_map="auto", load_in_8bit=True)

text = input("Enter query >>")

prompt_template = "<|system|>\n<|end|>\n<|user|>\n{query}<|end|>\n<|assistant|>"
prompt = prompt_template.format(query=text)
outputs = pipe(prompt, max_new_tokens=512, stop_sequence='<|end|>', do_sample=True,
               temperature=0.2, top_k=50, top_p=0.95, eos_token_id=49155)

# print(outputs)
# print( outputs[0]['generated_text'])
generated = outputs[0]['generated_text'].split('<|assistant|>')[-1]
print(generated)

end = time.time()
elapsed = end - start
print("Time taken: ", str(int(elapsed // 60)) + " minutes", str(round(elapsed % 60)) + " seconds")

Hey @doraexp, I got this ValueError:
ValueError: The following model_kwargs are not used by the model: ['stop_sequence'] (note: typos in the generate arguments will also show up in this list)
output = model.generate(
    input_ids,
    do_sample=True,
    min_length=min_length,
    max_length=max_length,
    temperature=temperature,
    early_stopping=True,
    stop_sequence='<|end|>',
    top_k=50,
    top_p=0.95,
    eos_token_id=49155,
)

I am using the StarCoder model. Any further suggestion to resolve this, or any alternative? Please do suggest.
Thanks
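
For context, stop_sequence is accepted by the text-generation pipeline (as in the earlier snippet) but is not a model.generate() keyword, which is why generate rejects it. One possible workaround is a custom StoppingCriteria. This is a minimal sketch, not tested against this exact checkpoint; it assumes model, input_ids, max_length, and temperature are the same variables as in the snippet above, and that 49155 really is the id of <|end|> for the tokenizer in use:

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Return True (stop generating) once the last generated token is a stop token."""
    def __init__(self, stop_token_ids):
        self.stop_token_ids = set(stop_token_ids)

    def __call__(self, input_ids, scores, **kwargs):
        return input_ids[0, -1].item() in self.stop_token_ids

end_token_id = 49155  # assumed id of <|end|>; verify with tokenizer.convert_tokens_to_ids("<|end|>")

output = model.generate(
    input_ids,
    do_sample=True,
    max_length=max_length,
    temperature=temperature,
    top_k=50,
    top_p=0.95,
    eos_token_id=end_token_id,
    stopping_criteria=StoppingCriteriaList([StopOnTokens([end_token_id])]),  # instead of stop_sequence
)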

@doraexp can you please help with this? Really looking forward to your help.

Hi @MukeshSharma,

Could you please provide the code snippet that you are using, the checkpoint that you are trying to load, and the whole error? That would be really helpful too. :))

I am loading the same checkpoint:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")

model = AutoModelForCausalLM.from_pretrained("bigcode/starcoder")

No other changes.
For generation I am using:
output = model.generate(
    input_ids,
    do_sample=True,
    min_length=min_length,
    max_length=max_length,
    temperature=temperature,
    early_stopping=True,
    stop_sequence='<|end|>',
    top_k=50,
    top_p=0.95,
    eos_token_id=49155,
)

So nothing else has changed, but I still get the ValueError:
ValueError: The following model_kwargs are not used by the model: ['stop_sequence'] (note: typos in the generate arguments will also show up in this list)

Hey @doraexp, please review this once.

@doraexp Any help on this? I still need it.
Thanks

Hi @MukeshSharma, sorry, I got a little busy with some other stuff and couldn't reply earlier. Also, I am not sure why you are getting this error.

However, I download the model locally and then run it from there. Follow the steps below and see if they work for you.

Run this Python program:


from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-alpha")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/starchat-alpha")

# Mention the directory where you want to save the checkpoint
tokenizer.save_pretrained("/home/ec2-user/starCoderCheckpointLocal")
model.save_pretrained("/home/ec2-user/starCoderCheckpointLocal")

# These commands check that the model works offline from the local directory
tokenizer = AutoTokenizer.from_pretrained("/home/ec2-user/starCoderCheckpointLocal")
model = AutoModelForCausalLM.from_pretrained("/home/ec2-user/starCoderCheckpointLocal")

Now just run this:

from transformers import AutoModelForCausalLM, AutoTokenizer

import torch
# checkpoint = "HuggingFaceH4/starchat-alpha"
checkpoint = "/home/ec2-user/starCoderCheckpointLocal"

device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# to save memory consider using fp16 or bf16 by specifying torch_dtype=torch.float16 for example
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16).to(device)

inputs = tokenizer.encode("Create a typescript function that calculates factorial of a number.", return_tensors="pt").to(device)
outputs = model.generate(inputs,max_length=500)
print(tokenizer.decode(outputs[0]))
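
If you also want this local run to stop at <|end|> instead of running on to the token budget (the original question), here is a minimal sketch building on the variables above. It assumes the starchat-alpha checkpoint, whose tokenizer includes the <|end|> token, and reuses the chat template from earlier in the thread:

# Wrap the query in the chat template used earlier in the thread (assumption: starchat-alpha format).
prompt = "<|system|>\n<|end|>\n<|user|>\nCreate a typescript function that calculates factorial of a number.<|end|>\n<|assistant|>"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)

# Stop as soon as the model emits <|end|> rather than continuing to max_new_tokens.
end_id = tokenizer.convert_tokens_to_ids("<|end|>")
outputs = model.generate(inputs, max_new_tokens=500, eos_token_id=end_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))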

I hope this helps :)
