Every Inference gives the prompt as part of output also, any way to remove that?

#26
by vermanic - opened

Hey, I have this general problem where every model I run on HF always includes the input prompt in its output. Is there any way to exclude that?

I just need the generated text.

Code:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

checkpoint = "HuggingFaceH4/starchat-beta"
device = "cuda" if torch.cuda.is_available() else "cpu"  # "cuda:X" for GPU usage or "cpu" for CPU usage


class Model:
    def __init__(self):
        print("Running in " + device)
        self.tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        self.model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map='auto')

    def infer(self, input_text, token_count):
        # Encode the prompt and move it to the same device as the model inputs
        inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(device)
        # Generate at most `token_count` new tokens after the prompt
        outputs = self.model.generate(inputs, max_new_tokens=token_count)
        # outputs[0] holds the prompt tokens followed by the generated tokens,
        # which is why the prompt shows up in the decoded string
        return self.tokenizer.decode(outputs[0])
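For context, this is roughly what the behavior looks like when calling the class above (the prompt and token count here are made-up values, just for illustration):

model = Model()
result = model.infer("How do I reverse a list in Python?", 64)
print(result)
# The printed text begins with the prompt itself and only then the completion,
# because generate() returns the prompt token IDs followed by the new token IDs.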

Also, max_new_tokens is the number of tokens I want the model to generate in its response, right?
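(Continuing the made-up example above: max_new_tokens caps only the tokens generated after the prompt, so the total sequence length is at most the prompt length plus max_new_tokens; generation can also stop earlier at an end-of-sequence token.)

prompt_ids = model.tokenizer.encode("How do I reverse a list in Python?", return_tensors="pt").to(device)
output_ids = model.model.generate(prompt_ids, max_new_tokens=64)
# At most 64 tokens are appended after the prompt
assert output_ids.shape[-1] <= prompt_ids.shape[-1] + 64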

vermanic changed discussion title from Every Inference gives the prompt as part of output also, any way to fix this? to Every Inference gives the prompt as part of output also, any way to remove that?

Resolved by:

return self.tokenizer.decode(outputs[0])[len(input_text):]
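One caveat with slicing the decoded string by len(input_text): it assumes the decoded prompt reproduces the raw input exactly, which can break if the tokenizer inserts special tokens or changes whitespace. A sketch of an alternative that slices the generated token IDs instead (same infer method as above, only the return path differs):

    def infer(self, input_text, token_count):
        inputs = self.tokenizer.encode(input_text, return_tensors="pt").to(device)
        outputs = self.model.generate(inputs, max_new_tokens=token_count)
        # Keep only the tokens that were generated after the prompt
        new_tokens = outputs[0][inputs.shape[-1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True)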
vermanic changed discussion status to closed
