Limit chat to exactly one response

#6
by country-squire - opened

Thanks for the great work!! I am running RedPajama-INCITE-Chat-3B-v1 on my local desktop and so far all works fine, but I wasn't able to find out how to limit the AI to one chat response.

When I ask for something that generates a long response, it's often truncated (so I'd like to increase the max length). When I ask for something simple, like a birth date, it answers and then just continues with random dialogue that is apparently "memories" from the training data...

Can I pass a parameter limiting the chat to one response? I could cut the extra text client-side, but that's not the best solution and wastes GPU time.

Also, does the model keep context between a message and the previous ones? I asked about a person and received some good info, and then I asked "and when was she born?" and received the birthdate of someone else... or am I expecting too much? ;-)

Thanks in advance for your advice...

Hi, I added the following stopping criteria and it worked:

import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StoppingCriteriaSub(StoppingCriteria):
    def __init__(self, stops=None):
        super().__init__()
        # Token-id sequences that should end generation (moved to the GPU).
        self.stops = [stop.to("cuda") for stop in (stops or [])]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # Stop as soon as the generated sequence ends with any stop sequence.
        for stop in self.stops:
            if torch.all(stop == input_ids[0][-len(stop):]).item():
                return True
        return False


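# Encode each stop word once; it is compared against the tail of the generated ids.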
stop_words = ["<human>:"]
stop_words_ids = [
    tokenizer(stop_word, return_tensors="pt")["input_ids"].squeeze()
    for stop_word in stop_words
]
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids)])

Then pass stopping_criteria into your model.generate() call as a parameter.
And if you want, remove the trailing <human>: from the output (e.g., output_str = output_str.replace("<human>:", "")).
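
Putting it together, something like this should work (a minimal sketch; it assumes model and tokenizer for RedPajama-INCITE-Chat-3B-v1 are already loaded on CUDA, and the sampling settings are only examples):

prompt = "<human>: When was Ada Lovelace born?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=256,  # raise this if long answers get cut off
    do_sample=True,
    temperature=0.7,
    stopping_criteria=stopping_criteria,
)

# Decode only the newly generated tokens, then strip a trailing stop word.
output_str = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
output_str = output_str.replace("<human>:", "").strip()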

For your second question:
you need to keep a history of the human/bot interaction yourself and pass it into your prompt (append your new question to the old conversation).

Below is an example implementation (based on FastAPI).

Be aware that the prompt grows with every turn, so you may want to cut the history off at some point (or run a hidden loop that summarizes it behind the scenes :-)); a trimming sketch follows the example below.

from fastapi import APIRouter

router = APIRouter()
history = []


@router.post("/chat")
async def data(data: dict):
    response = {}
    try:
        input_text = data["text"]
        # Prepend the accumulated conversation so the model sees the full context.
        input_text_with_hist = "\n".join(history) + "\n<human>:" + input_text

        res = infer(input_text_with_hist)  # infer() = your own generation wrapper
        response["text"] = res
        history.append(f'\n<human>:{data["text"]}\n')
        history.append(f"\n<bot>: {res}\n")
        print(response)
        return response
    except Exception as e:
        # Surface errors to the client instead of crashing the endpoint.
        response["error"] = str(e)
        return response

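On trimming: here is a minimal sketch of a token-budget cut-off (untested and purely illustrative; trim_history and the max_tokens value are my own, not part of the snippet above):

def trim_history(history, tokenizer, max_tokens=1500):
    # Drop the oldest turns until the serialized history fits the budget.
    # max_tokens=1500 is an arbitrary example; pick it to fit your context window.
    while history and len(tokenizer("\n".join(history))["input_ids"]) > max_tokens:
        history.pop(0)
    return history

Call it on history right before building input_text_with_hist.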
@szeta Thanks for the detailed replies! I'll try this out.

Hi @szeta, can we create a customized AI chatbot with the RedPajama-INCITE-Chat-3B-v1 model by fine-tuning it on our own data? If so, could you let me know how to do that?
