
Limit chat to exactly one response

#6
by country-squire - opened

Thanks for the great work!! I am running RedPajama-INCITE-Chat-3B-v1 on my local desktop and so far everything works fine, but I wasn't able to find out how to limit the AI to one chat response.

When I ask for something that generates a long response, it's often truncated (so I'd like to increase the max length). When I ask for something simple, like a birth date, it responds and then just continues with random dialogues that are apparently "memories" from the training data...

Can I pass a parameter limiting the chat to one response? I could cut the extra text client-side, but that's not the best solution and wastes GPU time.

Also, is there any context between a message and the previous ones? I asked about a person and received some good info, and then I asked "and when was she born?" and received the birth date of someone else... or am I expecting too much? ;-)

Thanks in advance for your advice...

Hi, I added the following stopping criteria; this worked.

import torch
from transformers import StoppingCriteria, StoppingCriteriaList


class StoppingCriteriaSub(StoppingCriteria):
    def __init__(self, stops=None):
        super().__init__()
        # move the stop-word token ids to the same device as the model
        self.stops = [stop.to("cuda") for stop in (stops or [])]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs):
        # stop as soon as the tail of the generated sequence matches a stop word
        for stop in self.stops:
            if torch.all(stop == input_ids[0][-len(stop):]).item():
                return True
        return False


stop_words = ["<human>:"]
stop_words_ids = [
    tokenizer(stop_word, return_tensors="pt")["input_ids"].squeeze()
    for stop_word in stop_words
]
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids)])

Then pass it into your model.generate() call as the stopping_criteria parameter.
And if you want, remove the trailing <human>: afterwards, e.g. output_str = output_str.replace("<human>:", "").
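
For reference, here is a minimal sketch of the generate() call itself; the names prompt, model, and tokenizer, as well as the sampling settings, are assumptions about your setup, so adapt them to your code. Note that max_new_tokens also addresses the truncation issue from the original question.

# minimal sketch, assuming model/tokenizer are the loaded RedPajama model and tokenizer
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,  # raise this if long answers get truncated
    do_sample=True,
    temperature=0.7,
    stopping_criteria=stopping_criteria,
)
# decode only the newly generated tokens, then strip the trailing stop word
output_str = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
output_str = output_str.replace("<human>:", "").strip()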

For your second question:
you need to keep a history of the human/bot interaction yourself and pass it into your prompt (append your new question to the old conversation).

Below is an example implementation (based on FastAPI).

Be aware that the token count grows with every turn, so you may want to cut the history off at some point (or have a secret loop to summarize behind the scenes :-)); see the trimming sketch after the example.

from fastapi import APIRouter

router = APIRouter()

history = []


@router.post("/chat")
async def data(data: dict):
    response = {}
    try:
        input_text = data["text"]
        # prepend the conversation so far and cue the bot to answer
        input_text_with_hist = "\n".join(history) + "\n<human>:" + input_text + "\n<bot>:"

        res = infer(input_text_with_hist)  # infer() wraps the tokenize + generate call
        response["text"] = res
        history.append(f"\n<human>:{input_text}\n")
        history.append(f"\n<bot>: {res}\n")
        print(response)
        return response
    except KeyError:
        return {"error": "request body must contain a 'text' field"}
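
Regarding the cut-off mentioned above, here is a minimal trimming sketch; the MAX_TURNS name and the keep-the-last-N-turns strategy are my assumptions, not part of the original code:

MAX_TURNS = 10  # assumed limit; tune to your model's context window


def trim_history():
    # each turn appends two entries (human + bot), so keep the last MAX_TURNS pairs
    del history[: max(0, len(history) - 2 * MAX_TURNS)]

You could call trim_history() at the end of the /chat handler. To try the endpoint, mount the router in a FastAPI app and POST e.g. {"text": "Who was Marie Curie?"} to /chat.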

@szeta Thanks for the detailed replies! I'll try this out.

Hi @szeta, can we create a customized AI chatbot with the RedPajama-INCITE-Chat-3B-v1 model by training it on our own data? If so, can you let me know how to do that?
