TheBloke/Llama-2-7B-Chat-GPTQ · TheBloke/Llama-2-7b-(Chat-)GPTQ repeats request

Hi team,

I'm passing a request to the model, and it repeats my request (with slight variation) and then appends its response. Is this expected?

revision: gptq-4bit-32g-actorder_True
do_sample: True
temperature: 0.25
repetition_penalty: 1.2
max_new_tokens: 512

Example (I tried using an instruction format like [INST], but it didn't help).

Input:

Please write a haiku about llama

Output from TheBloke/Llama-2-7b-Chat-GPTQ:

Please write a haiku about llama-ing.

Here is my attempt:
Llama's gentle glow,
Softly grazes the landscape,
Serenity found.

Output from TheBloke/Llama-2-7b-GPTQ:

Please write a haiku about llama.
I'll start:
Llama is my friend,
He lives in the zoo.
His name is Llamalot!

Is there any way to prevent Llama 2 from repeating the request? As the "llama-ing" case shows, it isn't just a matter of removing a few identical characters at the beginning; sometimes the echoed text differs more.
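
For anyone reading along, here is a rough sketch of the two things I would check first. It assumes the echo comes from decoding the full sequence (prompt plus continuation), which is what a causal LM normally returns, so the "variation" may simply be the model continuing my text ("-ing.", "."). The helper names `build_llama2_chat_prompt` and `strip_prompt_tokens` are made up for illustration, and the token ids are toy values rather than real tokenizer output.

```python
from typing import List, Optional


def build_llama2_chat_prompt(user_message: str, system_prompt: Optional[str] = None) -> str:
    """Wrap a message in the Llama 2 chat format.

    The BOS token is left out on purpose; the tokenizer normally adds it.
    """
    if system_prompt:
        user_message = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message}"
    return f"[INST] {user_message.strip()} [/INST]"


def strip_prompt_tokens(output_ids: List[int], prompt_len: int) -> List[int]:
    """Drop the first prompt_len token ids, keeping only the newly generated ones."""
    return list(output_ids[prompt_len:])


# Toy token ids standing in for real tokenizer output (assumption, not real ids).
prompt_ids = [1, 3529, 2436, 263]
full_output_ids = prompt_ids + [14064, 29889, 2]

new_ids = strip_prompt_tokens(full_output_ids, len(prompt_ids))
print(build_llama2_chat_prompt("Please write a haiku about llama"))
print(new_ids)
```

With a real model, the same idea would be to decode only the slice after the prompt length instead of the whole output, so no string matching against the echoed text is needed. If the `transformers` text-generation pipeline is used instead of calling `generate()` directly, passing `return_full_text=False` should have a similar effect.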