Prompt format

#5
by mr96 - opened

Hello, is the prompt template provided in the README correct?

SYSTEM:
USER: {prompt}
ASSISTANT:

I was looking at Meta's GitHub page for Llama (https://github.com/facebookresearch/llama#fine-tuned-chat-models), and it says the following:

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in chat_completion needs to be followed, including the INST and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and breaklines in between (we recommend calling strip() on inputs to avoid double-spaces).
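
For reference, the single-turn formatting that passage describes appears to reduce to something like this (a reading of the linked README, matching the format worked out later in this thread, not text quoted from the model card):

<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message} [/INST]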

I'm just a bit confused

I don't know for sure. I looked at their generation.py, which shows SYSTEM, USER and ASSISTANT messages, and based my template on that. It does seem to work OK, but I don't know if it could be better.

I have heard elsewhere that BOS and EOS are meant to be included on every prompt, though it seems to work OK without them.

I don't know what the INST and <<SYS>> tags are about; I haven't been able to figure that out yet.

If you or anyone else can figure out a more accurate prompt template, I'll gladly update it. But I've not had time to delve any deeper into it yet.

Here is an example I found to work pretty well. I am still testing it out in text-generation-webui.

<<SYS>>
You are Richard Feynman, one of the 20th century's most influential and colorful physicists. Explore the depths of quantum mechanics, challenge conventional thinking, and unravel the mysteries of the universe with your brilliant mind. Embark on a journey where your curiosity knows no bounds, and let your passion for science shine through as you navigate the realms of physics and leave an indelible mark on the scientific world.
<</SYS>>

[INST]
User: What is the best way to open a can of worms?
[/INST]

Richard Feynman:
Oh ho ho! Opening a can of worms, eh? Well, my young friend, I've found that the best way to do that is to use a can opener! But not just any ordinary can opener will do. No, no, no! You need one with a bit of flair, a touch of panache, a hint of mischief. You see, I like to think of myself as a bit of a prankster, even in my tool choices. So, I recommend using a can opener shaped like a clown's hat or a rubber chicken. That way, you get the job done with a smile on your face and a twinkle in your eye. And who knows? Maybe the worms will even put up a bit of a fight before they surrender to their fate. Hehehe! But seriously, kids, don't try this at home. Or do, I suppose. Just be careful not to get any worms in your soup.

Now, if you'll excuse me, I have some quantum mechanics to attend to. You see, I've been pondering the nature of reality itself, trying to figure out why those darn particles keep disappearing from my lab. It's a real head-scratcher, but I'm sure I'll crack it eventually. After all, as I always say, "The only thing that interferes with my learning is my education." Hehehe! Off to the physics playground for me!

When I run an instrumented version of the code here and feed it something like this:

[ [{'role':'system', 'content': 'System_Message_Here'},
{'role':'user', 'content':'User_Msg_1'},
{'role':'assistant', 'content':'Asst_Msg_1'},
{'role':'user', 'content':'User_Msg_2'},
{'role':'assistant', 'content':'Asst_Msg_2'},
{'role':'user', 'content':'User_Msg_3'}] ]

I get something like this:

[TokenizerOutputList(text='[INST] <<SYS>>\nSystem_Message_Here\n<</SYS>>\n\nUser_Msg_1 [/INST] Asst_Msg_1 ', bos=True, eos=True),
 TokenizerOutputList(text='[INST] User_Msg_2 [/INST] Asst_Msg_2 ', bos=True, eos=True),
 TokenizerOutputList(text='[INST] User_Msg_3 [/INST]', bos=True, eos=False)]

So it looks like:

  1. The system message is folded into the [INST] [/INST] brackets of the first user message
  2. bos and eos tokens are added for each completed user-assistant message pair
  3. Only bos is added for the final, unanswered user message
  4. Newlines appear only around the system message and its <<SYS>> tags

Without bos and eos, the full prompt would look something like this:

'[INST] <<SYS>>\nSystem_Message_Here\n<</SYS>>\n\nUser_Msg_1 [/INST] Asst_Msg_1 [INST] User_Msg_2 [/INST] Asst_Msg_2 [INST] User_Msg_3 [/INST]'

Worth sharing: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/discussions/3
It seems there is a difference regarding the <s> and </s> tokens.

Thank you. I've updated the READMEs now.

Thank you everyone for the clarification!

I wrote this function that should produce that prompt format given a list of messages, if anyone is interested in using it:

def llama_v2_prompt(
    messages: list[dict]
) -> str:
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
    BOS, EOS = "<s>", "</s>"
    DEFAULT_SYSTEM_PROMPT = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

    # Prepend the default system prompt if the caller didn't supply one.
    if messages[0]["role"] != "system":
        messages = [
            {
                "role": "system",
                "content": DEFAULT_SYSTEM_PROMPT,
            }
        ] + messages
    # Fold the system message into the first user message, wrapped in <<SYS>> tags.
    messages = [
        {
            "role": messages[1]["role"],
            "content": B_SYS + messages[0]["content"] + E_SYS + messages[1]["content"],
        }
    ] + messages[2:]

    # Each completed user/assistant pair becomes "<s>[INST] user [/INST] assistant </s>".
    messages_list = [
        f"{BOS}{B_INST} {(prompt['content']).strip()} {E_INST} {(answer['content']).strip()} {EOS}"
        for prompt, answer in zip(messages[::2], messages[1::2])
    ]
    # The final, unanswered user message gets BOS but no EOS.
    messages_list.append(f"{BOS}{B_INST} {(messages[-1]['content']).strip()} {E_INST}")

    return "".join(messages_list)

Also, one more question: how would you guys "cut" the history when the user is near the end of the context limit?

Here is another version, which I added to https://github.com/vercel-labs/ai/pull/380.
It might be helpful as well.

import type { Message } from 'ai' // the `Message` type is exported by the `ai` package

export function experimental_buildLlama2Prompt(
  messages: Pick<Message, 'content' | 'role'>[]
) {
  const startPrompt = `<s>[INST] `
  const endPrompt = ` [/INST]`
  const conversation = messages.map(({ content, role }, index) => {
    if (role === 'user') {
      return content.trim()
    } else if (role === 'assistant') {
      // Close the current [INST] block, end the turn with </s>, and open the next one.
      return ` [/INST] ${content}</s><s>[INST] `
    } else if (role === 'function') {
      throw new Error('Llama 2 does not support function calls.')
    } else if (role === 'system' && index === 0) {
      // The system prompt is folded into the first [INST] block via <<SYS>> tags.
      return `<<SYS>>\n${content}\n<</SYS>>\n\n`
    } else {
      throw new Error(`Invalid message role: ${role}`)
    }
  })

  return startPrompt + conversation.join('') + endPrompt
}

@mr96 and @philschmid, as shown here, BOS and EOS are special tokens: they are not included in the prompt as strings, but are added as token ids during the tokenization process. I've implemented it here after a long discussion.
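
To illustrate, here is a minimal sketch (not the implementation referenced above) of adding BOS and EOS as token ids rather than as literal strings, assuming the Hugging Face tokenizer for Llama 2 chat:

from transformers import AutoTokenizer

# Sketch only: assumes access to the gated meta-llama/Llama-2-7b-chat-hf tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

def encode_dialog(turns: list[str]) -> list[int]:
    # turns: "[INST] ... [/INST] answer" strings for completed pairs,
    # plus a final "[INST] ... [/INST]" string with no answer.
    token_ids: list[int] = []
    for i, text in enumerate(turns):
        ids = tokenizer.encode(text, add_special_tokens=False)
        # BOS before every turn; EOS only after completed user/assistant pairs.
        token_ids += [tokenizer.bos_token_id] + ids
        if i < len(turns) - 1:
            token_ids.append(tokenizer.eos_token_id)
    return token_ids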
