Do we need BOS token before each turn of chat during finetuning?

#9
by Annorita - opened

Many thanks for the models. I'm confused about how to prepare multi-turn chat data for finetuning.

In your README, there is an example of chat model usage:

You are an AI programming assistant, utilizing the DeepSeek Coder model, developed by DeepSeek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
['content']
### Response:
['content']
<|EOT|>
### Instruction:
['content']
### Response:

ref: https://github.com/deepseek-ai/deepseek-coder#3-chat-model-inference
Here there is no BOS token at the front of the second turn.
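
One way to double-check this is to render a conversation with the tokenizer's bundled chat template; a minimal sketch, assuming a recent transformers version and that the instruct tokenizer ships a chat template (the exact output may differ between releases):

```python
from transformers import AutoTokenizer

# Render a multi-turn conversation with whatever chat template the tokenizer
# bundles, so the placement of BOS and <|EOT|> is visible in the raw string.
# (Assumption: the instruct tokenizer ships a chat template; if not, older
# transformers versions fall back to a default template and emit a warning.)
tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-6.7b-instruct")

messages = [
    {"role": "user", "content": "Write a quicksort in Python."},
    {"role": "assistant", "content": "def quicksort(xs): ..."},
    {"role": "user", "content": "Now make it iterative."},
]

# tokenize=False returns the prompt as plain text instead of token ids.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```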

However, in Llama's template, a BOS token is added to the front of each user turn:

            "{% if message['role'] == 'user' %}"  
            "{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}"

ref: https://github.com/huggingface/transformers/blob/e6dcf8abd6f65bb4b6dfc1831b20d9ba49ce00e2/src/transformers/models/llama/tokenization_llama.py#L460C42-L460C42
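
To make the difference concrete, here is a standalone sketch that renders a simplified version of that template fragment with jinja2, using placeholder <s>/</s> special tokens rather than the real Llama tokenizer; it shows the BOS token repeated before every user turn:

```python
from jinja2 import Template

# Llama-2 style per-turn fragment: BOS before every user turn, EOS after every
# assistant turn. Simplified from the transformers default template linked above.
llama_turns = Template(
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{{ bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ ' ' + message['content'].strip() + ' ' + eos_token }}"
    "{% endif %}"
    "{% endfor %}"
)

print(llama_turns.render(
    bos_token="<s>",
    eos_token="</s>",
    messages=[
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
        {"role": "user", "content": "how are you?"},
    ],
))
# -> <s>[INST] hi [/INST] hello </s><s>[INST] how are you? [/INST]
```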

If I want to finetune the deepseek-ai/deepseek-coder-6.7b-instruct model, do I need to insert a BOS token before every turn of the dialog, just like Llama does?
Or can I just follow the inference example in your README (no BOS before every turn)?

I've checked the finetuning script in the repo, but it only provides a single-turn example.
ref: https://github.com/deepseek-ai/DeepSeek-Coder/blob/791c8e2c2c5f89032041010efa60776eb4306d58/finetune/finetune_deepseekcoder.py#L16

DeepSeek org

Just follow the inference example in the README.
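
In other words, a multi-turn training example can be assembled the same way as the inference prompt. A minimal sketch (hypothetical helper, not the official finetuning script; check the exact whitespace against the README):

```python
# System prompt taken verbatim from the README example quoted above.
SYSTEM = (
    "You are an AI programming assistant, utilizing the DeepSeek Coder model, "
    "developed by DeepSeek Company, and you only answer questions related to "
    "computer science. For politically sensitive questions, security and privacy "
    "issues, and other non-computer science questions, you will refuse to answer.\n"
)

def build_multi_turn_example(turns):
    """turns: list of (instruction, response) pairs. Hypothetical helper."""
    text = SYSTEM
    for instruction, response in turns:
        # One block per turn; <|EOT|> closes each response, and no BOS is
        # inserted between turns (the tokenizer normally prepends a single BOS
        # when the whole sequence is encoded).
        text += f"### Instruction:\n{instruction}\n### Response:\n{response}\n<|EOT|>\n"
    return text

print(build_multi_turn_example([
    ("Write hello world in Python.", "print('hello world')"),
    ("Now do it in C.", '#include <stdio.h>\nint main() { puts("hello world"); }'),
]))
```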

I see. Thanks for the feedback.

Annorita changed discussion status to closed
