Does two-stage training use the same hyperparameters?
In the model card there is this description:
First, we apply the foundational dataset Infinity-Instruct-3M to improve the foundational abilities (math & code) of Qwen2-7B, and get the foundational instruct model Infinity-Instruct-3M-Qwen2-7B. Then we finetune Infinity-Instruct-3M-Qwen2-7B to get the stronger chat model Infinity-Instruct-3M-0625-Qwen2-7B. Here are the training hyperparameters.
Question: there are two stages but only one set of training hyperparameters is listed. Do both SFT stages use the same hyperparameters?
Yes, you can use the same set of hyperparameters for both training stages.
Hello, which template do you use to fine-tune from the pretrained model to the foundational instruct model? I assume you used the chat template to finetune the final chat model, but what about the intermediate stage? Do you also use the chat template with a system prompt there?
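For context on what "chat template with system prompt" means here, below is a minimal sketch of how a single training sample is rendered under a ChatML-style template (the template family Qwen2 uses). The system prompt text and the example messages are assumptions for illustration, not confirmed details of the Infinity-Instruct training runs.

```python
def format_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML-style string."""
    parts = []
    for m in messages:
        # Each turn is wrapped in <|im_start|>role ... <|im_end|> markers.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    return "".join(parts)

# Hypothetical sample; the system prompt is assumed, not taken from the repo.
sample = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 = 4."},
])
print(sample)
```

If the intermediate foundational stage uses the same template, the only open question is whether a system turn is prepended there as well, which is what the question above asks.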