Question about model.

#1
by DopeorNope - opened

Hello, thank you for the great model.

However, reading the model description, it looks like the model was fully fine-tuned rather than pre-trained, yet it seems to have been uploaded as a pre-trained model, so I'm leaving this question.

Thank you.

Yanolja org

Answering the question:

Thank you for reaching out and for your kind words about the model. It's important to clarify the difference between a pre-trained model and a fine-tuned model. A pre-trained model is trained on a large, general dataset to learn a wide range of features that can be useful across many tasks. This model has indeed been pre-trained in such a manner. Fine-tuning, on the other hand, is when you take a pre-trained model and continue the training on a more specific dataset or task to adapt the model to particular requirements.

For this particular model, the core layers, except for the embed_tokens and lm_head, remain as they were during the initial pre-training, meaning most of the model's parameters are from the original pre-training without further fine-tuning. This approach ensures that the foundational knowledge from the pre-training is retained.
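To give a rough picture of what this means in code, the sketch below freezes everything except embed_tokens and lm_head. It is a simplified illustration rather than our actual training script, and the base model name is only assumed here:

```python
# Simplified sketch (not the actual training script): keep the transformer
# blocks frozen and train only the token embeddings and the LM head.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("upstage/SOLAR-10.7B-v1.0")  # assumed base model

for name, param in model.named_parameters():
    # Only embed_tokens (input embeddings) and lm_head (output projection)
    # receive gradient updates; every other parameter stays frozen.
    param.requires_grad = ("embed_tokens" in name) or ("lm_head" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,}")
```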

Regarding the new tokens, they were introduced and trained during the pre-training phase, not fine-tuned from existing parameters. The embeddings for these new tokens were learned from scratch, enriching the model's vocabulary and capabilities. As such, the resulting model is essentially a pre-trained model with expanded linguistic understanding, accommodating both the original and newly introduced tokens.
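The vocabulary expansion itself follows the usual Hugging Face pattern of adding tokens and resizing the embedding matrix; the new rows start from fresh values and are then learned during continued pre-training. The snippet below is a simplified sketch with placeholder tokens, not the actual added vocabulary:

```python
# Simplified sketch of vocabulary expansion; the token list and the
# initialization of the new rows are placeholders, not the real setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "upstage/SOLAR-10.7B-v1.0"  # assumed base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["예시", "토큰"]  # placeholder Korean tokens
num_added = tokenizer.add_tokens(new_tokens)

# Grow embed_tokens and lm_head so they cover the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```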

I hope this clears up any confusion regarding the model's training status. The model is primarily pre-trained with specific enhancements to include the new tokens. If you have further questions or need more information, please feel free to ask. Your feedback is invaluable in helping us improve our communication and model descriptions. Thank you!

@seungduk I've definitely understood it now.

So, as I understand it, it sounds like the training was done using the llama2 full fine-tuning config YAML of the Axolotl library. Would that be correct?

Additionally, I'm curious about what learning rate and number of epochs you used for training.

Thank you!

Yanolja org

Yes, you're correct. Since the SOLAR model is built upon the LLaMA architecture, we used the same configuration. Indeed, we employed Axolotl for the training process, albeit with specific modifications tailored to our needs, including freezing the embeddings for the existing tokens.
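In rough terms, freezing only the embedding rows that belong to the original vocabulary can be done with a gradient hook, as in the simplified sketch below. The model name and original vocabulary size are assumed for illustration; this is not our exact modification:

```python
import torch
from transformers import AutoModelForCausalLM

ORIGINAL_VOCAB_SIZE = 32000  # assumed size of the vocabulary before the new tokens

# Assumed model name for illustration; the vocabulary is presumed already resized.
model = AutoModelForCausalLM.from_pretrained("yanolja/KoSOLAR-10.7B-v0.1")

def freeze_existing_rows(grad: torch.Tensor) -> torch.Tensor:
    # Zero the gradient for rows of the original vocabulary so that only the
    # newly added token embeddings receive updates during training.
    grad = grad.clone()
    grad[:ORIGINAL_VOCAB_SIZE] = 0
    return grad

model.get_input_embeddings().weight.register_hook(freeze_existing_rows)
model.get_output_embeddings().weight.register_hook(freeze_existing_rows)
```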

Interestingly, while unfreezing the embeddings for the existing tokens did improve the evaluation scores, our practical tests indicated that keeping them frozen, as we did for KoSOLAR, yielded better real-world performance. This discrepancy has led me to be cautious about relying solely on evaluation scores.

For detailed information regarding the learning rate and the number of training steps, I would recommend referring to the model card, where these specifics are documented. Thank you.

@seungduk Ha ha, I see, so by unfreezing the embedding layer only for the new tokens, you expanded the vocabulary and had the model learn them very well. Thank you for providing a good approach and insights.
Thank you for contributing to the open-source community with your clear explanation.

I hope this bears fruit for Korean open-source LLMs.

Best Regards
