Does your fine-tuning process overfit?

#15
by jiaxiangc - opened

Thanks for your contribution.
After fine-tuning LLaMA-13B on OpenOrca or SlimOrca, I want to ask two questions.

  1. What is your training configuration? For example, the number of GPUs, learning rate, fine-tuning strategy, and number of epochs.
  2. Does your fine-tuning process overfit? When I start the second epoch, the training loss drops significantly. Is this normal? Do you have any suggestions to avoid this problem?

@jiaxiangc

For the compute config, it is 8x A6000 GPUs, rented from runpod.io. To prevent overfitting we use packing, which also speeds up training considerably. The trainer we use is called axolotl, and you can find it here: https://github.com/OpenAccess-AI-Collective/axolotl. For the learning rate and all other config options, each model repo has a YAML file in its configs folder that details all the options axolotl uses.
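For anyone who hasn't used axolotl before, here is a minimal sketch of what such a YAML might look like. The base model, dataset, learning rate, and epoch count below are placeholder values I'm assuming for illustration, not the settings actually used for these models; the real values are in the configs folder of each model repo.

```yaml
# Illustrative axolotl config sketch -- placeholder values, not the actual
# settings used for the OpenOrca models (see each repo's configs folder).
base_model: meta-llama/Llama-2-13b-hf   # assumed base model, for illustration only

datasets:
  - path: Open-Orca/SlimOrca
    type: sharegpt

sequence_len: 4096
sample_packing: true        # packing: concatenates short examples into full-length
                            # sequences, which reduces overfitting and speeds up training
pad_to_sequence_len: true

num_epochs: 3               # placeholder
learning_rate: 0.00002      # placeholder
lr_scheduler: cosine
optimizer: adamw_torch

micro_batch_size: 2
gradient_accumulation_steps: 4

output_dir: ./out
```

A config like this is typically launched with `accelerate launch -m axolotl.cli.train your_config.yml`.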

Hope that helps!

Thanks for storing the axolotl config! I suggest you add this to the model card so that people know where to find it :] just my 2c
