Curious about Fine-tuning Methods

#4
opened by ElliottDyson

Hello there, just a simple curiosity here: what were the factors that led you to choose (or end up with) training for 2 epochs over your data?

DreamGen org

Regarding the number of epochs specifically, it's mostly based on my other trains of other models using this same dataset. The eval loss also flatlined (though it might go lower for at least 1 more epoch), and the base model is quite amazing, so I did not want to overwrite all of its other capabilities, so to speak (even though my dataset is diverse and has data beyond just writing).

So TL;DR: mostly vibes -- I can't afford to test things properly atm (do a sweep of different params and compare them with end-to-end, side-by-side comparisons).

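For anyone who wants to automate the "stop when the eval loss flatlines" decision rather than fixing an epoch count up front, here is a minimal sketch using the Hugging Face `Trainer` with early stopping on eval loss. This is not the actual DreamGen training setup: the model name, dataset, column names, and hyperparameters below are placeholders.

```python
# Minimal sketch: let early stopping end training when eval loss stops
# improving, instead of hand-picking the epoch count.
# NOTE: "your-base-model" and "your-dataset" are placeholders, not the
# actual model/dataset discussed in this thread.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

model_name = "your-base-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding in the collator

# Assumes a dataset with a "text" column and train/validation splits.
dataset = load_dataset("your-dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset["train"].column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=4,               # upper bound; early stopping may end sooner
    eval_strategy="steps",            # "evaluation_strategy" on older transformers versions
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
    # Stop once eval loss hasn't improved for 3 consecutive evals,
    # i.e. roughly when the curve flatlines.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```

Of course, eval loss is only a proxy; as noted above, the real test is still an end-to-end, side-by-side comparison of the resulting checkpoints.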


Definitely agree on the quality of the base model; the thought had occurred to me too that too much fine-tuning might lose some of its best aspects, such as instruction following. Looking forward to seeing potential improvements in the future, but I totally understand the cost of training. Thanks for getting it out so quickly!
