
The Stickiness Problem

by deleted - opened


@reeducator Just linking this thread from Henk about overtraining. It's some extra reading on model stickiness, assuming you haven't already been messing with the dials and knobs around this.

Thanks @gozfarb. Our hyperparams are still the defaults from the vicuna branch. Learning rate is 2e-5 with cosine scheduling, i.e. the LR drops orders of magnitude lower towards the end of training. We might keep it as is for now, unless someone points out that the model we have here suffers from similar issues. For Vicuna I can't lower it much, since training already takes quite some time, but if necessary, there's still room for longer training runs.
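For anyone who wants to see what that decay looks like, here is a rough sketch of a plain cosine schedule starting at 2e-5. It assumes no warmup and a decay all the way to zero, which may not match the actual training script; `total_steps` and the function name are made up for illustration.

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Plain cosine decay from base_lr down to min_lr.

    Assumptions (not taken from the training config): no warmup phase,
    and min_lr defaults to zero.
    """
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Cosine factor goes from 1 at the start to 0 at the end.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * factor

if __name__ == "__main__":
    total_steps = 1000  # hypothetical number of optimizer steps
    for step in (0, 500, 900, 990, 1000):
        print(f"step {step:4d}: lr = {cosine_lr(step, total_steps):.3e}")
```

With these assumptions the LR is still at half its starting value midway through and only falls by orders of magnitude in roughly the last tenth of the run.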

deleted changed discussion title from The Stickiness Promblem to The Stickiness Problem
