The Stickiness Problem

by deleted - opened May 10, 2023

deleted

May 10, 2023

https://huggingface.co/ehartford/WizardLM-7B-Uncensored/discussions/10

@reeducator Just linking this thread from Henk about overtraining. It's some extra reading on model stickiness, assuming you haven't already been messing with the dials and knobs around these things.

reeducator

Owner May 10, 2023

Thanks @gozfarb. Our hyperparameters are still the defaults from the Vicuna branch. The learning rate is 2e-5 with cosine scheduling, i.e. the LR drops by orders of magnitude towards the end of training. We might keep it as it is for now, unless someone points out that the model we have here suffers from similar issues. For Vicuna I can't lower it much, since the training already takes quite some time, but if necessary, there's still room for longer training runs.
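For anyone who wants to see what that decay looks like, here is a rough sketch of a plain cosine schedule with a 2e-5 peak. The function name `cosine_lr`, the 1000-step run length, and the floor of 0 are assumptions for illustration only, and any warmup phase is left out; this is not the actual training script.

```python
import math


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = 2e-5, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate at a given step (no warmup, assumed)."""
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Half a cosine period: starts at peak_lr, ends at min_lr.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical run length
    for step in (0, 500, 900, 990, 1000):
        print(f"step {step:4d}: lr = {cosine_lr(step, total):.3e}")
```

With these assumed settings the LR is at half its peak by the midpoint and more than two orders of magnitude below the peak over the last 1% of steps.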

deleted changed discussion title from The Stickiness Promblem to The Stickiness Problem May 13, 2023
