Very interesting. Did you train only the last 3 layers and the lm_head?
Thank you for your interest! We train only the last four layers and the lm_head. All other layers are frozen.
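For anyone who wants to reproduce this kind of setup, here is a minimal sketch of freezing everything except the last four layers and the lm_head. It is not the authors' actual training code. It assumes Llama-style parameter names (`model.layers.<i>.…` and `lm_head.…`); the helpers `is_trainable` and `apply_freeze` are hypothetical names made up for this illustration.

```python
import re

# Assumption: decoder blocks are named "model.layers.<i>.…" and the output
# head is "lm_head.…" (Llama-style naming). Adjust for other architectures.
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def is_trainable(name: str, num_layers: int, last_n: int = 4) -> bool:
    """Return True for parameters in the last `last_n` blocks or the lm_head."""
    if name.startswith("lm_head."):
        return True
    match = LAYER_RE.match(name)
    return match is not None and int(match.group(1)) >= num_layers - last_n


def apply_freeze(named_params, num_layers: int, last_n: int = 4) -> list[str]:
    """Set requires_grad on each (name, param) pair; return the trainable names."""
    trainable = []
    for name, param in named_params:
        param.requires_grad = is_trainable(name, num_layers, last_n)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

With a PyTorch model this could be called as `apply_freeze(model.named_parameters(), num_layers=N)`, where `N` is the number of decoder blocks. If the model ties the lm_head weight to the input embeddings, the shared weight may be listed under the embedding's name instead, so it is worth printing the returned list to check what is actually trainable.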