Phenomenon of saturation not reached?

#6
by DrNicefellow - opened

Studying saturation is one of the purposes of training TinyLlama, yet saturation does not seem to have been reached after 3T tokens. Do you think it would be reasonable to continue training until it saturates? If so, a careful choice of learning rate could be important.
