Initialization Refactor

#1
by MaxiBoether - opened

Thank you so much for providing the Mistral implementation for Nanotron here! Unfortunately, the latest Nanotron commits refactor how the weights are initialized: the Trainer now calls init_model_randomly with a config object instead of an initialization method. Do you have any plans to update the implementation here to the latest Nanotron commit, or should we not expect that anytime soon?
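
To illustrate what I mean, here is a rough sketch of the interface change as I understand it; the class name and the config attribute paths below are placeholders made up for the example, not the actual identifiers in this repo or in Nanotron:

```python
import math

import torch.nn as nn

# Older Nanotron (roughly): the Trainer passed init callables to the model, e.g.
#   model.init_model_randomly(init_method=init_fn, scaled_init_method=scaled_init_fn)
# Newer Nanotron (roughly): the Trainer hands over the config object and the model
# derives its own init behaviour:
#   model.init_model_randomly(config=config)


class MistralForTrainingSketch(nn.Module):  # placeholder name, not the repo's class
    def init_model_randomly(self, config):
        """Initialize weights from the config instead of externally supplied init methods."""
        std = config.model.init_method.std                         # assumed config layout
        num_layers = config.model.model_config.num_hidden_layers   # assumed config layout
        for name, module in self.named_modules():
            if isinstance(module, nn.Linear):
                # Scaled init for residual-output projections, plain init elsewhere (assumption).
                scale = std / math.sqrt(2 * num_layers) if "o_proj" in name else std
                nn.init.normal_(module.weight, mean=0.0, std=scale)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)
            elif isinstance(module, nn.Embedding):
                nn.init.normal_(module.weight, mean=0.0, std=std)
```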

Thank you so much for the info!
