How was this model trained?

#1
by BramVanroy - opened

I'd love to play around with a smaller version of mbart locally for debugging, so this tiny mbart sounds promising! Can you give more details about how this was trained/distilled? Data used, hyperparameters, etc.

Thanks!

At least judging by the demo, it doesn't seem promising.

I think I read somewhere that the model was just randomly initialized and not trained at all, but I don't remember whether I read that in real life or in a dream.
