Dropout

#116 · opened by Muennighoff
BigScience Workshop org

Shouldn't the dropout values in the config be 0.1, since the model was pre-trained with dropout? @TimeRobber @ybelkada
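
For reference, the values in question correspond to the `hidden_dropout` and `attention_dropout` fields of the transformers `BloomConfig`. A quick sketch to inspect what the Hub config currently ships (assuming this discussion refers to the `bigscience/bloom` checkpoint):

```python
from transformers import BloomConfig

# Fetch the checkpoint's config from the Hub and print the dropout
# fields under discussion.
config = BloomConfig.from_pretrained("bigscience/bloom")
print(config.hidden_dropout, config.attention_dropout)
```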

BigScience Workshop org

I don't know about this. I think it depends on what we want those configs to reflect:

1. The training procedure? In that sense, yes, we did use dropout 0.1, so we could update the values.
2. The best training procedure? My strong intuition is that we shouldn't have used dropout; PaLM didn't set it, for example.
3. The best config for finetuning? In this case we've seen that dropout has a substantial impact on downstream tasks: https://arxiv.org/abs/2204.05832
BigScience Workshop org

I think either 1) or 3), so we should change the config, no?
2) could be the default parameters in transformers, but imo not for a model on the Hub when it was trained differently.
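
In any case, a user who wants the training-time behavior can override the config at load time, regardless of what the Hub config says. A rough sketch (the checkpoint name here is only illustrative):

```python
from transformers import BloomConfig, BloomForCausalLM

# Override the dropout values at load time to match the pre-training
# setup (0.1), regardless of what the checkpoint's config.json says.
config = BloomConfig.from_pretrained(
    "bigscience/bloom-560m",
    hidden_dropout=0.1,
    attention_dropout=0.1,
)
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m", config=config)
model.train()  # dropout layers are only active in training mode
```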

BigScience Workshop org

No strong opinion, but I feel this should already have been answered somewhere. cc @patrickvonplaten

BigScience Workshop org
edited Sep 29, 2022

I second what @TimeRobber said; I don't have a strong opinion on this either. But it would be nice to update the config with the parameter used for training, i.e., 0.1, so that the config file reflects the training setup.
