How to finetune this model on the RTE, MRPC and SST datasets in the GLUE benchmark?

#9 · opened by zhai1010

I'm trying to reproduce the experimental results in the RoBERTa paper. Since the authors say "For RTE, STS and MRPC we finetune starting from the MNLI model instead of the baseline pretrained model", I tried to load this pre-trained model and finetune it on these 3 datasets. However, the number of labels differs across the tasks: this roberta-large-mnli model has a classification head with 3 outputs, whereas for RTE the output size should be 2.
So I passed ignore_mismatched_sizes=True to AutoModelForSequenceClassification.from_pretrained() and re-initialized the classifier head with model.classifier.out_proj = nn.Linear(model.classifier.out_proj.in_features, model_config.num_labels) (the new nn.Linear already comes with a freshly initialized bias, so a separate out_bias parameter is not needed), and then finetuned on the 3 GLUE tasks.
Am I doing this right? Has anyone done a similar task? Please leave your comments here.
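
For concreteness, here is a minimal sketch of the approach I described, assuming a recent transformers version (num_labels=2 is what I use for RTE and MRPC; STS-B would instead be regression with num_labels=1):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"
num_labels = 2  # RTE / MRPC are binary; STS-B is regression with num_labels=1

tokenizer = AutoTokenizer.from_pretrained(model_name)

# ignore_mismatched_sizes=True keeps every pretrained weight except the 3-way MNLI
# classification head, which is re-initialized with `num_labels` outputs instead.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    ignore_mismatched_sizes=True,
)

# Note: replacing model.classifier.out_proj by hand (as in the question) should be
# redundant here, since from_pretrained has already re-initialized the mismatched head.
```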

Facebook AI community org

This seems reasonable! You could also load the checkpoint into the base AutoModel, save it again with save_pretrained (which drops the classification head), and then reload it with AutoModelForSequenceClassification.
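
A rough sketch of this alternative (the local directory name below is just an example):

```python
from transformers import AutoModel, AutoModelForSequenceClassification

# 1) Load only the encoder weights; the MNLI classification head is discarded.
encoder = AutoModel.from_pretrained("roberta-large-mnli")

# 2) Save it locally so the new checkpoint no longer contains any classifier weights.
encoder.save_pretrained("./roberta-large-mnli-encoder")

# 3) Reload for sequence classification: a fresh head with num_labels outputs is
#    randomly initialized, since the saved checkpoint has none.
model = AutoModelForSequenceClassification.from_pretrained(
    "./roberta-large-mnli-encoder",
    num_labels=2,
)
```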
