id2label and label2id are incompatible with multi_nli dataset

#3
by kslnet - opened

Hi, the id2label and label2id in config.json are:

"id2label": {
  "0": "CONTRADICTION",
  "1": "NEUTRAL",
  "2": "ENTAILMENT"
},
"label2id": {
  "CONTRADICTION": 0,
  "NEUTRAL": 1,
  "ENTAILMENT": 2
},

However, according to the multi_nli dataset (https://huggingface.co/datasets/multi_nli), 0 should be mapped to "ENTAILMENT" and 2 to "CONTRADICTION":

"label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2)."

Therefore, I believe id2label and label2id need to be corrected. Is that right?

No, because the model was trained with the labels in this order.

But don't the labels need to correspond to the way the data was annotated?

Are you saying that you did not use the multi_nli data as-is, but reversed the labels before using it to fine-tune roberta-large? Sorry if I am misunderstanding something basic.

I'm saying the model was trained to predict 0 for contradiction, 1 for neutral and 2 for entailment.
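For anyone comparing this model's predictions against multi_nli labels, the two orderings quoted in this thread can be reconciled by mapping through the label names rather than the raw integers. The following is only a minimal sketch: the two dictionaries are copied from the config.json excerpt and the dataset card quote above, and the helper names (`dataset_id_to_model_id`, `model_id_to_dataset_id`) are hypothetical, not part of any library.

```python
# Label order from this model's config.json, as quoted in the thread.
MODEL_ID2LABEL = {0: "CONTRADICTION", 1: "NEUTRAL", 2: "ENTAILMENT"}

# Label order from the multi_nli dataset card, as quoted in the thread.
DATASET_ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Reverse lookups keyed by upper-cased label name, so both sides share one key space.
MODEL_LABEL2ID = {name.upper(): idx for idx, name in MODEL_ID2LABEL.items()}
DATASET_LABEL2ID = {name.upper(): idx for idx, name in DATASET_ID2LABEL.items()}


def dataset_id_to_model_id(dataset_id: int) -> int:
    """Hypothetical helper: convert a multi_nli label id to this model's id."""
    name = DATASET_ID2LABEL[dataset_id].upper()
    return MODEL_LABEL2ID[name]


def model_id_to_dataset_id(model_id: int) -> int:
    """Hypothetical helper: convert a model prediction id to a multi_nli label id."""
    name = MODEL_ID2LABEL[model_id].upper()
    return DATASET_LABEL2ID[name]


if __name__ == "__main__":
    for dataset_id, name in DATASET_ID2LABEL.items():
        print(name, "dataset id", dataset_id, "-> model id", dataset_id_to_model_id(dataset_id))
```

Going through the names keeps the comparison correct even though the integer ids differ, and it leaves config.json consistent with how the model was trained.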
