Zero-shot classification pipeline and manual PyTorch versions give different results

#3 · opened by james92

Hi, thanks for the model. Correct me if I am wrong, please. I took both versions, i.e. the code under the zero-shot classification pipeline and the code under the manual PyTorch section, and ran them against the labels ['Positive', 'Neutral', 'Negative'] for the sequence "one day I will see the world". Below are the results.

Results (from zero-shot classification pipeline)
{'sequence': 'one day I will see the world', 'labels': ['Positive', 'Negative', 'Neutral'], 'scores': [0.48784172534942627, 0.26007547974586487, 0.25208279490470886]}

Results (from manual PyTorch version, for the label 'Positive')
tensor([0.2946], grad_fn=<SelectBackward0>)

If you compare the two results for the label 'Positive', there is a huge variation. I ran the exact code given on the model page in order to test it. Am I doing anything wrong? Please help me. Thank you.

Extra Information
The output of the manual PyTorch method after applying softmax to all three logits:
tensor([[0.0874, 0.8761, 0.0365]], grad_fn=<SoftmaxBackward0>)

Same here, @james92, were you able to solve it?

Sorry, no, I couldn't. How about you?

Would you be able to share your code, please, @james92? It seems hard to debug without it. Thanks! :)

Hi, I think I found something in the model's config. Just print model.config and you will see the following:
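For example (a minimal sketch, assuming the standard Auto classes):

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
print("model config:", model.config)
```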

model config: BartConfig {
  "_name_or_path": "facebook/bart-large-mnli",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_final_layer_norm": false,
  "architectures": [
    "BartForSequenceClassification"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "contradiction",
    "1": "neutral",
    "2": "entailment"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "contradiction": 0,
    "entailment": 2,
    "neutral": 1
  },
  "max_position_embeddings": 1024,
  "model_type": "bart",
  "normalize_before": false,
  "num_hidden_layers": 12,
  "output_past": false,
  "pad_token_id": 1,
  "scale_embedding": false,
  "transformers_version": "4.27.3",
  "use_cache": true,
  "vocab_size": 50265
}

As you can see, the default id2label mapping is:

"0": "contradiction",
"1": "neutral",
"2": "entailment"

so a softmax over all three logits gives MNLI-class probabilities, not probabilities for your candidate labels; the highest probability in your tensor (0.8761, at index 1) is actually "neutral".
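For reference, here is roughly what the model card's manual recipe does (a sketch, assuming the standard transformers API; variable names are mine):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "facebook/bart-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "one day I will see the world"
label = "Positive"
hypothesis = f"This example is {label}."  # the model card's hypothesis template

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = nli_model(**inputs).logits  # shape (1, 3): [contradiction, neutral, entailment]

# A softmax over all three logits yields MNLI-class probabilities
# (like the tensor you posted above), NOT the probability of the label:
print(logits.softmax(dim=-1))

# The model card instead throws away "neutral" (index 1) and softmaxes
# over [contradiction, entailment]; the entailment probability is then
# read as the probability that the label is true:
entail_contradiction_logits = logits[:, [0, 2]]
probs = entail_contradiction_logits.softmax(dim=-1)
print(probs[:, 1])  # P(label is true)
```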

Try changing the hypothesis to 'This example is positive.'
You may find that the probability of entailment becomes the highest one.
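One more detail that may account for the remaining gap: 0.0365 / (0.0874 + 0.0365) ≈ 0.2946, which is exactly the manual number you posted, so your drop-"neutral" computation itself looks right. The pipeline reports different scores because, with multi_label=False (its default when you pass several candidate labels), it softmaxes the entailment logits across all candidate labels. A sketch of that normalization (reusing premise, tokenizer, and nli_model from the snippet above):

```python
# One entailment logit per candidate label, then a softmax across labels --
# this is roughly how the pipeline turns per-label NLI scores into the
# numbers it reports when multi_label=False:
candidate_labels = ["Positive", "Neutral", "Negative"]
entail_logits = []
for label in candidate_labels:
    hypothesis = f"This example is {label}."  # the pipeline's default template
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = nli_model(**inputs).logits
    entail_logits.append(logits[0, 2])  # index 2 = "entailment"

scores = torch.stack(entail_logits).softmax(dim=-1)
print(dict(zip(candidate_labels, scores.tolist())))
```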
