Report for JiaqiLee/imdb-finetuned-bert-base-uncased

#95
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 10 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉 Robustness issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.125 | Add typos | 100/800 tested samples (12.5%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 12.5% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 11 | it takes a strange kind of laziness to waste the talents of robert forster , anne meara , eugene levy , and reginald veljohnson all in the same movie . | it takes a strange kind of laziness to wazte the talwnts of robert forster , anne meara , eugene levy , and rebinald veljohnson all in the same movie .. | negative (p = 1.00) | positive (p = 1.00) |
| 21 | the iditarod lasts for days - this just felt like it did . | the irditarod lasts for days - this just felt ike it did . | negative (p = 0.96) | positive (p = 0.97) |
| 22 | holden caulfield did it better . | holdsn caulfkeld did t better . | positive (p = 0.97) | negative (p = 0.99) |
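
The fail rate reported above (100/800 = 0.125) can be reproduced in spirit with a few lines of Python. The sketch below is not Giskard's implementation: `fail_rate`, `toy_predict`, and `toy_typo` are hypothetical names, the keyword classifier stands in for the real model, and the choice to count only samples that the transformation actually altered is an assumption inferred from the "tested samples" wording.

```python
from typing import Callable, Iterable


def fail_rate(
    predict: Callable[[str], str],
    texts: Iterable[str],
    perturb: Callable[[str], str],
) -> float:
    """Share of perturbed samples whose predicted label changes."""
    pairs = [(t, perturb(t)) for t in texts]
    # Assumption: a sample counts as "tested" only if the perturbation altered it.
    tested = [(t, p) for t, p in pairs if p != t]
    if not tested:
        return 0.0
    changed = sum(predict(t) != predict(p) for t, p in tested)
    return changed / len(tested)


# Hypothetical stand-ins for the real model and the "Add typos" transformation.
def toy_predict(text: str) -> str:
    return "positive" if ("good" in text or "fun" in text) else "negative"


def toy_typo(text: str) -> str:
    return text.replace("good", "godo")


texts = ["good film", "bad film", "good fun", "so good", "good good fun"]
rate = fail_rate(toy_predict, texts, toy_typo)
print(f"fail rate = {rate:.3f}")
```

In this toy run, "bad film" is untouched by the typo function and is therefore excluded from the denominator, which mirrors how the report tests fewer samples than the dataset contains.
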

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.059 | Punctuation Removal | 51/866 tested samples (5.89%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.89% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | it 's slow -- very , very slow . | it s slow very very slow | positive (p = 0.52) | negative (p = 0.77) |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the movie succeeds in instilling a wary sense of there but for the grace of god it is far too self conscious to draw you deeply into its world | negative (p = 1.00) | positive (p = 0.99) |
| 66 | if you 're hard up for raunchy college humor , this is your ticket right here . | if you re hard up for raunchy college humor this is your ticket right here | positive (p = 0.89) | negative (p = 0.57) |
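
To check whether a flipped prediction comes from the model rather than from the transformation, it can help to rebuild the perturbed inputs locally. The function below is a guess at the behaviour, not Giskard's code: `remove_punctuation` is a hypothetical name, and the rule (replace every ASCII punctuation character with a space, then collapse whitespace) was chosen only because it matches the example rows above.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Drop punctuation, then collapse the leftover runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


original = "it 's slow -- very , very slow ."
print(remove_punctuation(original))
```

Feeding both `original` and the returned string to the model should show whether the low-confidence prediction in row 4 (p = 0.52) flips on your side as well.
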
👉 Performance issues (8)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text_length(text) < 89.500 AND text_length(text) >= 80.500 | Precision = 0.719 | | -15.79% than global |

🔍✨ Examples

For records in the dataset where `text_length(text)` < 89.500 AND `text_length(text)` >= 80.500, the Precision is 15.79% lower than the global Precision.

| | text | text_length(text) | label | Predicted label |
|---|---|---|---|---|
| 115 | sam mendes has become valedictorian at the school for soft landings and easy ways out . | 88 | negative | positive (p = 0.95) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 82 | positive | negative (p = 1.00) |
| 286 | at its best , queen is campy fun like the vincent price horror classics of the '60s . | 86 | positive | negative (p = 1.00) |
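
The report gives each slice metric and its deviation but not the global metric itself. Assuming the deviation is relative, that is (slice - global) / global, the global value can be backed out from any row. This is an inference from the wording "lower than the global", not something the report states, and both helper names below are hypothetical. Because the published figures are rounded, the result is only approximate.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative gap between a slice metric and the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric."""
    return slice_metric / (1.0 + deviation)


# (slice metric, reported deviation) pairs taken from the report.
precision_rows = [(0.719, -0.1579), (0.800, -0.0627)]
recall_rows = [(0.844, -0.0681), (0.850, -0.0612)]

for name, rows in (("Precision", precision_rows), ("Recall", recall_rows)):
    estimates = [implied_global(m, d) for m, d in rows]
    print(name, [round(e, 3) for e in estimates])
```

Under this assumption the rows agree with each other to about three decimal places for each metric, which is a useful sanity check that the deviations are indeed relative rather than absolute.
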

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_word_length(text) >= 5.511 | Recall = 0.844 | | -6.81% than global |

🔍✨ Examples

For records in the dataset where `avg_word_length(text)` >= 5.511, the Recall is 6.81% lower than the global Recall.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 1 | unflinchingly bleak and desperate | 7.5 | negative | positive (p = 1.00) |
| 68 | good old-fashioned slash-and-hack is back ! | 6.33333 | positive | negative (p = 0.60) |
| 112 | hilariously inept and ridiculous . | 6 | positive | negative (p = 1.00) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_whitespace(text) < 0.154 | Recall = 0.844 | | -6.81% than global |

🔍✨ Examples

For records in the dataset where `avg_whitespace(text)` < 0.154, the Recall is 6.81% lower than the global Recall.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 1 | unflinchingly bleak and desperate | 0.117647 | negative | positive (p = 1.00) |
| 68 | good old-fashioned slash-and-hack is back ! | 0.136364 | positive | negative (p = 0.60) |
| 112 | hilariously inept and ridiculous . | 0.142857 | positive | negative (p = 1.00) |
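
The slice features used in this report (`text_length`, `avg_word_length`, `avg_whitespace`) are not defined in the post. The definitions below are reverse-engineered from the reported values, so treat them as an assumption rather than Giskard's actual code. They reproduce the numbers above only if each sentence carries a single trailing space, as the sst2 records appear to.

```python
def text_length(text: str) -> int:
    """Number of characters, whitespace included."""
    return len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


# Assumption: the raw record ends with one trailing space.
sample = "unflinchingly bleak and desperate "
print(text_length(sample), avg_word_length(sample), round(avg_whitespace(sample), 6))
```

Because `avg_word_length` and `avg_whitespace` are both driven by the ratio of spaces to characters, they tend to select the same records, which would explain why several slices in this report come in pairs with identical metrics and examples.
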

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_word_length(text) >= 4.354 AND avg_word_length(text) < 4.464 | Precision = 0.800 | | -6.27% than global |

🔍✨ Examples

For records in the dataset where `avg_word_length(text)` >= 4.354 AND `avg_word_length(text)` < 4.464, the Precision is 6.27% lower than the global Precision.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 86 | the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle . | 4.38095 | negative | positive (p = 0.93) |
| 147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 4.42308 | negative | positive (p = 0.97) |
| 448 | something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously . | 4.45455 | positive | negative (p = 0.84) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_whitespace(text) < 0.187 AND avg_whitespace(text) >= 0.183 | Precision = 0.800 | | -6.27% than global |

🔍✨ Examples

For records in the dataset where `avg_whitespace(text)` < 0.187 AND `avg_whitespace(text)` >= 0.183, the Precision is 6.27% lower than the global Precision.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 86 | the film flat lines when it should peak and is more missed opportunity and trifle than dark , decadent truffle . | 0.185841 | negative | positive (p = 0.93) |
| 147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 0.184397 | negative | positive (p = 0.97) |
| 448 | something akin to a japanese alice through the looking glass , except that it seems to take itself far more seriously . | 0.183333 | positive | negative (p = 0.84) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | text_length(text) < 59.500 AND text_length(text) >= 50.500 | Precision = 0.800 | | -6.27% than global |

🔍✨ Examples

For records in the dataset where `text_length(text)` < 59.500 AND `text_length(text)` >= 50.500, the Precision is 6.27% lower than the global Precision.

| | text | text_length(text) | label | Predicted label |
|---|---|---|---|---|
| 139 | it 's not the ultimate depression-era gangster movie . | 55 | negative | positive (p = 0.98) |
| 183 | the lower your expectations , the more you 'll enjoy it . | 58 | negative | positive (p = 0.99) |
| 205 | falls neatly into the category of good stupid fun . | 52 | positive | negative (p = 0.92) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_word_length(text) >= 4.123 AND avg_word_length(text) < 4.209 | Recall = 0.850 | | -6.12% than global |

🔍✨ Examples

For records in the dataset where `avg_word_length(text)` >= 4.123 AND `avg_word_length(text)` < 4.209, the Recall is 6.12% lower than the global Recall.

| | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 113 | this movie is maddening . | 4.2 | negative | positive (p = 1.00) |
| 121 | it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so | 4.125 | negative | positive (p = 0.98) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 4.125 | positive | negative (p = 1.00) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | avg_whitespace(text) < 0.195 AND avg_whitespace(text) >= 0.192 | Recall = 0.850 | | -6.12% than global |

🔍✨ Examples

For records in the dataset where `avg_whitespace(text)` < 0.195 AND `avg_whitespace(text)` >= 0.192, the Recall is 6.12% lower than the global Recall.

| | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 113 | this movie is maddening . | 0.192308 | negative | positive (p = 1.00) |
| 121 | it seems to me the film is about the art of ripping people off without ever letting them consciously know you have done so | 0.195122 | negative | positive (p = 0.98) |
| 142 | what better message than ` love thyself ' could young women of any size receive ? | 0.195122 | positive | negative (p = 1.00) |

Check out the Giskard Space and test your model.

Disclaimer: Automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess their impact accordingly.
