Report for siebert/sentiment-roberta-large-english

#24
by giskard-bot - opened
Giskard org

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 12 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Performance issues (9)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | text_length(text) < 63.500 AND text_length(text) >= 53.500 | Precision = 0.714 | | -22.36% than global |

🔍✨Examples

For records in the dataset where `text_length(text)` < 63.500 AND `text_length(text)` >= 53.500, the Precision is 22.36% lower than the global Precision.

| | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 21 | the iditarod lasts for days - this just felt like it did . | 59 | NEGATIVE | POSITIVE (p = 1.00) |
| 58 | manages to be both repulsively sadistic and mundane . | 54 | NEGATIVE | POSITIVE (p = 0.98) |
| 92 | you wo n't like roger , but you will quickly recognize him . | 61 | NEGATIVE | POSITIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | avg_whitespace(text) < 0.178 AND avg_whitespace(text) >= 0.175 | Recall = 0.769 | | -17.50% than global |

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.178 AND `avg_whitespace(text)` >= 0.175, the Recall is 17.5% lower than the global Recall.

| | text | avg_whitespace(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 87 | jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters | 0.177083 | POSITIVE | NEGATIVE (p = 1.00) |
| 282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 0.174699 | POSITIVE | NEGATIVE (p = 1.00) |
| 546 | on the heels of the ring comes a similarly morose and humorless horror movie that , although flawed , is to be commended for its straight-ahead approach to creepiness . | 0.177515 | POSITIVE | NEGATIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | avg_word_length(text) >= 4.632 AND avg_word_length(text) < 4.726 | Recall = 0.769 | | -17.50% than global |

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.632 AND `avg_word_length(text)` < 4.726, the Recall is 17.5% lower than the global Recall.

| | text | avg_word_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 87 | jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters | 4.64706 | POSITIVE | NEGATIVE (p = 1.00) |
| 282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 4.72414 | POSITIVE | NEGATIVE (p = 1.00) |
| 546 | on the heels of the ring comes a similarly morose and humorless horror movie that , although flawed , is to be commended for its straight-ahead approach to creepiness . | 4.63333 | POSITIVE | NEGATIVE (p = 1.00) |

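The report does not define the slicing features `text_length`, `avg_whitespace`, and `avg_word_length`. The sketch below is a guess at plausible definitions, not Giskard's confirmed implementation: character count, share of whitespace characters, and mean length of whitespace-separated tokens. It also assumes the raw sst2 sentence ends with a trailing space, which is an inference from the numbers rather than something the report states. Under those assumptions, the values for record 87 come out the same as in the tables above.

```python
def text_length(text: str) -> int:
    """Number of characters in the text (assumed definition)."""
    return len(text)


def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)


# Record 87 from the tables above. The trailing space is an assumption
# about the raw dataset text; it is not visible in the report itself.
sample = (
    "jaglom ... put ( s ) the audience in the privileged position "
    "of eavesdropping on his characters"
) + " "

print(text_length(sample))
print(round(avg_whitespace(sample), 6))
print(round(avg_word_length(sample), 5))
```

If the definitions differ in the scanner itself, the slice boundaries reported above would of course select different records.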

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | major 🔴 | text_length(text) >= 163.500 AND text_length(text) < 179.500 | Recall = 0.812 | | -12.86% than global |

🔍✨Examples

For records in the dataset where `text_length(text)` >= 163.500 AND `text_length(text)` < 179.500, the Recall is 12.86% lower than the global Recall.

| | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 166 | characters still need to function according to some set of believable and comprehensible impulses , no matter how many drugs they do or how much artistic license avary employs . | 178 | NEGATIVE | POSITIVE (p = 0.99) |
| 266 | a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors . | 179 | POSITIVE | NEGATIVE (p = 0.95) |
| 282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 166 | POSITIVE | NEGATIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | text_length(text) < 93.500 AND text_length(text) >= 86.500 | Precision = 0.857 | | -6.83% than global |

🔍✨Examples

For records in the dataset where `text_length(text)` < 93.500 AND `text_length(text)` >= 86.500, the Precision is 6.83% lower than the global Precision.

| | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 102 | does paint some memorable images ... , but makhmalbaf keeps her distance from the characters | 93 | POSITIVE | NEGATIVE (p = 1.00) |
| 115 | sam mendes has become valedictorian at the school for soft landings and easy ways out . | 88 | NEGATIVE | POSITIVE (p = 1.00) |
| 519 | moretti 's compelling anatomy of grief and the difficult process of adapting to loss . | 87 | NEGATIVE | POSITIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | text_length(text) >= 140.500 AND text_length(text) < 154.500 | Precision = 0.862 | | -6.30% than global |

🔍✨Examples

For records in the dataset where `text_length(text)` >= 140.500 AND `text_length(text)` < 154.500, the Precision is 6.3% lower than the global Precision.

| | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 146 | NEGATIVE | POSITIVE (p = 1.00) |
| 147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 141 | NEGATIVE | POSITIVE (p = 0.98) |
| 494 | it showcases carvey 's talent for voices , but not nearly enough and not without taxing every drop of one 's patience to get to the good stuff . | 145 | NEGATIVE | POSITIVE (p = 0.98) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | text_length(text) < 53.500 AND text_length(text) >= 46.500 | Recall = 0.875 | | -6.16% than global |

🔍✨Examples

For records in the dataset where `text_length(text)` < 53.500 AND `text_length(text)` >= 46.500, the Recall is 6.16% lower than the global Recall.

| | text | text_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 295 | jones ... does offer a brutal form of charisma . | 49 | POSITIVE | NEGATIVE (p = 0.99) |
| 436 | trite , banal , cliched , mostly inoffensive . | 47 | NEGATIVE | POSITIVE (p = 0.99) |
| 602 | instead , he shows them the respect they are due . | 51 | POSITIVE | NEGATIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | avg_whitespace(text) < 0.182 AND avg_whitespace(text) >= 0.178 | Precision = 0.871 | | -5.33% than global |

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.182 AND `avg_whitespace(text)` >= 0.178, the Precision is 5.33% lower than the global Precision.

| | text | avg_whitespace(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 0.178082 | NEGATIVE | POSITIVE (p = 1.00) |
| 218 | all that 's missing is the spontaneity , originality and delight . | 0.179104 | NEGATIVE | POSITIVE (p = 0.95) |
| 300 | fun , flip and terribly hip bit of cinematic entertainment . | 0.180328 | POSITIVE | NEGATIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Performance | medium 🟡 | avg_word_length(text) >= 4.509 AND avg_word_length(text) < 4.632 | Precision = 0.871 | | -5.33% than global |

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.509 AND `avg_word_length(text)` < 4.632, the Precision is 5.33% lower than the global Precision.

| | text | avg_word_length(text) | label | Predicted label |
| --- | --- | --- | --- | --- |
| 95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 4.61538 | NEGATIVE | POSITIVE (p = 1.00) |
| 218 | all that 's missing is the spontaneity , originality and delight . | 4.58333 | NEGATIVE | POSITIVE (p = 0.95) |
| 300 | fun , flip and terribly hip bit of cinematic entertainment . | 4.54545 | POSITIVE | NEGATIVE (p = 1.00) |

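The report gives each slice metric and its deviation from the global value, but not the global Precision and Recall themselves or the formula behind the deviation. As a rough consistency check, the sketch below assumes the deviation is relative, i.e. `(slice - global) / global`, and backs out the global value each row implies. The function names are illustrative, not part of any Giskard API. Under that assumption, the Precision rows all point to a global Precision of roughly 0.92 and the Recall rows to a global Recall of roughly 0.93.

```python
def relative_deviation(slice_value: float, global_value: float) -> float:
    """Relative difference of a slice metric from the global metric."""
    return (slice_value - global_value) / global_value


def implied_global(slice_value: float, deviation: float) -> float:
    """Invert relative_deviation: the global value a reported row implies."""
    return slice_value / (1 + deviation)


# (slice metric, reported deviation) pairs copied from the performance issues.
precision_rows = [
    (0.714, -0.2236),
    (0.857, -0.0683),
    (0.862, -0.0630),
    (0.871, -0.0533),
    (0.871, -0.0533),
]
recall_rows = [
    (0.769, -0.1750),
    (0.769, -0.1750),
    (0.812, -0.1286),
    (0.875, -0.0616),
]

implied_precision = [implied_global(v, d) for v, d in precision_rows]
implied_recall = [implied_global(v, d) for v, d in recall_rows]

print([round(p, 3) for p in implied_precision])
print([round(r, 3) for r in implied_recall])
```

Because the reported figures are rounded, the implied values agree only to about three decimal places.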
👉Robustness issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | major 🔴 | | Fail rate = 0.104 | Transform to uppercase | 91/872 tested samples (10.44%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 10.44% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1 | unflinchingly bleak and desperate | UNFLINCHINGLY BLEAK AND DESPERATE | POSITIVE (p = 0.99) | NEGATIVE (p = 1.00) |
| 6 | a sometimes tedious film . | A SOMETIMES TEDIOUS FILM . | NEGATIVE (p = 1.00) | POSITIVE (p = 0.99) |
| 20 | pumpkin takes an admirable look at the hypocrisy of political correctness , but it does so with such an uneven tone that you never know when humor ends and tragedy begins . | PUMPKIN TAKES AN ADMIRABLE LOOK AT THE HYPOCRISY OF POLITICAL CORRECTNESS , BUT IT DOES SO WITH SUCH AN UNEVEN TONE THAT YOU NEVER KNOW WHEN HUMOR ENDS AND TRAGEDY BEGINS . | NEGATIVE (p = 1.00) | POSITIVE (p = 1.00) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Robustness | medium 🟡 | | Fail rate = 0.074 | Add typos | 59/800 tested samples (7.37%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 7.37% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 7 | or doing last year 's taxes with your ex-wife . | od doing last year 's taxes with your ex-wicfw . | NEGATIVE (p = 0.99) | POSITIVE (p = 0.99) |
| 22 | holden caulfield did it better . | holdsn caulfkeld did t better . | POSITIVE (p = 1.00) | NEGATIVE (p = 0.99) |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world | NEGATIVE (p = 0.99) | POSITIVE (p = 0.99) |

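The fail rate reported for each perturbation is the share of tested samples whose predicted label changes once the transformation is applied (for example 91 of 872 for uppercasing). A minimal sketch of that measurement follows. Note that `toy_predict` is a made-up, case-sensitive keyword classifier standing in for the real model, and `fail_rate` is an illustrative helper, not a Giskard function.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Share of texts whose predicted label changes after the transformation."""
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in model: case-sensitive keyword lookup."""
    negative_words = ("tedious", "bleak")
    return "NEGATIVE" if any(w in text for w in negative_words) else "POSITIVE"


samples = [
    "a sometimes tedious film .",
    "fun , flip and terribly hip bit of cinematic entertainment .",
]

# Uppercasing hides the lowercase keyword from the toy model,
# so the first sample flips while the second does not.
rate = fail_rate(toy_predict, samples, str.upper)
print(rate)

# The uppercase fail rate in the report follows from its own counts.
reported_uppercase = 91 / 872
print(round(reported_uppercase, 3))
```

To run the same check against the actual model, `toy_predict` would be replaced by a function that returns the model's predicted label for a string.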
👉Ethical issues (1)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
| --- | --- | --- | --- | --- | --- |
| Ethical | major 🔴 | | Fail rate = 0.029 | Switch countries from high- to low-income and vice versa | 1/35 tested samples (2.86%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 2.86% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 266 | a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors . | a coda in every sense , the pinochet case splits time between a minute-by-minute account of the Algerian court 's extradition chess game and the regime 's talking-head survivors . | NEGATIVE (p = 0.95) | POSITIVE (p = 0.96) |


Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
