Report for siebert/sentiment-roberta-large-english
#97, opened by giskard-bot
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 11 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).
👉Robustness issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.104 | Transform to uppercase | 91/872 tested samples (10.44%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 10.44% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1 | unflinchingly bleak and desperate | UNFLINCHINGLY BLEAK AND DESPERATE | POSITIVE (p = 0.99) | NEGATIVE (p = 1.00) |
6 | a sometimes tedious film . | A SOMETIMES TEDIOUS FILM . | NEGATIVE (p = 1.00) | POSITIVE (p = 0.99) |
20 | pumpkin takes an admirable look at the hypocrisy of political correctness , but it does so with such an uneven tone that you never know when humor ends and tragedy begins . | PUMPKIN TAKES AN ADMIRABLE LOOK AT THE HYPOCRISY OF POLITICAL CORRECTNESS , BUT IT DOES SO WITH SUCH AN UNEVEN TONE THAT YOU NEVER KNOW WHEN HUMOR ENDS AND TRAGEDY BEGINS . | NEGATIVE (p = 1.00) | POSITIVE (p = 1.00) |
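
The fail rate reported above (91 of 872 samples, about 10.44%) is the share of samples whose predicted label changes after the transformation. A minimal sketch of that calculation follows; `fail_rate` and `toy_predict` are hypothetical names introduced for illustration, and the toy keyword rule only stands in for the real model. This is not Giskard's implementation.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("texts must not be empty")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for the model: a case-sensitive keyword rule."""
    return "NEGATIVE" if "tedious" in text or "bleak" in text else "POSITIVE"


samples = [
    "a sometimes tedious film .",
    "unflinchingly bleak and desperate",
    "a charming film .",
]

# Two of the three toy samples flip once the text is uppercased.
rate = fail_rate(toy_predict, samples, str.upper)
print(f"toy fail rate: {rate:.4f}")

# The figure from the report, recomputed from its own counts.
print(f"reported fail rate: {91 / 872:.4f}")
```

To try this against the actual model, `toy_predict` could be replaced by any function that maps a string to a label; the same helper accepts other transformations in place of `str.upper`.
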
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.074 | Add typos | 59/800 tested samples (7.37%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 7.37% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
7 | or doing last year 's taxes with your ex-wife . | od doing last year 's taxes with your ex-wicfw . | NEGATIVE (p = 0.99) | POSITIVE (p = 0.99) |
22 | holden caulfield did it better . | holdsn caulfkeld did t better . | POSITIVE (p = 1.00) | NEGATIVE (p = 0.99) |
33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world | NEGATIVE (p = 1.00) | POSITIVE (p = 0.99) |
👉Performance issues (9)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text_length(text) < 63.500 AND text_length(text) >= 53.500 | Precision = 0.714 | — | 22.36% lower than global |
🔍✨Examples
For records in the dataset where `text_length(text)` < 63.500 AND `text_length(text)` >= 53.500, the Precision is 22.36% lower than the global Precision.

Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
21 | the iditarod lasts for days - this just felt like it did . | 59 | NEGATIVE | POSITIVE (p = 1.00) |
58 | manages to be both repulsively sadistic and mundane . | 54 | NEGATIVE | POSITIVE (p = 0.98) |
92 | you wo n't like roger , but you will quickly recognize him . | 61 | NEGATIVE | POSITIVE (p = 1.00) |
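
The performance findings compare a metric computed on a data slice with the same metric on the full dataset. The sketch below shows one way to compute that, assuming the deviation is relative, i.e. (slice - global) / global, and that POSITIVE is the positive class. Both are assumptions, as are the helper names and the toy records; the report does not state the global Precision.

```python
from typing import Sequence


def precision(
    labels: Sequence[str], preds: Sequence[str], positive: str = "POSITIVE"
) -> float:
    """Precision = true positives / all positive predictions."""
    pairs = list(zip(labels, preds))
    tp = sum(l == positive and p == positive for l, p in pairs)
    fp = sum(l != positive and p == positive for l, p in pairs)
    if tp + fp == 0:
        raise ValueError("no positive predictions in this set")
    return tp / (tp + fp)


def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Assumed definition of the 'Deviation' column: relative change."""
    return (slice_metric - global_metric) / global_metric


# Hypothetical toy records: (text, label, predicted label).
records = [
    ("a" * 59, "NEGATIVE", "POSITIVE"),
    ("a" * 54, "POSITIVE", "POSITIVE"),
    ("a" * 61, "POSITIVE", "POSITIVE"),
    ("a" * 20, "POSITIVE", "POSITIVE"),
    ("a" * 100, "NEGATIVE", "NEGATIVE"),
    ("a" * 120, "POSITIVE", "POSITIVE"),
]

# Same slice bounds as the finding above: 53.5 <= text_length < 63.5.
in_slice = [r for r in records if 53.5 <= len(r[0]) < 63.5]

global_precision = precision([r[1] for r in records], [r[2] for r in records])
slice_precision = precision([r[1] for r in in_slice], [r[2] for r in in_slice])
deviation = relative_deviation(slice_precision, global_precision)
print(f"global={global_precision:.3f} slice={slice_precision:.3f} "
      f"deviation={deviation:+.2%}")
```
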
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | avg_word_length(text) >= 4.632 AND avg_word_length(text) < 4.726 | Recall = 0.769 | — | 17.50% lower than global |
🔍✨Examples
For records in the dataset where `avg_word_length(text)` >= 4.632 AND `avg_word_length(text)` < 4.726, the Recall is 17.5% lower than the global Recall.

Index | text | avg_word_length(text) | label | Predicted label |
---|---|---|---|---|
87 | jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters | 4.64706 | POSITIVE | NEGATIVE (p = 1.00) |
282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 4.72414 | POSITIVE | NEGATIVE (p = 1.00) |
546 | on the heels of the ring comes a similarly morose and humorless horror movie that , although flawed , is to be commended for its straight-ahead approach to creepiness . | 4.63333 | POSITIVE | NEGATIVE (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | avg_whitespace(text) < 0.178 AND avg_whitespace(text) >= 0.175 | Recall = 0.769 | — | 17.50% lower than global |
🔍✨Examples
For records in the dataset where `avg_whitespace(text)` < 0.178 AND `avg_whitespace(text)` >= 0.175, the Recall is 17.5% lower than the global Recall.

Index | text | avg_whitespace(text) | label | Predicted label |
---|---|---|---|---|
87 | jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters | 0.177083 | POSITIVE | NEGATIVE (p = 1.00) |
282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 0.174699 | POSITIVE | NEGATIVE (p = 1.00) |
546 | on the heels of the ring comes a similarly morose and humorless horror movie that , although flawed , is to be commended for its straight-ahead approach to creepiness . | 0.177515 | POSITIVE | NEGATIVE (p = 1.00) |
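
The slice features `text_length`, `avg_word_length` and `avg_whitespace` are not defined in this report. The sketch below uses definitions inferred from the reported values: character count, mean length of whitespace-separated tokens, and share of whitespace characters. These are assumptions, not Giskard's source code. They reproduce the numbers for row 87 only if the raw text ends with a single trailing space, which is also an assumption.

```python
def text_length(text: str) -> int:
    """Number of characters, including whitespace."""
    return len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (punctuation counts as a token)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def avg_whitespace(text: str) -> float:
    """Share of characters that are whitespace."""
    return sum(c.isspace() for c in text) / len(text) if text else 0.0


# Row 87 from the tables above, with one assumed trailing space.
row_87 = (
    "jaglom ... put ( s ) the audience in the privileged position "
    "of eavesdropping on his characters "
)

print(text_length(row_87))                 # 96
print(round(avg_word_length(row_87), 5))   # 4.64706, as in the report
print(round(avg_whitespace(row_87), 6))    # 0.177083, as in the report
```

Under these assumed definitions, a text with single spaces and one trailing space satisfies avg_whitespace = 1 / (avg_word_length + 1), so the two features select nearly the same rows. That may be why the `avg_word_length` and `avg_whitespace` findings report identical metrics and examples.
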
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text_length(text) >= 163.500 AND text_length(text) < 179.500 | Recall = 0.812 | — | 12.86% lower than global |
🔍✨Examples
For records in the dataset where `text_length(text)` >= 163.500 AND `text_length(text)` < 179.500, the Recall is 12.86% lower than the global Recall.

Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
166 | characters still need to function according to some set of believable and comprehensible impulses , no matter how many drugs they do or how much artistic license avary employs . | 178 | NEGATIVE | POSITIVE (p = 0.99) |
266 | a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors . | 179 | POSITIVE | NEGATIVE (p = 0.95) |
282 | while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer | 166 | POSITIVE | NEGATIVE (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text_length(text) < 93.500 AND text_length(text) >= 86.500 | Precision = 0.857 | — | 6.83% lower than global |
🔍✨Examples
For records in the dataset where `text_length(text)` < 93.500 AND `text_length(text)` >= 86.500, the Precision is 6.83% lower than the global Precision.

Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
102 | does paint some memorable images ... , but makhmalbaf keeps her distance from the characters | 93 | POSITIVE | NEGATIVE (p = 1.00) |
115 | sam mendes has become valedictorian at the school for soft landings and easy ways out . | 88 | NEGATIVE | POSITIVE (p = 1.00) |
519 | moretti 's compelling anatomy of grief and the difficult process of adapting to loss . | 87 | NEGATIVE | POSITIVE (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text_length(text) >= 140.500 AND text_length(text) < 154.500 | Precision = 0.862 | — | 6.30% lower than global |
🔍✨Examples
For records in the dataset where `text_length(text)` >= 140.500 AND `text_length(text)` < 154.500, the Precision is 6.3% lower than the global Precision.

Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 146 | NEGATIVE | POSITIVE (p = 1.00) |
147 | the talented and clever robert rodriguez perhaps put a little too much heart into his first film and did n't reserve enough for his second . | 141 | NEGATIVE | POSITIVE (p = 0.98) |
494 | it showcases carvey 's talent for voices , but not nearly enough and not without taxing every drop of one 's patience to get to the good stuff . | 145 | NEGATIVE | POSITIVE (p = 0.98) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text_length(text) < 53.500 AND text_length(text) >= 46.500 | Recall = 0.875 | — | 6.16% lower than global |
🔍✨Examples
For records in the dataset where `text_length(text)` < 53.500 AND `text_length(text)` >= 46.500, the Recall is 6.16% lower than the global Recall.

Index | text | text_length(text) | label | Predicted label |
---|---|---|---|---|
295 | jones ... does offer a brutal form of charisma . | 49 | POSITIVE | NEGATIVE (p = 0.99) |
436 | trite , banal , cliched , mostly inoffensive . | 47 | NEGATIVE | POSITIVE (p = 0.99) |
602 | instead , he shows them the respect they are due . | 51 | POSITIVE | NEGATIVE (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | avg_word_length(text) >= 4.509 AND avg_word_length(text) < 4.632 | Precision = 0.871 | — | 5.33% lower than global |
🔍✨Examples
For records in the dataset where `avg_word_length(text)` >= 4.509 AND `avg_word_length(text)` < 4.632, the Precision is 5.33% lower than the global Precision.

Index | text | avg_word_length(text) | label | Predicted label |
---|---|---|---|---|
95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 4.61538 | NEGATIVE | POSITIVE (p = 1.00) |
218 | all that 's missing is the spontaneity , originality and delight . | 4.58333 | NEGATIVE | POSITIVE (p = 0.95) |
300 | fun , flip and terribly hip bit of cinematic entertainment . | 4.54545 | POSITIVE | NEGATIVE (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | avg_whitespace(text) < 0.182 AND avg_whitespace(text) >= 0.178 | Precision = 0.871 | — | 5.33% lower than global |
🔍✨Examples
For records in the dataset where `avg_whitespace(text)` < 0.182 AND `avg_whitespace(text)` >= 0.178, the Precision is 5.33% lower than the global Precision.

Index | text | avg_whitespace(text) | label | Predicted label |
---|---|---|---|---|
95 | this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms . | 0.178082 | NEGATIVE | POSITIVE (p = 1.00) |
218 | all that 's missing is the spontaneity , originality and delight . | 0.179104 | NEGATIVE | POSITIVE (p = 0.95) |
300 | fun , flip and terribly hip bit of cinematic entertainment . | 0.180328 | POSITIVE | NEGATIVE (p = 1.00) |
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.