Report for cardiffnlp/twitter-roberta-base-sentiment-latest
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).
When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 7.63% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
medium 🟡 | Fail rate = 0.076 | Switch Gender | 9/118 tested samples (7.63%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
Index | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
81 | green might want to hang onto that ski mask , as robbery may be the only way to pay for his next project . | green might want to hang onto that ski mask , as robbery may be the only way to pay for her next project . | negative (p = 0.49) | neutral (p = 0.54) |
213 | this time mr. burns is trying something in the martin scorsese street-realist mode , but his self-regarding sentimentality trips him up again . | this time mr. burns is trying something in the martin scorsese street-realist mode , but her self-regarding sentimentality trips her up again . | negative (p = 0.53) | neutral (p = 0.49) |
260 | / but daphne , you 're too buff / fred thinks he 's tough / and velma - wow , you 've lost weight ! | / but daphne , you 're too buff / fred thinks she 's tough / and velma - wow , you 've lost weight ! | positive (p = 0.47) | neutral (p = 0.45) |
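The fail rate in these tables is the share of tested samples whose predicted label changes after the perturbation (here 9/118 ≈ 7.63%). A minimal Python sketch of that calculation follows; it is not Giskard's implementation. `toy_predict` and `toy_switch_gender` are hypothetical stand-ins for the real model and transformation. Skipping samples that the transformation leaves unchanged is an assumption, suggested by the fact that only 118 samples were tested here versus 872 for the casing checks.

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for a prediction-invariance check."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are skipped.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    """Hypothetical classifier, used only to make the sketch runnable."""
    return "negative" if " his " in f" {text} " else "neutral"


def toy_switch_gender(text: str) -> str:
    """Hypothetical, deliberately crude stand-in for "Switch Gender"."""
    return " ".join("her" if word == "his" else word for word in text.split())


texts = ["pay for his next project", "a sometimes tedious film"]
changed, tested, rate = fail_rate(toy_predict, texts, toy_switch_gender)
print(f"{changed}/{tested} tested samples changed prediction ({rate:.2%})")
```

To reproduce the report's numbers, `predict` would wrap the actual model and `transform` the actual perturbation.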
👉Ethical issues (2)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 33.33% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
major 🔴 | Fail rate = 0.333 | Switch Religion | 1/3 tested samples (33.33%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the movie succeeds in instilling a wary sense of ` there but for the grace of allah , ' it is far too self-conscious to draw you deeply into its world . | negative (p = 0.54) | neutral (p = 0.50) |
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 20.25% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
major 🔴 | Fail rate = 0.203 | Add typos | 162/800 tested samples (20.25%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | in exactly 89 minutes , most of which passed as owly as if i 'd been sitting nakwd on an igloo ,f ormula 51 samnk from quirky to jerky to uttef turkey . | negative (p = 0.78) | neutral (p = 0.77) |
23 | a delectable and intriguing thriller filled with surprises , read my lips is an original . | a delectabld ad intriguing thriller fille dwith surprised , reaf my lips is an oigihnal . | positive (p = 0.95) | neutral (p = 0.68) |
33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world | negative (p = 0.54) | neutral (p = 0.58) |
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.58% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
medium 🟡 | Fail rate = 0.066 | Punctuation Removal | 57/866 tested samples (6.58%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
28 | it 's a cookie-cutter movie , a cut-and-paste job . | it s a cookie cutter movie a cut and paste job | neutral (p = 0.57) | negative (p = 0.72) |
52 | mr. tsai is a very original artist in his medium , and what time is it there ? | mr tsai is a very original artist in his medium and what time is it there | neutral (p = 0.53) | positive (p = 0.53) |
69 | this one is definitely one to skip , even for horror movie fanatics . | this one is definitely one to skip even for horror movie fanatics | negative (p = 0.69) | neutral (p = 0.44) |
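Judging from the examples above, the perturbation replaces each punctuation character with a space and then collapses runs of whitespace. The sketch below reproduces the three perturbed texts shown; it is an approximation inferred from those examples, not Giskard's own code, and `remove_punctuation` is a hypothetical name.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


print(remove_punctuation("it 's a cookie-cutter movie , a cut-and-paste job ."))
# it s a cookie cutter movie a cut and paste job
```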
When feature “text” is perturbed with the transformation “Accent Removal”, the model changes its prediction in 20.0% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
major 🔴 | Fail rate = 0.200 | Accent Removal | 1/5 tested samples (20.0%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Accent Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
706 | how do you spell cliché ? | how do you spell cliche ? | neutral (p = 0.50) | negative (p = 0.50) |
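One common way to strip accents is to decompose the text with Unicode normalization and drop the combining marks. The sketch below uses that approach and reproduces the example above; whether the scan uses the same method is an assumption, and `remove_accents` is a hypothetical name.

```python
import unicodedata


def remove_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents("how do you spell clich\u00e9 ?"))
# how do you spell cliche ?
```

Only 5 samples in the split were tested for this transformation, so the 20.0% fail rate rests on a single changed prediction.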
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 17.78% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
major 🔴 | Fail rate = 0.178 | Transform to title case | 155/872 tested samples (17.78%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
2 | allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . | Allows Us To Hope That Nolan Is Poised To Embark A Major Career As A Commercial Yet Inventive Filmmaker . | positive (p = 0.78) | neutral (p = 0.53) |
6 | a sometimes tedious film . | A Sometimes Tedious Film . | negative (p = 0.73) | neutral (p = 0.51) |
9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | In Exactly 89 Minutes , Most Of Which Passed As Slowly As If I 'D Been Sitting Naked On An Igloo , Formula 51 Sank From Quirky To Jerky To Utter Turkey . | negative (p = 0.78) | neutral (p = 0.51) |
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 31.31% of the cases. We expected the predictions not to be affected by this transformation.
Level | Metric | Transformation | Deviation |
---|---|---|---|
major 🔴 | Fail rate = 0.313 | Transform to uppercase | 273/872 tested samples (31.31%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
0 | it 's a charming and often affecting journey . | IT 'S A CHARMING AND OFTEN AFFECTING JOURNEY . | positive (p = 0.92) | neutral (p = 0.82) |
3 | the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . | THE ACTING , COSTUMES , MUSIC , CINEMATOGRAPHY AND SOUND ARE ALL ASTOUNDING GIVEN THE PRODUCTION 'S AUSTERE LOCALES . | positive (p = 0.91) | neutral (p = 0.78) |
4 | it 's slow -- very , very slow . | IT 'S SLOW -- VERY , VERY SLOW . | negative (p = 0.76) | neutral (p = 0.70) |
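The title-case and uppercase perturbations in this report match what Python's built-in `str.title()` and `str.upper()` produce on the example texts. The sketch below shows both; that the scan calls these exact methods is an assumption, and the wrapper names are hypothetical.

```python
def to_title_case(text: str) -> str:
    """Capitalize the first letter of every run of letters."""
    return text.title()


def to_uppercase(text: str) -> str:
    """Uppercase every letter; punctuation and spacing are untouched."""
    return text.upper()


print(to_title_case("a sometimes tedious film ."))
# A Sometimes Tedious Film .
print(to_uppercase("it 's slow -- very , very slow ."))
# IT 'S SLOW -- VERY , VERY SLOW .
```

Note that `str.title()` also capitalizes the letter after an apostrophe, which is why the title-case example contains "I 'D".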
👉Performance issues (1)
For records in the dataset where text contains "film", the Precision is 14.07% lower than the global Precision.
Level | Data slice | Metric | Deviation |
---|---|---|---|
major 🔴 | text contains "film" | Precision = 0.419 | -14.07% vs. global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
Index | text | label | Predicted label |
---|---|---|---|
5 | although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . | neutral | positive (p = 0.88) |
8 | you do n't have to know about music to appreciate the film 's easygoing blend of comedy and romance . | neutral | positive (p = 0.80) |
10 | the mesmerizing performances of the leads keep the film grounded and keep the audience riveted . | neutral | positive (p = 0.95) |
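The slice comparison can be sketched as: compute Precision on the records whose text contains "film", compute it on the whole dataset, and compare the two. The sketch below uses made-up toy data and single-class Precision. How the scan averages Precision across classes is not stated in the report, and treating the deviation as relative, `(slice - global) / global`, is an assumption. Under that assumption, the reported figures would imply a global Precision of roughly 0.419 / (1 - 0.1407) ≈ 0.488.

```python
from typing import Sequence


def precision(labels: Sequence[str], preds: Sequence[str], positive: str) -> float:
    """Share of predictions of `positive` that are correct."""
    hits = [y == positive for y, p in zip(labels, preds) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


def relative_deviation(slice_value: float, global_value: float) -> float:
    """Deviation of the slice metric relative to the global metric."""
    return (slice_value - global_value) / global_value


# Toy data, invented for illustration only.
texts = ["the film is fine", "a great film", "dull story", "lovely acting"]
labels = ["neutral", "positive", "negative", "positive"]
preds = ["positive", "positive", "negative", "positive"]

in_slice = [i for i, text in enumerate(texts) if "film" in text]
global_p = precision(labels, preds, "positive")
slice_p = precision(
    [labels[i] for i in in_slice], [preds[i] for i in in_slice], "positive"
)
print(f"slice={slice_p:.3f} global={global_p:.3f} "
      f"deviation={relative_deviation(slice_p, global_p):+.2%}")
```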