Report for cardiffnlp/twitter-roberta-base-sentiment-latest

#152
by ZeroCommand - opened
Giskard org

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Ethical issues (2)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 33.33% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.333 | Switch Religion | 1/3 tested samples (33.33%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the movie succeeds in instilling a wary sense of ` there but for the grace of allah , ' it is far too self-conscious to draw you deeply into its world . | negative (p = 0.54) | neutral (p = 0.50) |
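The fail rate in this report is the share of tested samples whose predicted label changes after the perturbation. The following is a minimal sketch of that calculation, not Giskard's actual implementation. It assumes that only samples the transformation actually modifies count as "tested", which would explain why only 3 samples were tested here. The names `fail_rate`, `swap_religion_term`, and `toy_predict`, and the sample texts, are hypothetical stand-ins for illustration.

```python
import re
from typing import Callable, Iterable, Tuple


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Iterable[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one perturbation."""
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # assumption: unmodified samples are not counted as tested
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def swap_religion_term(text: str) -> str:
    # Hypothetical one-word swap; the real transformation covers more terms.
    return re.sub(r"\bgod\b", "allah", text)


def toy_predict(text: str) -> str:
    # Made-up classifier stub that is sensitive to one exact phrase.
    return "negative" if "grace of god" in text else "neutral"


# Made-up inputs; the last one contains no swappable term and is skipped.
texts = [
    "there but for the grace of god",
    "oh my god , what a finale",
    "thank god it is short",
    "a charming journey",
]

changed, tested, rate = fail_rate(toy_predict, swap_religion_term, texts)
print(f"{changed}/{tested} tested samples ({rate:.2%}) changed prediction")
```

With a real model, `predict` would wrap the classifier call and return the top label for a single text.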

When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 7.63% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.076 | Switch Gender | 9/118 tested samples (7.63%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| Index | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 81 | green might want to hang onto that ski mask , as robbery may be the only way to pay for his next project . | green might want to hang onto that ski mask , as robbery may be the only way to pay for her next project . | negative (p = 0.49) | neutral (p = 0.54) |
| 213 | this time mr. burns is trying something in the martin scorsese street-realist mode , but his self-regarding sentimentality trips him up again . | this time mr. burns is trying something in the martin scorsese street-realist mode , but her self-regarding sentimentality trips her up again . | negative (p = 0.53) | neutral (p = 0.49) |
| 260 | / but daphne , you 're too buff / fred thinks he 's tough / and velma - wow , you 've lost weight ! | / but daphne , you 're too buff / fred thinks she 's tough / and velma - wow , you 've lost weight ! | positive (p = 0.47) | neutral (p = 0.45) |
👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 31.31% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.313 | Transform to uppercase | 273/872 tested samples (31.31%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 0 | it 's a charming and often affecting journey . | IT 'S A CHARMING AND OFTEN AFFECTING JOURNEY . | positive (p = 0.92) | neutral (p = 0.82) |
| 3 | the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales . | THE ACTING , COSTUMES , MUSIC , CINEMATOGRAPHY AND SOUND ARE ALL ASTOUNDING GIVEN THE PRODUCTION 'S AUSTERE LOCALES . | positive (p = 0.91) | neutral (p = 0.78) |
| 4 | it 's slow -- very , very slow . | IT 'S SLOW -- VERY , VERY SLOW . | negative (p = 0.76) | neutral (p = 0.70) |

When feature “text” is perturbed with the transformation “Accent Removal”, the model changes its prediction in 20.0% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.200 | Accent Removal | 1/5 tested samples (20.0%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Accent Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 706 | how do you spell cliché ? | how do you spell cliche ? | neutral (p = 0.50) | negative (p = 0.50) |
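One common way to implement accent removal is Unicode decomposition followed by dropping the combining marks. The sketch below reproduces the perturbation shown for example 706 under that assumption. The function name `strip_accents` is hypothetical, and Giskard's own transformation may be implemented differently.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


perturbed = strip_accents("how do you spell cliché ?")
print(perturbed)
```

Text without accented characters passes through unchanged, which is consistent with only 5 samples being tested for this transformation.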

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 20.25% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.203 | Add typos | 162/800 tested samples (20.25%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | in exactly 89 minutes , most of which passed as owly as if i 'd been sitting nakwd on an igloo ,f ormula 51 samnk from quirky to jerky to uttef turkey . | negative (p = 0.78) | neutral (p = 0.77) |
| 23 | a delectable and intriguing thriller filled with surprises , read my lips is an original . | a delectabld ad intriguing thriller fille dwith surprised , reaf my lips is an oigihnal . | positive (p = 0.95) | neutral (p = 0.68) |
| 33 | if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world . | if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world | negative (p = 0.54) | neutral (p = 0.58) |

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 17.78% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.178 | Transform to title case | 155/872 tested samples (17.78%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 2 | allows us to hope that nolan is poised to embark a major career as a commercial yet inventive filmmaker . | Allows Us To Hope That Nolan Is Poised To Embark A Major Career As A Commercial Yet Inventive Filmmaker . | positive (p = 0.78) | neutral (p = 0.53) |
| 6 | a sometimes tedious film . | A Sometimes Tedious Film . | negative (p = 0.73) | neutral (p = 0.51) |
| 9 | in exactly 89 minutes , most of which passed as slowly as if i 'd been sitting naked on an igloo , formula 51 sank from quirky to jerky to utter turkey . | In Exactly 89 Minutes , Most Of Which Passed As Slowly As If I 'D Been Sitting Naked On An Igloo , Formula 51 Sank From Quirky To Jerky To Utter Turkey . | negative (p = 0.78) | neutral (p = 0.51) |
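The uppercase and title-case perturbations in the examples above match the output of Python's built-in `str.upper()` and `str.title()`. This is an observation about the examples, not a confirmed description of the scanner's code. A short sketch using text from examples 0 and 9:

```python
# Text taken from examples 0 and 9 (the second is a fragment of the full sentence).
row_0 = "it 's a charming and often affecting journey ."
row_9_fragment = "as if i 'd been sitting naked on an igloo"

upper = row_0.upper()
# str.title() capitalizes the first letter after any non-letter,
# so the detached clitic "'d" becomes "'D".
title = row_9_fragment.title()

print(upper)
print(title)
```

Because sst2 text is lowercased and pre-tokenized, almost every sample changes under these transformations, which fits the 872 tested samples reported for both.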

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.58% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.066 | Punctuation Removal | 57/866 tested samples (6.58%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 28 | it 's a cookie-cutter movie , a cut-and-paste job . | it s a cookie cutter movie a cut and paste job | neutral (p = 0.57) | negative (p = 0.72) |
| 52 | mr. tsai is a very original artist in his medium , and what time is it there ? | mr tsai is a very original artist in his medium and what time is it there | neutral (p = 0.53) | positive (p = 0.53) |
| 69 | this one is definitely one to skip , even for horror movie fanatics . | this one is definitely one to skip even for horror movie fanatics | negative (p = 0.69) | neutral (p = 0.44) |
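Judging from the examples above, punctuation appears to be replaced by spaces (note "cookie-cutter" becoming "cookie cutter") and the resulting whitespace collapsed. The sketch below reproduces those outputs under that inferred rule. The helper `remove_punctuation` is hypothetical and only handles ASCII punctuation, so the scanner's real transformation may differ.

```python
import string

# Map every ASCII punctuation character to a space (inferred behavior).
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


row_28 = remove_punctuation("it 's a cookie-cutter movie , a cut-and-paste job .")
row_69 = remove_punctuation(
    "this one is definitely one to skip , even for horror movie fanatics ."
)
print(row_28)
print(row_69)
```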
👉Performance issues (1)

For records in the dataset where text contains "film", the Precision is 14.07% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text contains "film" | Precision = 0.419 | 14.07% lower than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | text | label | Predicted label |
| --- | --- | --- | --- |
| 5 | although laced with humor and a few fanciful touches , the film is a refreshingly serious look at young women . | neutral | positive (p = 0.88) |
| 8 | you do n't have to know about music to appreciate the film 's easygoing blend of comedy and romance . | neutral | positive (p = 0.80) |
| 10 | the mesmerizing performances of the leads keep the film grounded and keep the audience riveted . | neutral | positive (p = 0.95) |
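A slice-level check like this one can be sketched by computing the metric on the filtered records and comparing it to the metric on the full dataset. The code below is an illustration on made-up rows, not the scanner's implementation. It assumes the deviation is relative to the global value and uses single-class precision for the "positive" label, because the report does not say how precision is averaged across classes. The function names and data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def precision(rows: List[Row], target: str) -> float:
    """Share of rows predicted as `target` whose true label is `target`."""
    predicted = [r for r in rows if r["pred"] == target]
    if not predicted:
        return 0.0
    hits = sum(1 for r in predicted if r["label"] == target)
    return hits / len(predicted)


def relative_deviation(slice_value: float, global_value: float) -> float:
    # Assumption: deviation is expressed relative to the global metric.
    return (slice_value - global_value) / global_value


# Made-up records, not taken from the scanned dataset.
rows = [
    {"text": "a fine film", "label": "positive", "pred": "positive"},
    {"text": "the film is a serious look", "label": "neutral", "pred": "positive"},
    {"text": "the film stays grounded", "label": "neutral", "pred": "positive"},
    {"text": "great fun", "label": "positive", "pred": "positive"},
    {"text": "lovely and warm", "label": "positive", "pred": "positive"},
    {"text": "very slow", "label": "negative", "pred": "negative"},
]

film_slice = [r for r in rows if "film" in r["text"]]
global_p = precision(rows, "positive")
slice_p = precision(film_slice, "positive")
deviation = relative_deviation(slice_p, global_p)
print(f"slice={slice_p:.3f} global={global_p:.3f} deviation={deviation:+.2%}")
```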
