Report for cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual

#162
by giskard-bot - opened
Giskard org

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset cardiffnlp/tweet_sentiment_multilingual (subset english, split test).

👉Robustness issues (6)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.67% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.207 | Transform to uppercase | 179/866 tested samples (20.67%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | Trying to have a conversation with my dad about vegetarianism is the most pointless infuriating thing ever #caveman | TRYING TO HAVE A CONVERSATION WITH MY DAD ABOUT VEGETARIANISM IS THE MOST POINTLESS INFURIATING THING EVER #CAVEMAN | negative (p = 0.98) | positive (p = 0.98) |
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | SAMSUNG TO BRING ANDROID 7.0 NOUGAT TO GALAXY S6, S6 EDGE, NOTE 5, AND TAB S2 - SOFTPEDIA NEWS | positive (p = 0.67) | neutral (p = 0.97) |
| 16 | Git 'em girls #BarackObama #blm #blacklivesmatter #mylifematters #therealskinnysuge #thepeopleschamp #skinnyup #pmd… | GIT 'EM GIRLS #BARACKOBAMA #BLM #BLACKLIVESMATTER #MYLIFEMATTERS #THEREALSKINNYSUGE #THEPEOPLESCHAMP #SKINNYUP #PMD… | negative (p = 0.91) | positive (p = 0.67) |
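
As a rough illustration of how a fail rate like the one above can be computed, here is a minimal Python sketch. It is not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, the stand-in classifier exists only to make the example runnable, and the rule that only samples actually altered by the transformation count as "tested" is an assumption (it would explain why the number of tested samples differs between transformations).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(predict: Callable[[str], str],
              texts: Sequence[str],
              transform: Callable[[str], str]) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one perturbation."""
    # Assumption: a sample is only tested if the transformation alters it.
    tested = [t for t in texts if transform(t) != t]
    # Count samples whose predicted label differs after the perturbation.
    changed = sum(predict(t) != predict(transform(t)) for t in tested)
    rate = changed / len(tested) if tested else 0.0
    return changed, len(tested), rate


# Hypothetical stand-in classifier, case-sensitive on purpose.
def toy_predict(text: str) -> str:
    return "negative" if "pointless" in text else "neutral"


samples = ["this is pointless", "nothing to report", "ALREADY UPPERCASE"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(changed, tested, round(rate, 3))
```

In practice `predict` would wrap the scanned model and return its top label. The reported figure follows the same arithmetic: 179 changed predictions out of 866 tested samples gives 179/866 ≈ 0.2067.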

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.31% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.153 | Transform to title case | 132/862 tested samples (15.31%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | perfect pussy clips #vanessa hudgens zac efron naked | Perfect Pussy Clips #Vanessa Hudgens Zac Efron Naked | positive (p = 0.52) | neutral (p = 0.91) |
| 25 | @user top candidate for NL Cy Young inevitably... | @User Top Candidate For Nl Cy Young Inevitably... | positive (p = 0.63) | neutral (p = 0.75) |
| 31 | cause like euthanasia get it? lmaooo | Cause Like Euthanasia Get It? Lmaooo | negative (p = 0.55) | neutral (p = 0.76) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.45% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.134 | Add typos | 110/818 tested samples (13.45%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | @user You are a stand up guy and a Gentleman Vice President Pence | @user You are stand up guy anr a Genteman Vice Pesident Pence | positive (p = 0.95) | neutral (p = 0.61) |
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android 7.0 Nougat to Galax S6, 6S edge, Note 5, and Tab S2 - Softpedia News | positive (p = 0.67) | neutral (p = 0.50) |
| 25 | @user top candidate for NL Cy Young inevitably... | @uset top candidate for NL Cy Toung inevitaly... | positive (p = 0.63) | neutral (p = 0.86) |
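
The typo examples above show deleted characters ("Genteman"), swapped neighbours ("6S") and substituted letters ("anr"). A small Python sketch of that kind of perturbation might look like the following. It is an approximation written for illustration, not the scan's actual transformation: the function name `add_typos`, the `rate` and `seed` parameters, and the uniform choice between the three operations are all assumptions.

```python
import random


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly delete, swap, or substitute letters (illustrative only)."""
    rng = random.Random(seed)  # seeded so that results are reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "swap", "substitute"))
            if op == "delete":
                i += 1  # drop the character
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend((chars[i + 1], c))  # transpose with the next character
                i += 2
                continue
            # Substitute (also the fallback when a swap hits the last character).
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            i += 1
            continue
        out.append(c)
        i += 1
    return "".join(out)


print(add_typos("top candidate for NL Cy Young inevitably...", rate=0.1))
```

Fixing the seed makes a failing sample reproducible, which helps when checking whether a changed prediction is a genuine robustness problem or an artefact of an unlucky perturbation.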

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 9.58% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.096 | Transform to lowercase | 79/825 tested samples (9.58%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | #latestnews 4 #newmexico #politics + #nativeamerican + #Israel + #Palestine - Protesting Rise Of Alt-Right At... | #latestnews 4 #newmexico #politics + #nativeamerican + #israel + #palestine - protesting rise of alt-right at... | neutral (p = 0.97) | negative (p = 0.57) |
| 35 | Listen to #NBAwards Winner @user interview on @user | listen to #nbawards winner @user interview on @user | neutral (p = 0.72) | positive (p = 0.61) |
| 40 | @user She will be hearing my voice on her hesitation to back HRC. I am a MA voter. @user @user @user | @user she will be hearing my voice on her hesitation to back hrc. i am a ma voter. @user @user @user | neutral (p = 0.51) | positive (p = 0.60) |

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.72% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.077 | Punctuation Removal | 58/751 tested samples (7.72%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android 7 0 Nougat to Galaxy S6 S6 edge Note 5 and Tab S2 Softpedia News | positive (p = 0.67) | neutral (p = 0.59) |
| 115 | @user Nah, she's cool. Repeal won't kick in before summer, when she's old enough for Medicare. Oh, wait... | @user Nah she s cool Repeal won t kick in before summer when she s old enough for Medicare Oh wait | positive (p = 0.76) | neutral (p = 0.56) |
| 124 | The gov't quietly just approved this enormous oil pipeline #NoDakotaAccess #NoDAPL | The gov t quietly just approved this enormous oil pipeline #NoDakotaAccess #NoDAPL | neutral (p = 0.38) | positive (p = 0.40) |
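
Judging from the examples above, the punctuation removal replaces each punctuation mark with a space (so "7.0" becomes "7 0" and "gov't" becomes "gov t"), collapses the resulting whitespace, and leaves "#" and "@" in place. The Python sketch below reproduces that observed behaviour. It is inferred from these three examples only and is not taken from the scan's source code; `remove_punctuation` is a hypothetical name.

```python
import re
import string

# Assumption: '#' and '@' are kept, since hashtags and mentions survive in the examples.
_PUNCT = "".join(c for c in string.punctuation if c not in "#@")
_PUNCT_RE = re.compile("[" + re.escape(_PUNCT) + "]")


def remove_punctuation(text: str) -> str:
    """Replace ASCII punctuation with spaces, then collapse whitespace."""
    spaced = _PUNCT_RE.sub(" ", text)
    return " ".join(spaced.split())


print(remove_punctuation("The gov't quietly just approved this enormous oil pipeline #NoDakotaAccess #NoDAPL"))
```

Only ASCII punctuation is handled here; typographic characters such as "’" or "…" would pass through unchanged.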

When feature “text” is perturbed with the transformation “Transform numbers to words”, the model changes its prediction in 5.04% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.050 | Transform numbers to words | 6/119 tested samples (5.04%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform numbers to words(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android seven Nougat to Galaxy S6, S6 edge, Note five, and Tab S2 - Softpedia News | positive (p = 0.67) | neutral (p = 0.63) |
| 136 | Pete BurnsDavid BowieLeonard CohenAlexis ArquetteGene WilderChristina GrimmiePrinceChynaBig AngFlo Henderson etc.RIP 2016 | Pete BurnsDavid BowieLeonard CohenAlexis ArquetteGene WilderChristina GrimmiePrinceChynaBig AngFlo Henderson etc.RIP zweitausendsechzehn | neutral (p = 0.92) | negative (p = 0.57) |
| 163 | 129 days until the release of Persona 5 | one hundred and twenty-nine days until the release of Persona five | positive (p = 0.63) | neutral (p = 0.67) |
👉Ethical issues (2)

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.74% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.067 | Switch countries from high- to low-income and vice versa | 6/89 tested samples (6.74%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 85 | #Syria #Hezbollah Nasrallah's bodyguard identified in #Aleppo | #Singapore #Hezbollah Nasrallah's bodyguard identified in #Aleppo | negative (p = 0.64) | neutral (p = 0.52) |
| 601 | The UK Doctor Who Beat The British GMC By Proving That Vaccines Aren’t Necessary To Achieve Health… | The Madagascar Doctor Who Beat The Chadian GMC By Proving That Vaccines Aren’t Necessary To Achieve Health… | negative (p = 0.90) | neutral (p = 0.75) |
| 721 | @user #Venezuelan 😷President Nicolas Maduro called Cuban Raul, expresses solidarity with Cuban ppl following death of #FidelCastro. | @user #Venezuelan 😷President Nicolas Maduro called Papua New Guinean Raul, expresses solidarity with Papua New Guinean ppl following death of #FidelCastro. | positive (p = 0.50) | neutral (p = 0.56) |
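
A term-swapping perturbation like the one in these examples can be approximated with a whole-word dictionary replacement. The Python sketch below is illustrative only: `swap_terms` is a hypothetical helper, and the small mapping is copied from the pairs visible in the examples above rather than from the word lists the scan actually uses.

```python
import re
from typing import Dict


def swap_terms(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its mapped value."""
    if not mapping:
        return text
    # Longest keys first, so multi-word terms win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    # A single pass avoids re-replacing text that was just substituted.
    return pattern.sub(lambda m: mapping[m.group(1)], text)


# Hypothetical mapping, taken from the pairs seen in the examples.
pairs = {"UK": "Madagascar", "British": "Chadian", "Syria": "Singapore"}
print(swap_terms("The UK Doctor Who Beat The British GMC", pairs))
```

Because the word boundary sits between "#" and the following letter, hashtags such as "#Syria" are rewritten as well, matching the first example.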

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.25% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.062 | Switch Religion | 2/32 tested samples (6.25%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 148 | Discussing Catholic Faith and Pope Francis Live On Radio... #catholic | Discussing Catholic Faith and imam Francis Live On Radio... #catholic | positive (p = 0.50) | neutral (p = 0.64) |
| 198 | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. God is life worth living ? Tesla model S,o YES. | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. allah is life worth living ? Tesla model S,o YES. | positive (p = 0.69) | negative (p = 0.40) |

Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
