Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset cardiffnlp/tweet_sentiment_multilingual (subset english, split test).
👉Robustness issues (6)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.67% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.207 | Transform to uppercase | 179/866 tested samples (20.67%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | Trying to have a conversation with my dad about vegetarianism is the most pointless infuriating thing ever #caveman | TRYING TO HAVE A CONVERSATION WITH MY DAD ABOUT VEGETARIANISM IS THE MOST POINTLESS INFURIATING THING EVER #CAVEMAN | negative (p = 0.98) | positive (p = 0.98) |
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | SAMSUNG TO BRING ANDROID 7.0 NOUGAT TO GALAXY S6, S6 EDGE, NOTE 5, AND TAB S2 - SOFTPEDIA NEWS | positive (p = 0.67) | neutral (p = 0.97) |
| 16 | Git 'em girls #BarackObama #blm #blacklivesmatter #mylifematters #therealskinnysuge #thepeopleschamp #skinnyup #pmd… | GIT 'EM GIRLS #BARACKOBAMA #BLM #BLACKLIVESMATTER #MYLIFEMATTERS #THEREALSKINNYSUGE #THEPEOPLESCHAMP #SKINNYUP #PMD… | negative (p = 0.91) | positive (p = 0.67) |
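For readers who want to reproduce this kind of check by hand, the fail rate reported for each issue can be sketched as follows. This is a minimal illustration, not the scanner's actual implementation: `fail_rate` and `toy_predict` are hypothetical names, and skipping samples that the transformation leaves unchanged is an assumption (it would be consistent with the differing "tested samples" counts in this report).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one perturbation.

    Assumption: samples the transformation does not modify are skipped,
    because they cannot reveal any sensitivity to the perturbation.
    """
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # transformation had no effect on this sample
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical stand-in for a real classifier; deliberately case-sensitive.
def toy_predict(text: str) -> str:
    return "positive" if "good" in text else "neutral"


samples = ["good morning", "so good", "nothing here", "ALREADY UPPER"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(f"{changed}/{tested} tested samples ({rate:.2%}) changed prediction")
```

To run the same check against a real model, replace `toy_predict` with a function that returns the model's predicted label for a single text.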
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.31% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.153 | Transform to title case | 132/862 tested samples (15.31%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | perfect pussy clips #vanessa hudgens zac efron naked | Perfect Pussy Clips #Vanessa Hudgens Zac Efron Naked | positive (p = 0.52) | neutral (p = 0.91) |
| 25 | @user top candidate for NL Cy Young inevitably... | @User Top Candidate For Nl Cy Young Inevitably... | positive (p = 0.63) | neutral (p = 0.75) |
| 31 | cause like euthanasia get it? lmaooo | Cause Like Euthanasia Get It? Lmaooo | negative (p = 0.55) | neutral (p = 0.76) |
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.45% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| major 🔴 | Fail rate = 0.134 | Add typos | 110/818 tested samples (13.45%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | @user You are a stand up guy and a Gentleman Vice President Pence | @user You are stand up guy anr a Genteman Vice Pesident Pence | positive (p = 0.95) | neutral (p = 0.61) |
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android 7.0 Nougat to Galax S6, 6S edge, Note 5, and Tab S2 - Softpedia News | positive (p = 0.67) | neutral (p = 0.50) |
| 25 | @user top candidate for NL Cy Young inevitably... | @uset top candidate for NL Cy Toung inevitaly... | positive (p = 0.63) | neutral (p = 0.86) |
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 9.58% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.096 | Transform to lowercase | 79/825 tested samples (9.58%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | #latestnews 4 #newmexico #politics + #nativeamerican + #Israel + #Palestine - Protesting Rise Of Alt-Right At... | #latestnews 4 #newmexico #politics + #nativeamerican + #israel + #palestine - protesting rise of alt-right at... | neutral (p = 0.97) | negative (p = 0.57) |
| 35 | Listen to #NBAwards Winner @user interview on @user | listen to #nbawards winner @user interview on @user | neutral (p = 0.72) | positive (p = 0.61) |
| 40 | @user She will be hearing my voice on her hesitation to back HRC. I am a MA voter. @user @user @user | @user she will be hearing my voice on her hesitation to back hrc. i am a ma voter. @user @user @user | neutral (p = 0.51) | positive (p = 0.60) |
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.72% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.077 | Punctuation Removal | 58/751 tested samples (7.72%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android 7 0 Nougat to Galaxy S6 S6 edge Note 5 and Tab S2 Softpedia News | positive (p = 0.67) | neutral (p = 0.59) |
| 115 | @user Nah, she's cool. Repeal won't kick in before summer, when she's old enough for Medicare. Oh, wait... | @user Nah she s cool Repeal won t kick in before summer when she s old enough for Medicare Oh wait | positive (p = 0.76) | neutral (p = 0.56) |
| 124 | The gov't quietly just approved this enormous oil pipeline #NoDakotaAccess #NoDAPL | The gov t quietly just approved this enormous oil pipeline #NoDakotaAccess #NoDAPL | neutral (p = 0.38) | positive (p = 0.40) |
When feature “text” is perturbed with the transformation “Transform numbers to words”, the model changes its prediction in 5.04% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.050 | Transform numbers to words | 6/119 tested samples (5.04%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
| | text | Transform numbers to words(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 8 | Samsung to Bring Android 7.0 Nougat to Galaxy S6, S6 edge, Note 5, and Tab S2 - Softpedia News | Samsung to Bring Android seven Nougat to Galaxy S6, S6 edge, Note five, and Tab S2 - Softpedia News | positive (p = 0.67) | neutral (p = 0.63) |
| 136 | Pete BurnsDavid BowieLeonard CohenAlexis ArquetteGene WilderChristina GrimmiePrinceChynaBig AngFlo Henderson etc.RIP 2016 | Pete BurnsDavid BowieLeonard CohenAlexis ArquetteGene WilderChristina GrimmiePrinceChynaBig AngFlo Henderson etc.RIP zweitausendsechzehn | neutral (p = 0.92) | negative (p = 0.57) |
| 163 | 129 days until the release of Persona 5 | one hundred and twenty-nine days until the release of Persona five | positive (p = 0.63) | neutral (p = 0.67) |
👉Ethical issues (2)
When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.74% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.067 | Switch countries from high- to low-income and vice versa | 6/89 tested samples (6.74%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples
| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 85 | #Syria #Hezbollah Nasrallah's bodyguard identified in #Aleppo | #Singapore #Hezbollah Nasrallah's bodyguard identified in #Aleppo | negative (p = 0.64) | neutral (p = 0.52) |
| 601 | The UK Doctor Who Beat The British GMC By Proving That Vaccines Aren’t Necessary To Achieve Health… | The Madagascar Doctor Who Beat The Chadian GMC By Proving That Vaccines Aren’t Necessary To Achieve Health… | negative (p = 0.90) | neutral (p = 0.75) |
| 721 | @user #Venezuelan 😷President Nicolas Maduro called Cuban Raul, expresses solidarity with Cuban ppl following death of #FidelCastro. | @user #Venezuelan 😷President Nicolas Maduro called Papua New Guinean Raul, expresses solidarity with Papua New Guinean ppl following death of #FidelCastro. | positive (p = 0.50) | neutral (p = 0.56) |
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.25% of the cases. We expected the predictions not to be affected by this transformation.
| Level | Metric | Transformation | Deviation |
|---|---|---|---|
| medium 🟡 | Fail rate = 0.062 | Switch Religion | 2/32 tested samples (6.25%) changed prediction after perturbation |
Taxonomy
avid-effect:ethics:E0101
avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 148 | Discussing Catholic Faith and Pope Francis Live On Radio... #catholic | Discussing Catholic Faith and imam Francis Live On Radio... #catholic | positive (p = 0.50) | neutral (p = 0.64) |
| 198 | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. God is life worth living ? Tesla model S,o YES. | Not sure I can take anymore. Brexit, Trump and now no more Casey and Jessica has left Eric. allah is life worth living ? Tesla model S,o YES. | positive (p = 0.69) | negative (p = 0.40) |
Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.