Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 16 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset offensive, split train).
👉Robustness issues (3)
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 12.2% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
major 🔴 | — | Fail rate = 0.122 | 122/1000 tested samples (12.2%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
|
text |
Transform to uppercase(text) |
Original prediction |
Prediction after perturbation |
3031 |
#Liberals had some sort of mutual masturbation awards show. This is me still not caring.
@user
@user
@user
@user
|
#LIBERALS HAD SOME SORT OF MUTUAL MASTURBATION AWARDS SHOW. THIS IS ME STILL NOT CARING.
@USER
@USER
@USER
@USER
|
offensive (p = 0.64) |
non-offensive (p = 0.84) |
5121 |
@user
His name is Coco! He is a Pomeranian. He's actually old. He's is 12. The most loyal dog I have even known! |
@USER
HIS NAME IS COCO! HE IS A POMERANIAN. HE'S ACTUALLY OLD. HE'S IS 12. THE MOST LOYAL DOG I HAVE EVEN KNOWN! |
non-offensive (p = 0.80) |
offensive (p = 0.65) |
10544 |
@user
Im with you JR.She is nothing.Don t waste your time on human waste! |
@USER
IM WITH YOU JR.SHE IS NOTHING.DON T WASTE YOUR TIME ON HUMAN WASTE! |
offensive (p = 0.55) |
non-offensive (p = 0.69) |
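The fail rate reported above is the share of tested samples whose predicted label changes after the perturbation (122 of 1000 here). A minimal sketch of that computation follows. `toy_predict` is a hypothetical stand-in classifier, not the scanned model, and the function names are illustrative rather than Giskard's API.

```python
from typing import Callable, Sequence


def perturbation_fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("texts must not be empty")
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


# Hypothetical classifier, deliberately case-sensitive so that
# upper-casing the input can flip its output.
def toy_predict(text: str) -> str:
    return "offensive" if "waste" in text else "non-offensive"


samples = ["human waste", "He is a Pomeranian", "WASTE of time", "nothing here"]
rate = perturbation_fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")
```

Running the same function with the real model's prediction call and the full sample would give a figure comparable to the one in the table.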
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 8.2% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
medium 🟡 | — | Fail rate = 0.082 | 82/1000 tested samples (8.2%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
|
text |
Add typos(text) |
Original prediction |
Prediction after perturbation |
8437 |
#guncontrol itself specifically targets minorities,
@user
The entire reason gun control"" exists is pure, unadulterated racism - bigots like you wanted to keep the recently-freed slaves disarmed and subordinate. Thankfully we've grown from there. |
#guncontril itself specifically targtes minoriues,
@user
The entire reason gun control"" existw is pure, unadulterated racism - bigots like you wanted to mkeep tye recently-free dslafves disarmed an dsubordintae.. Thankfully we'be grpwn fro mthere. |
offensive (p = 0.58) |
non-offensive (p = 0.52) |
11540 |
@user
When do you post on conservatives declaring Ford a liar without evidence? |
@user
When do you post on vconservatives declaring Ford a kiar without evidence? |
offensive (p = 0.78) |
non-offensive (p = 0.78) |
147 |
@user
@user
He is such a showman! And those pipes😍 the man can SING |
@user
@user
He is such a showmsan! Xnd those piped😍 t ema ncan ZING |
offensive (p = 0.51) |
non-offensive (p = 0.86) |
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 7.8% of the cases. We expected the predictions not to be affected by this transformation.
Level | Data slice | Metric | Deviation |
medium 🟡 | — | Fail rate = 0.078 | 78/1000 tested samples (7.8%) changed prediction after perturbation |
Taxonomy
avid-effect:performance:P0201
🔍✨Examples
|
text |
Transform to title case(text) |
Original prediction |
Prediction after perturbation |
3031 |
#Liberals had some sort of mutual masturbation awards show. This is me still not caring.
@user
@user
@user
@user
|
#Liberals Had Some Sort Of Mutual Masturbation Awards Show. This Is Me Still Not Caring.
@User
@User
@User
@User
|
offensive (p = 0.64) |
non-offensive (p = 0.79) |
10185 |
Baaaaa baaaaa...that's the American sheeple who prove Roger Goodell right: The #NFL CAN do whatever it wants and you'll take it because you'll NEVER give up the NFL. Baaaaa baaaaa.
@user
#QAnon #MAGA #WWG1WGA #FreeAlexJones |
Baaaaa Baaaaa...That'S The American Sheeple Who Prove Roger Goodell Right: The #Nfl Can Do Whatever It Wants And You'Ll Take It Because You'Ll Never Give Up The Nfl. Baaaaa Baaaaa.
@User
#Qanon #Maga #Wwg1Wga #Freealexjones |
offensive (p = 0.62) |
non-offensive (p = 0.59) |
2192 |
@user
@user
@user
@user
@user
Only liberals can take an athlete berating an official and turn it into sexism and a sjw issue. Liberals ruin everything by putting politics into everything. |
@User
@User
@User
@User
@User
Only Liberals Can Take An Athlete Berating An Official And Turn It Into Sexism And A Sjw Issue. Liberals Ruin Everything By Putting Politics Into Everything. |
non-offensive (p = 0.52) |
offensive (p = 0.58) |
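The title-cased examples above contain forms such as `That'S`, `You'Ll`, and `#Wwg1Wga`. These match what Python's built-in `str.title()` produces, because it capitalizes every letter that follows a non-letter character (apostrophes and digits included). Whether the scan actually calls `str.title()` is an assumption; the check below only shows that the outputs agree.

```python
# Tokens taken from the examples in this report.
examples = ["that's", "you'll", "#NFL", "#WWG1WGA", "@user"]

# str.title() upper-cases the first letter after any non-letter
# and lower-cases the rest.
titled = [s.title() for s in examples]
print(titled)
# ["That'S", "You'Ll", '#Nfl', '#Wwg1Wga', '@User']
```

This means the perturbation changes more than word-initial capitals: it also rewrites hashtags and contractions, which may account for part of the prediction changes.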
👉Performance issues (6)
For records in the dataset where text contains "maga", the Recall is 20.67% lower than the global Recall.
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "maga" | Recall = 0.633 | -20.67% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
115 |
@user
@user
@user
How about your outrage over this? Do something so you can be credible instead of being perceived as demagogues partisans #MAGA |
offensive |
non-offensive (p = 0.58) |
263 |
Sean Penn: #MeToo Movement Seeks to ‘Divide Men and Women’ <— Gotta say… I stand w/ Jeff Spicoli on this one. #MAGA #WWG1WGA #ConfirmKavanaugh |
offensive |
non-offensive (p = 0.79) |
284 |
@user
@user
Silly Joe! I’m a MAGA Of Society 💯🇺🇸❤️🇺🇸❤️🇺🇸... Feel That👊👊 |
non-offensive |
offensive (p = 0.64) |
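The deviation figures in the performance tables appear to be relative to the global metric rather than differences in percentage points. Under that assumption, each slice's reported Recall and deviation imply a global Recall, and the four Recall slices in this report should roughly agree on it. The sketch below does that arithmetic. The global Recall itself is not stated in the report, so the printed value is only what the rounded figures imply.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative change of the slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: recover the global metric from a slice."""
    return slice_metric / (1 + deviation)


# (slice Recall, reported deviation) copied from the tables in this report.
reported_recall = {
    'contains "maga"': (0.633, -0.2067),
    'contains "conservatives"': (0.641, -0.1971),
    "text_length >= 117.5": (0.743, -0.0688),
    'contains "don"': (0.745, -0.0660),
}

implied = {
    name: implied_global(recall, dev)
    for name, (recall, dev) in reported_recall.items()
}
for name, value in implied.items():
    print(f"{name}: implied global Recall ~ {value:.3f}")
```

All four slices point to roughly the same global Recall, which supports reading the deviation as a relative change.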
For records in the dataset where text contains "conservatives", the Recall is 19.71% lower than the global Recall.
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "conservatives" | Recall = 0.641 | -19.71% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
10 |
@user
Antifa would burn a Conservatives house down and CNN would be there lighting the torches & throwing gas on the flames. |
offensive |
non-offensive (p = 0.58) |
98 |
@user
#Conservatives #Republicans are #TRAITORS to #America |
offensive |
non-offensive (p = 0.82) |
493 |
@user
Conservatives who quit rather than fight betray every claim that cons are patriots"" & ""will fight for our Constitution."" It reveals extremism & defeatism at once. Dems KNOW they will win if they stick together. History PROVES it."" |
offensive |
non-offensive (p = 0.78) |
For records in the dataset where text contains "antifa", the Precision is 9.72% lower than the global Precision.
Level | Data slice | Metric | Deviation |
medium 🟡 | text contains "antifa" | Precision = 0.716 | -9.72% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
10 |
@user
Antifa would burn a Conservatives house down and CNN would be there lighting the torches & throwing gas on the flames. |
offensive |
non-offensive (p = 0.58) |
156 |
@user
A liberal anarchist coup d'etat gang runs rampant & a GOP AG does nothing. Liberals prove to be the biggest hate group in USA. The leftstream media cheers them on while excoriating duly elected President of the United States 24/7. ANTIFA openly advocates killing cops. Silence |
non-offensive |
offensive (p = 0.65) |
259 |
AntiFa fascists and Socialists are one and the same......time to respond ! |
offensive |
non-offensive (p = 0.57) |
For records in the dataset where text contains "liberals", the Precision is 8.59% lower than the global Precision.
Level | Data slice | Metric | Deviation |
medium 🟡 | text contains "liberals" | Precision = 0.725 | -8.59% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
156 |
@user
A liberal anarchist coup d'etat gang runs rampant & a GOP AG does nothing. Liberals prove to be the biggest hate group in USA. The leftstream media cheers them on while excoriating duly elected President of the United States 24/7. ANTIFA openly advocates killing cops. Silence |
non-offensive |
offensive (p = 0.65) |
159 |
@user
@user
Why don't liberals understand that an allegation of drunk teenagers who don't have sex is not a federal crime? Golly Moses- What is wrong with those people? |
non-offensive |
offensive (p = 0.66) |
338 |
@user
@user
Of course they are. She either agrees to testify on Mon or they vote on Friday and life moves on until the next obstruction from the lunatic liberals |
non-offensive |
offensive (p = 0.56) |
For records in the dataset where text_length(text) >= 117.500, the Recall is 6.88% lower than the global Recall.
Level | Data slice | Metric | Deviation |
medium 🟡 | text_length(text) >= 117.500 | Recall = 0.743 | -6.88% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
For records in the dataset where text contains "don", the Recall is 6.6% lower than the global Recall.
Level | Data slice | Metric | Deviation |
medium 🟡 | text contains "don" | Recall = 0.745 | -6.60% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
112 |
@user
@user
@user
Do it with quran and then we will see how much atheist and brave she is... hindus figured it out that this type of attention grabbing big boss wanna be people are not worth it... Go live in your 2 minutes Prime Time covered life ... Lord krishna and his followers don't care |
offensive |
non-offensive (p = 0.54) |
159 |
@user
@user
Why don't liberals understand that an allegation of drunk teenagers who don't have sex is not a federal crime? Golly Moses- What is wrong with those people? |
non-offensive |
offensive (p = 0.66) |
375 |
@user
Bye bye Flake you Flake! Don’t let the Senate door hit you in the behind on your way out.
@user
@user
@user
@user
|
offensive |
non-offensive (p = 0.52) |
👉Underconfidence issues (7)
For records in your dataset where text contains "don", we found a significantly higher number of underconfident predictions (23 samples, corresponding to 3.20% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "don" | Underconfidence rate = 0.032 | +89.91% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
520 |
@user
The Dems playing a con game. It must be Antifa don’t you think. |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
3441 |
@user
@user
@user
@user
@user
BS. No we don't support gun control. We support #2A that says SHALL NOT BE INFRINGED. Go read some history and find out why. You are part of the reason. |
offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
2484 |
@user
@user
@user
@user
@user
@user
You are a child. Don’t even talk to me. Lmao. |
non-offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
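The report does not define "underconfident". In the examples above both classes sit at p = 0.50, so a plausible reading, assumed here, is that a prediction is underconfident when the runner-up class is almost as probable as the top class. The sketch below flags such predictions and computes the rate over a set. The ratio threshold of 0.9 and the function names are illustrative assumptions, not values taken from the scan.

```python
from typing import Iterable, Sequence


def is_underconfident(probs: Sequence[float], ratio_threshold: float = 0.9) -> bool:
    """True when the second-highest probability is close to the highest one."""
    top, runner_up = sorted(probs, reverse=True)[:2]
    return runner_up / top >= ratio_threshold


def underconfidence_rate(
    all_probs: Iterable[Sequence[float]], ratio_threshold: float = 0.9
) -> float:
    """Share of predictions flagged as underconfident."""
    flags = [is_underconfident(p, ratio_threshold) for p in all_probs]
    return sum(flags) / len(flags)


# Made-up (offensive, non-offensive) probabilities, for illustration only.
preds = [(0.50, 0.50), (0.64, 0.36), (0.52, 0.48), (0.84, 0.16)]
rate = underconfidence_rate(preds)
print(f"Underconfidence rate = {rate:.3f}")
```

Comparing this rate on a slice with the rate on the whole dataset gives a deviation of the kind shown in the table.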
For records in your dataset where text contains "liberals", we found a significantly higher number of underconfident predictions (35 samples, corresponding to 2.82% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "liberals" | Underconfidence rate = 0.028 | +67.06% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
10284 |
20 Crazy Things Liberals Blame Trump For via
@user
Unhinged blame everything on our President. |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
8538 |
@user
@user
Tommy Robinson speaks the truth and finds himself in jail? Le Penspeaks the truth and now she needs a psychiatric evaluation? America wake up! Vote Democrat and this is your future and your kids future! Democrats/liberals support muslim extremists ! |
non-offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
8551 |
@user
Maybe we need more gun control after all - liberals should not be allowed to possess them! |
non-offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
For records in your dataset where text contains "antifa", we found a significantly higher number of underconfident predictions (28 samples, corresponding to 2.68% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "antifa" | Underconfidence rate = 0.027 | +59.15% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
520 |
@user
The Dems playing a con game. It must be Antifa don’t you think. |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
10981 |
@user
Why is Antifa upset w/ “prof.” Blasé-Ford? Antifa is the group that harrasses & threatens people at their homes. |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
5348 |
@user
Please show me your post where you said the same thing about conservatives targeted by Antifa and other leftist friends of yours 🙄 #Hypocrite |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
For records in your dataset where text contains "conservatives", we found a significantly higher number of underconfident predictions (21 samples, corresponding to 2.53% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
major 🔴 | text contains "conservatives" | Underconfidence rate = 0.025 | +49.99% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
label |
Predicted label |
5348 |
@user
Please show me your post where you said the same thing about conservatives targeted by Antifa and other leftist friends of yours 🙄 #Hypocrite |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
6667 |
@user
@user
@user
Conservatives are whining"". There, I fixed the typo for you."" |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
11156 |
@user
The reason people are tired of this is because you only focus on the conservatives. Ford has treated people poorly and the constitutional crisis is abominable. His treatment of the women in the legislature is despicable. |
non-offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
For records in your dataset where text_length(text) >= 79.500, we found a significantly higher number of underconfident predictions (157 samples, corresponding to 2.17% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
major 🔴 | text_length(text) >= 79.500 | Underconfidence rate = 0.022 | +28.88% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
text_length(text) |
label |
Predicted label |
2757 |
.
@user
tells us she is an unserious attention seeker #iphone
@user
@user
@user
|
81 |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
11638 |
@user
@user
@user
@user
@user
@user
@user
@user
@user
@user
@user
Yea cuz mushroom penis syndrome is definitely an impeachable offense 😂 #MAGA |
142 |
offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
10284 |
20 Crazy Things Liberals Blame Trump For via
@user
Unhinged blame everything on our President. |
95 |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
For records in your dataset where avg_word_length(text) >= 4.009, we found a significantly higher number of underconfident predictions (188 samples, corresponding to 1.94% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
medium 🟡 | avg_word_length(text) >= 4.009 | Underconfidence rate = 0.019 | +15.01% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
avg_word_length(text) |
label |
Predicted label |
2757 |
.
@user
tells us she is an unserious attention seeker #iphone
@user
@user
@user
|
4.71429 |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
436 |
@user
He is reprehensible. Him and Durbin. |
5.14286 |
non-offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
11638 |
@user
@user
@user
@user
@user
@user
@user
@user
@user
@user
@user
Yea cuz mushroom penis syndrome is definitely an impeachable offense 😂 #MAGA |
5.21739 |
offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
For records in your dataset where avg_whitespace(text) < 0.189, we found a significantly higher number of underconfident predictions (182 samples, corresponding to 1.91% of the predictions in the data slice).
Level | Data slice | Metric | Deviation |
medium 🟡 | avg_whitespace(text) < 0.189 | Underconfidence rate = 0.019 | +13.00% than global |
Taxonomy
avid-effect:performance:P0204
🔍✨Examples
|
text |
avg_whitespace(text) |
label |
Predicted label |
11293 |
@user
Oh my god🤮🤮🤮🤮 |
0.157895 |
non-offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
2757 |
.
@user
tells us she is an unserious attention seeker #iphone
@user
@user
@user
|
0.185185 |
offensive |
offensive (p = 0.50), non-offensive (p = 0.50) |
436 |
@user
He is reprehensible. Him and Durbin. |
0.142857 |
non-offensive |
non-offensive (p = 0.50), offensive (p = 0.50) |
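The slice features `text_length`, `avg_word_length`, and `avg_whitespace` are not defined in the report. The definitions below are a guess that reproduces the values shown for samples 436 and 11293: character count, mean length of whitespace-separated tokens, and the share of whitespace characters. The sketch assumes the `@user` mention is separated from the rest of the tweet by a single whitespace character, and the function bodies are illustrative, not Giskard's implementation.

```python
def text_length(text: str) -> int:
    """Number of characters (Unicode code points) in the text."""
    return len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens, punctuation included."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)


def avg_whitespace(text: str) -> float:
    """Share of characters in the text that are whitespace."""
    return sum(c.isspace() for c in text) / len(text)


# Sample 436 and sample 11293 from the tables above.
row_436 = "@user He is reprehensible. Him and Durbin."
row_11293 = "@user Oh my god" + "\U0001F92E" * 4  # four vomiting-face emoji

print(round(avg_word_length(row_436), 5))   # 5.14286
print(round(avg_whitespace(row_436), 6))    # 0.142857
print(round(avg_whitespace(row_11293), 6))  # 0.157895
```

Under these definitions, short tweets with few spaces and long tokens (mentions, hashtags, emoji runs) fall into the `avg_whitespace(text) < 0.189` and `avg_word_length(text) >= 4.009` slices.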
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.