Report for cardiffnlp/twitter-roberta-base-irony

#57 opened by inoki-giskard

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset irony, split validation).

👉Overconfidence issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Overconfidence | medium 🟡 | `text_length(text)` < 87.500 | Overconfidence rate = 0.552 | | 12.47% higher than global |

🔍✨Examples

For records in the dataset where `text_length(text)` < 87.500, we found a significantly higher number of overconfident wrong predictions (64 samples, corresponding to 55.17% of the wrong predictions in the data slice).

| | text | text_length(text) | label | Predicted label |
|---|---|---|---|---|
| 470 | Today has been a blast | 22 | non_irony | irony (p = 0.98), non_irony (p = 0.02) |
| 771 | My dad's such a big kid on Christmas morning waking everyone up so bloody early | 79 | non_irony | irony (p = 0.97), non_irony (p = 0.03) |
| 902 | When one ear breaks on your headphones it's so frustrating! #today | 67 | non_irony | irony (p = 0.97), non_irony (p = 0.03) |
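The report does not define exactly when a wrong prediction counts as "overconfident". A common reading, assumed here and not confirmed by the report, is a misclassification where the predicted class's probability exceeds the true class's probability by more than some margin; the 0.5 margin below is a hypothetical choice. The sketch also shows how a relative deviation such as "12.47% higher than global" can be computed, assuming the deviation is relative to the global value.

```python
from typing import Sequence


def overconfidence_rate(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    margin: float = 0.5,
) -> float:
    """Fraction of wrong predictions whose predicted-class probability
    exceeds the true-class probability by more than `margin`.

    ASSUMPTION: this definition and the default margin are illustrative;
    the report does not state the exact rule the scanner applies.
    """
    wrong = 0
    overconfident = 0
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=lambda k: p[k])  # argmax class index
        if pred == y:
            continue  # only wrong predictions are considered
        wrong += 1
        if p[pred] - p[y] > margin:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


def relative_deviation(slice_value: float, global_value: float) -> float:
    """Relative change of a slice metric with respect to the global metric, in %."""
    return (slice_value - global_value) / global_value * 100.0
```

With label index 0 for `non_irony` and 1 for `irony` (an assumed encoding), the three example rows above all count as overconfident: each is wrong, with a margin of 0.94 or more. The reported slice figures are consistent with 64 overconfident cases out of 116 wrong predictions (64/116 ≈ 0.552).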
👉Ethical issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.061 | Switch countries from high- to low-income and vice versa | 2/33 tested samples (6.06%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.06% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 548 | @user A British world champion in one of the most demanding & popular sports on earth. Yeah, of course I'm being sarcastic. | @user A Kiribati world champion in one of the most demanding & popular sports on earth. Yeah, of course I'm being sarcastic. | irony (p = 0.57) | non_irony (p = 0.53) |
| 686 | AAP said will declare AK candidate in last list but declared it before.This issue affecting India's GDP is termed as U-Turn by BJP #AK4Delhi | AAP said will declare AK candidate in last list but declared it before.This issue affecting British Virgin Islands's GDP is termed as U-Turn by BJP #AK4Delhi | irony (p = 0.50) | non_irony (p = 0.53) |
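A country-swap perturbation like the one above can be sketched as a whole-word substitution driven by a bidirectional mapping. The two pairs below are hypothetical and only reproduce the examples in the table; the scanner's real country lists and pairing rule are not given in the report. Matching longer names first matters, otherwise "British" would be swapped inside "British Virgin Islands".

```python
import re
from typing import Dict

# HYPOTHETICAL pairs chosen to reproduce the two examples above.
PAIRS = [("British", "Kiribati"), ("India", "British Virgin Islands")]
MAPPING: Dict[str, str] = {a: b for a, b in PAIRS}
MAPPING.update({b: a for a, b in PAIRS})  # "vice versa": swap both ways


def swap_countries(text: str, mapping: Dict[str, str] = MAPPING) -> str:
    # Longest names first so multi-word names win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    # A single regex pass replaces each match once, so swapped names
    # are never swapped back.
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

Texts with no mapped name come back unchanged, which may be why only 33 samples were tested for this transformation.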
👉Robustness issues (5)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.201 | Transform to uppercase | 192/953 tested samples (20.15%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.15% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! RT @USER OF ALL THE PLACES TO GET STUCK IN A TRAFFIC JAM | irony (p = 0.51) | non_irony (p = 0.78) |
| 13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | WORKAHOLICS: IF YOU'RE SICK, DON'T LET THAT STOP YOU FROM BRINGING YOUR GERMS INTO THE OFFICE. WE ALL APPRECIATE YOUR COMMITMENT. | irony (p = 0.90) | non_irony (p = 0.89) |
| 19 | Flight diverted over boiling water incident | FLIGHT DIVERTED OVER BOILING WATER INCIDENT | irony (p = 0.70) | non_irony (p = 0.88) |
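Each robustness check is an invariance test: perturb the text, re-run the model, and count label changes. A minimal sketch follows. It assumes that texts the transformation leaves unchanged are skipped, which would explain why each test reports a different number of tested samples (953, 859, 773, 852); that is an inference, not something the report states. `toy_predict` is a hypothetical case-sensitive stand-in for the real model, used only so the sketch runs on its own.

```python
from typing import Callable, Iterable, Tuple


def invariance_fail_rate(
    texts: Iterable[str],
    transform: Callable[[str], str],
    predict: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for a label-preserving transformation."""
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # ASSUMPTION: no-op perturbations are not counted as tests
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    # HYPOTHETICAL stand-in for the real model: a case-sensitive keyword rule,
    # so uppercasing can flip its output.
    return "irony" if "traffic jam" in text else "non_irony"
```

With the real model, `predict` would wrap a call to `cardiffnlp/twitter-roberta-base-irony` (for example through a `transformers` text-classification pipeline) and map its output to a label string; that wiring is not shown here. The reported rate is simply changed/tested, e.g. 192/953 ≈ 20.15%.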
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.155 | Transform to title case | 148/953 tested samples (15.53%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.53% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! Rt @User Of All The Places To Get Stuck In A Traffic Jam | irony (p = 0.51) | non_irony (p = 0.80) |
| 13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | Workaholics: If You'Re Sick, Don'T Let That Stop You From Bringing Your Germs Into The Office. We All Appreciate Your Commitment. | irony (p = 0.90) | non_irony (p = 0.82) |
| 19 | Flight diverted over boiling water incident | Flight Diverted Over Boiling Water Incident | irony (p = 0.70) | non_irony (p = 0.79) |
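The title-cased examples contain forms such as "You'Re" and "Don'T". Python's built-in `str.title()` produces exactly this, because it starts a new word after any non-letter, including the apostrophe. Whether the scanner uses `str.title()` is an assumption. The Python documentation for `str.title()` suggests a regex workaround that keeps contractions intact; the sketch contrasts the two. Some flips in this test may therefore be driven by these unnatural tokens rather than by casing alone; that is a hypothesis the report does not measure.

```python
import re


def naive_title(text: str) -> str:
    # str.title() capitalizes after every non-letter, so "you're" -> "You'Re".
    return text.title()


def apostrophe_aware_title(text: str) -> str:
    # Regex workaround from the Python docs for str.title():
    # treat "word'suffix" as a single word before capitalizing it.
    return re.sub(
        r"[A-Za-z]+('[A-Za-z]+)?",
        lambda mo: mo.group(0).capitalize(),
        text,
    )
```

Running both variants through the same invariance test would show how much of the 15.53% fail rate survives a cleaner perturbation.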
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.119 | Add typos | 102/859 tested samples (11.87%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.87% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 13 | Workaholics: if you're sick, don't let that stop you from bringing your germs into the office. We all appreciate your commitment. | Woroaholiwsc: if you're sick, don't let that stop you from bringing your germs ingto the office. We all appreciate your commitment. | irony (p = 0.90) | non_irony (p = 0.54) |
| 19 | Flight diverted over boiling water incident | Flight diverted over boiling water incidwent | irony (p = 0.70) | non_irony (p = 0.61) |
| 23 | When you have Challah French Toast on Christmas | When you havde Challah French Toxst on Christmas | irony (p = 0.84) | non_irony (p = 0.86) |
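The typo examples show character insertions ("ingto", "havde") and substitutions ("Toxst"). The sketch below is a hypothetical typo injector, not the scanner's own typo model, which the report does not describe: it applies uniform insert, substitute, delete, or swap edits to letters with probability `rate`, using a seeded random generator so a perturbed test set can be regenerated exactly.

```python
import random
import string


def add_typos(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply random character-level edits to roughly `rate` of the letters.

    HYPOTHETICAL sketch: edit types and probabilities are illustrative.
    """
    rng = random.Random(seed)  # seeded, so the same input gives the same typos
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["insert", "substitute", "delete", "swap"])
            if op == "insert":
                out.extend([c, rng.choice(string.ascii_lowercase)])
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "delete":
                pass  # drop the character
            else:  # "swap" with the next character, if there is one
                if i + 1 < len(chars):
                    out.extend([chars[i + 1], c])
                    i += 1
                else:
                    out.append(c)
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Only letters are edited, so punctuation, mentions such as `@user`, and digits keep their structure apart from the letters inside them.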
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.088 | Punctuation Removal | 68/773 tested samples (8.8%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.8% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | !!! RT @user Of all the places to get stuck in a traffic jam | RT @user Of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.63) |
| 15 | What else would you do on friday? #TGIF #8crap | What else would you do on friday #TGIF #8crap | | |
| 55 | @user @user you can't reason with someone with a bio as moronic as his. "So should everyone else" #SoDemocratic | @user @user you can t reason with someone with a bio as moronic as his So should everyone else #SoDemocratic | irony (p = 0.59) | non_irony (p = 0.53) |
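The punctuation-removal examples are consistent with a simple rule: keep "@" and "#" (so mentions and hashtags survive), turn every other ASCII punctuation character into a space, and collapse runs of whitespace. That is why "can't" becomes "can t". This rule is inferred from the three examples and is an assumption about the scanner, not its documented behavior.

```python
import re
import string

# ASSUMPTION inferred from the examples: "@" and "#" are preserved,
# all other ASCII punctuation is replaced by a space.
_PUNCT = "".join(ch for ch in string.punctuation if ch not in "@#")
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in _PUNCT})


def remove_punctuation(text: str) -> str:
    spaced = text.translate(_PUNCT_TO_SPACE)
    # Collapse repeated whitespace and trim the ends.
    return re.sub(r"\s+", " ", spaced).strip()
```

Replacing with a space rather than deleting keeps words on either side of a period or quote separate; the side effect is the split contraction "can t", which the model may read differently from "can't".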
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.052 | Transform to lowercase | 44/852 tested samples (5.16%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.16% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | !!! RT @user Of all the places to get stuck in a traffic jam | !!! rt @user of all the places to get stuck in a traffic jam | irony (p = 0.51) | non_irony (p = 0.62) |
| 29 | @user Frisky at 2am? That's nothing new. | @user frisky at 2am? that's nothing new. | non_irony (p = 0.57) | irony (p = 0.65) |
| 74 | Honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | honking at me whilst you drive past - so romantic, it makes me want to trace you through your number plate and be with you forever | non_irony (p = 0.53) | irony (p = 0.51) |
👉Performance issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | `text` contains "user" | Recall = 0.556 | | 22.76% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "user", the Recall is 22.76% lower than the global Recall.

| | text | label | Predicted label |
|---|---|---|---|
| 35 | @user hahaha such a 1% town | non_irony | irony (p = 0.58) |
| 53 | @user Just abt 2 say d same :) I'm not sure whether Oxford Brookes Uni is part of Oxford Uni. yet his CV is impressive still! | irony | non_irony (p = 0.83) |
| 64 | @user even your link to the service alert is down. | irony | non_irony (p = 0.65) |
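The performance finding compares a metric on a data slice with the same metric on the full validation split. The report does not say which class the recall refers to; the sketch assumes recall of the `irony` class and a case-sensitive substring match on "user" (which catches the `@user` mention placeholder seen in the examples). Both are assumptions, and the toy data in the usage check is invented for illustration.

```python
from typing import Sequence, Tuple


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "irony") -> float:
    """Share of true `positive` records that were also predicted as `positive`."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_on_positives:
        return 0.0
    return sum(p == positive for p in preds_on_positives) / len(preds_on_positives)


def slice_recall_deviation(
    texts: Sequence[str],
    y_true: Sequence[str],
    y_pred: Sequence[str],
    keyword: str = "user",
    positive: str = "irony",
) -> Tuple[float, float, float]:
    """Return (slice recall, global recall, relative deviation in %)."""
    idx = [i for i, text in enumerate(texts) if keyword in text]
    slice_r = recall([y_true[i] for i in idx], [y_pred[i] for i in idx], positive)
    global_r = recall(y_true, y_pred, positive)
    deviation = (slice_r - global_r) / global_r * 100.0 if global_r else 0.0
    return slice_r, global_r, deviation
```

Read this way, a slice recall of 0.556 that is 22.76% below global implies a global recall of roughly 0.556 / (1 - 0.2276) ≈ 0.72, assuming the deviation is relative.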

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
