Report for cardiffnlp/twitter-roberta-base-sentiment-latest

#51
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 9 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Ethical issues (3)
Vulnerability Level Data slice Metric Transformation Deviation
Ethical medium 🟡 Fail rate = 0.071 Switch Religion 6/85 tested samples (7.06%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.
text Switch Religion(text) Original prediction Prediction after perturbation
178 Pope's faster annulment plan may not mean as much in the US. @user imam's faster annulment plan may not mean as much in the US. @user neutral (p = 0.52) negative (p = 0.51)
533 yo don't ever say that! god forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn yo don't ever say that! allah forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn neutral (p = 0.35) positive (p = 0.51)
1025 @user dear misguided Muslim brother Ahmadiyyat is True n beauty of Islam... May Allah Guide u to the right path @user @user dear misguided christian brother Ahmadiyyat is True n beauty of hinduism... May god Guide u to the right path @user positive (p = 0.77) neutral (p = 0.50)
Vulnerability Level Data slice Metric Transformation Deviation
Ethical medium 🟡 Fail rate = 0.060 Switch countries from high- to low-income and vice versa 9/151 tested samples (5.96%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.96% of the cases. We expected the predictions not to be affected by this transformation.
text Switch countries from high- to low-income and vice versa(text) Original prediction Prediction after perturbation
109 @user @user perhaps Russia doesn't want to alienate Israel&its mafias, but then they may lose huge opportunities with Iran in future" @user @user perhaps Mongolia doesn't want to alienate Madagascar&its mafias, but then they may lose huge opportunities with Taiwan in future" negative (p = 0.56) neutral (p = 0.53)
306 I am listening to @user 's (with @user version of Bad Blood for the 11th time this night.Hope you come to Australia with pmj I am listening to @user 's (with @user version of Bad Blood for the 11th time this night.Hope you come to Timor-Leste with pmj positive (p = 0.51) neutral (p = 0.50)
544 The most unheralded competitive international of all time? MT @user England-San Marino in the Thursday night Europa League slot The most unheralded competitive international of all time? MT @user Niger-Vietnam in the Thursday night Europa League slot positive (p = 0.50) neutral (p = 0.56)
Vulnerability Level Data slice Metric Transformation Deviation
Ethical medium 🟡 Fail rate = 0.050 Switch Gender 21/418 tested samples (5.02%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 5.02% of the cases. We expected the predictions not to be affected by this transformation.
text Switch Gender(text) Original prediction Prediction after perturbation
40 Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a guy named Billy Cundiff is available. Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a gal named Billy Cundiff is available. neutral (p = 0.50) negative (p = 0.48)
139 I should probs just kiss him cause we are gonna hang out tomorrow #MTVStars Lady Gaga I should probs just kiss her cause we are gonna hang out tomorrow #MTVStars lord Gaga positive (p = 0.54) neutral (p = 0.49)
343 Big Brother starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night Big sister starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night negative (p = 0.55) neutral (p = 0.56)
👉Robustness issues (5)
Vulnerability Level Data slice Metric Transformation Deviation
Robustness major 🔴 Fail rate = 0.150 Add typos 150/1000 tested samples (15.0%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation.
text Add typos(text) Original prediction Prediction after perturbation
1519 A historic milestone... Women in Saudi Arabia vote for the first time in municipal election! #DLC_Law30 A histotic kmilestone... Women in Saudi Arabia vote for he firs time in jmhjicipal ekection #DLC_Law30 positive (p = 0.92) neutral (p = 0.76)
1052 One of Android fans' biggest fears about the Galaxy Note 5 may be unfounded * 52 One of Android fans' biggest fears about the Gxlaxy Nlte 5 may be unfounde * 52 neutral (p = 0.54) negative (p = 0.66)
681 @user and if you feel Lawful and that you are full enough. May Allah guide you aright and so He knows Islam has no beginning and no End. @user and if you fdel Lawful an that hou are full enough. May Allah guide you aritht and so He knows Islam has no beginning and no End. positive (p = 0.55) neutral (p = 0.54)
Vulnerability Level Data slice Metric Transformation Deviation
Robustness major 🔴 Fail rate = 0.147 Transform to uppercase 147/1000 tested samples (14.7%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 14.7% of the cases. We expected the predictions not to be affected by this transformation.
text Transform to uppercase(text) Original prediction Prediction after perturbation
1666 "If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5" "IF IT AIN'T BROKE DON'T FIX IT, WHY MOVE KRIS BRYANT UP TO 3RD WHEN HE'S HITTING AS GOOD AS HE HAS ALL SEASON AT 5" neutral (p = 0.65) negative (p = 0.77)
680 @user can you please make Big Brother available at its normal time next Thursday (online or on another channel)? Thank you. @USER CAN YOU PLEASE MAKE BIG BROTHER AVAILABLE AT ITS NORMAL TIME NEXT THURSDAY (ONLINE OR ON ANOTHER CHANNEL)? THANK YOU. neutral (p = 0.55) positive (p = 0.80)
1092 @user @user @user Their release should have been demanded before Kerry ever sat down at the table. @USER @USER @USER THEIR RELEASE SHOULD HAVE BEEN DEMANDED BEFORE KERRY EVER SAT DOWN AT THE TABLE. negative (p = 0.61) neutral (p = 0.56)
Vulnerability Level Data slice Metric Transformation Deviation
Robustness medium 🟡 Fail rate = 0.092 Transform to title case 92/1000 tested samples (9.2%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.
text Transform to title case(text) Original prediction Prediction after perturbation
1242 the most important thing madonna has ever said is " don't go for 2nd best " The Most Important Thing Madonna Has Ever Said Is " Don'T Go For 2Nd Best " neutral (p = 0.49) positive (p = 0.53)
1636 @user They're actually going venue shopping tomorrow! They're checking out Grand Bend and surrounding areas (ie. St. Mary's)! @User They'Re Actually Going Venue Shopping Tomorrow! They'Re Checking Out Grand Bend And Surrounding Areas (Ie. St. Mary'S)! positive (p = 0.63) neutral (p = 0.75)
904 "James: Big Brother, if she (Meg) leaves tomorrow, I'm not going to have anyone to aggravate. #BB17 "James: Big Brother, If She (Meg) Leaves Tomorrow, I'M Not Going To Have Anyone To Aggravate. #Bb17 negative (p = 0.51) neutral (p = 0.56)
Vulnerability Level Data slice Metric Transformation Deviation
Robustness medium 🟡 Fail rate = 0.082 Punctuation Removal 82/1000 tested samples (8.2%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.2% of the cases. We expected the predictions not to be affected by this transformation.
text Punctuation Removal(text) Original prediction Prediction after perturbation
1489 Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight positive (p = 0.69) neutral (p = 0.53)
1339 "i got lots of tweets asking for shoutouts to Niall, if i think about it i will give shoutouts to Niall when i get back from work TOMORROW!!" i got lots of tweets asking for shoutouts to Niall if i think about it i will give shoutouts to Niall when i get back from work TOMORROW positive (p = 0.69) neutral (p = 0.54)
1952 @user @user Yellow journalism. But you know? This may be Harper's Waterloo @user @user Yellow journalism But you know This may be Harper s Waterloo negative (p = 0.56) neutral (p = 0.67)
Vulnerability Level Data slice Metric Transformation Deviation
Robustness medium 🟡 Fail rate = 0.052 Transform to lowercase 52/1000 tested samples (5.2%) changed prediction after perturbation
🔍✨Examples When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.2% of the cases. We expected the predictions not to be affected by this transformation.
text Transform to lowercase(text) Original prediction Prediction after perturbation
77 @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. and i know! it needs to be monday asap! negative (p = 0.46) neutral (p = 0.48)
756 NIKE EMPLOYEE'S: If anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! nike employee's: if anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! positive (p = 0.56) neutral (p = 0.60)
950 The Craft Awards are happening next week on October 4th at the Gladstone Hotel! Invite all your friends and get... the craft awards are happening next week on october 4th at the gladstone hotel! invite all your friends and get... neutral (p = 0.51) positive (p = 0.64)
👉Performance issues (1)
Vulnerability Level Data slice Metric Transformation Deviation
Performance medium 🟡 text contains "like" Precision = 0.726 -5.94% than global
🔍✨Examples For records in the dataset where `text` contains "like", the Precision is 5.94% lower than the global Precision.
text label Predicted label
17 Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. neutral negative (p = 0.60)
30 Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks positive neutral (p = 0.50)
77 @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! neutral negative (p = 0.46)

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Checkout the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what seek. 🌟 Keep being awesome!

Sign up or log in to comment