Report for cardiffnlp/twitter-roberta-base-sentiment-latest

#52
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

👉Overconfidence issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Overconfidence | major 🔴 | `avg_word_length(text)` >= 4.512 | Overconfidence rate = 0.537 | | +20.73% vs. global |

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.512, we found a significantly higher number of overconfident wrong predictions (22 samples, corresponding to 53.66% of the wrong predictions in the data slice).

| Row | text | avg_word_length(text) | label | Predicted label |
|---|---|---|---|---|
| 123 | @user @user michael ball is incredible 10th anniversary with him and colm is sick | 4.85714 | negative | positive (p = 0.97), neutral (p = 0.02) |
| 36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | 4.75 | negative | neutral (p = 0.95), positive (p = 0.04) |
| 14 | PM ready for reply on coal blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | 5.10526 | positive | neutral (p = 0.95), positive (p = 0.03) |
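
The report does not define the slice feature `avg_word_length(text)`. As a rough sketch, assuming it is the mean character length of whitespace-separated tokens (an assumption, not the scanner's confirmed implementation), the values for rows 123 and 36 can be recomputed from the example texts:

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Example texts copied from the report (rows 123 and 36).
row_123 = ("@user @user michael ball is incredible 10th anniversary "
           "with him and colm is sick")
row_36 = ("David Cameron's statement on camera on Thursday 03 September 2015: "
          "he will take in 'more' of the refugees: was he speaking TO TV Cameras?")

print(round(avg_word_length(row_123), 5))  # 4.85714
print(round(avg_word_length(row_36), 5))   # 4.75
```

Under this assumed definition both values agree with the table, so both rows fall in the `>= 4.512` slice.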

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Overconfidence | medium 🟡 | `avg_whitespace(text)` < 0.181 | Overconfidence rate = 0.525 | | +18.13% vs. global |

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.181, we found a significantly higher number of overconfident wrong predictions (21 samples, corresponding to 52.5% of the wrong predictions in the data slice).

| Row | text | avg_whitespace(text) | label | Predicted label |
|---|---|---|---|---|
| 123 | @user @user michael ball is incredible 10th anniversary with him and colm is sick | 0.170732 | negative | positive (p = 0.97), neutral (p = 0.02) |
| 36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | 0.179856 | negative | neutral (p = 0.95), positive (p = 0.04) |
| 14 | PM ready for reply on coal blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | 0.163793 | positive | neutral (p = 0.95), positive (p = 0.03) |
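
The two overconfidence findings can be cross-checked with simple arithmetic. The sketch below rests on two assumptions drawn from the report's wording: that the overconfidence rate is the share of a slice's wrong predictions that are overconfident, and that the deviation is relative to the global rate. The global rate it prints is inferred, not a figure stated in the report.

```python
def implied_counts(overconfident: int, share_pct: float):
    """Return (wrong predictions in the slice, overconfidence rate)."""
    wrong_in_slice = round(overconfident * 100 / share_pct)
    return wrong_in_slice, overconfident / wrong_in_slice


def implied_global_rate(slice_rate: float, deviation_pct: float) -> float:
    """Global rate implied by a slice rate and its relative deviation."""
    return slice_rate / (1 + deviation_pct / 100)


# (overconfident samples, share of wrong predictions in %, deviation in %),
# all taken from the report.
slices = {
    "avg_word_length(text) >= 4.512": (22, 53.65853658536586, 20.73),
    "avg_whitespace(text) < 0.181": (21, 52.5, 18.13),
}

for name, (n, share, deviation) in slices.items():
    wrong, rate = implied_counts(n, share)
    print(name, wrong, round(rate, 3), round(implied_global_rate(rate, deviation), 3))
```

Under these assumptions the slices contain 41 and 40 wrong predictions respectively, and both imply a global overconfidence rate of about 0.444, which suggests the two rows are internally consistent.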
👉Robustness issues (5)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.182 | Transform to uppercase | 59/324 tested samples (18.21%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 18.21% of the cases. We expected the predictions not to be affected by this transformation.

| Row | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.50) | neutral (p = 0.67) |
| 8 | @user call Hafiz saeed sir he may help u out. Maybe Pope can b handy . Try it. | @USER CALL HAFIZ SAEED SIR HE MAY HELP U OUT. MAYBE POPE CAN B HANDY . TRY IT. | neutral (p = 0.67) | positive (p = 0.61) |
| 10 | "LONDON (AP) "" Prince George celebrates his second birthday on Wednesday and while he's just a toddler, he's al... | "LONDON (AP) "" PRINCE GEORGE CELEBRATES HIS SECOND BIRTHDAY ON WEDNESDAY AND WHILE HE'S JUST A TODDLER, HE'S AL... | positive (p = 0.65) | neutral (p = 0.56) |
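
The robustness checks in this report all follow the same pattern: perturb the text, predict again, and count how often the label changes. A minimal sketch of that pattern follows. `toy_predict` is a hypothetical stand-in, not the cardiffnlp model, and counting only samples that the transformation actually alters is an assumption suggested by the differing "tested samples" totals (324, 308, 299, 318).

```python
from typing import Callable, Iterable, Tuple


def fail_rate(predict: Callable[[str], str],
              transform: Callable[[str], str],
              texts: Iterable[str]) -> Tuple[int, int]:
    """Return (changed, tested) for one text transformation."""
    # Assumption: samples the transformation leaves unchanged are not tested.
    tested = [t for t in texts if transform(t) != t]
    changed = sum(predict(t) != predict(transform(t)) for t in tested)
    return changed, len(tested)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier, for illustration only."""
    if "NOT" in text:
        return "negative"
    if "!" in text:
        return "positive"
    return "neutral"


samples = ["The French did NOT react", "great game!", "did not react", "OK"]
changed, tested = fail_rate(toy_predict, str.upper, samples)
print(f"{changed}/{tested} tested samples changed prediction")
```

To reproduce the figures above, `toy_predict` would be replaced by a function that returns the real model's top label, for example by wrapping a `transformers` text-classification pipeline.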

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.110 | Add typos | 34/308 tested samples (11.04%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.04% of the cases. We expected the predictions not to be affected by this transformation.

| Row | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 22 | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better check my bi0. Thx | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better chrck my bi0. Thx | neutral (p = 0.56) | positive (p = 0.51) |
| 27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journlism. But hyou know? This may be Harper's Wwterloo | negative (p = 0.59) | neutral (p = 0.54) |
| 48 | I'm gonna watch Sharknado 3 cause I have no tv shows to watch on a Wednesday not cause I enjoy it. | I'm gonna watch Sharknado 3 cause I have no tv shows to watch on a Wednesday nkot cause I enjoy it. | neutral (p = 0.41) | positive (p = 0.90) |
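
The typo examples above show three kinds of edit: substitution ("check" to "chrck"), insertion ("you" to "hyou") and deletion ("journalism" to "journlism"). The sketch below imitates those edits with random letters. It is a simplified illustration, not the scanner's actual typo generator, and the `rate` and `seed` parameters are hypothetical.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly substitute, insert before, or delete alphabetic characters."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("substitute", "insert", "delete"))
            if op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "insert":
                out.append(rng.choice(string.ascii_lowercase))
                out.append(ch)
            # "delete": append nothing, dropping the character
        else:
            out.append(ch)  # digits, spaces and punctuation stay untouched
    return "".join(out)


print(add_typos("This may be Harper's Waterloo", rate=0.15, seed=1))
```

Fixing the seed makes a failing example repeatable, which helps when comparing predictions before and after a model change.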

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.100 | Punctuation Removal | 30/299 tested samples (10.03%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 10.03% of the cases. We expected the predictions not to be affected by this transformation.

| Row | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9 11 But they may do this time around | negative (p = 0.50) | neutral (p = 0.50) |
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | positive (p = 0.83) | neutral (p = 0.90) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @user @user Islam is an Abrahamic faith Andrew It may make you feel a little uneasy but it s the same God you worship Sorry | neutral (p = 0.59) | negative (p = 0.50) |
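
In the examples above, punctuation appears to be replaced by a space ("9/11" becomes "9 11", "it's" becomes "it s") while `@` and `#` are kept. The regex below is a guess that happens to reproduce the shown rows. It is not taken from the scanner, and the whitespace collapsing is an assumption, since the report's own spacing may have been normalized when the page was rendered.

```python
import re


def remove_punctuation(text: str) -> str:
    """Replace punctuation (except @ and #) with spaces, then tidy whitespace."""
    spaced = re.sub(r"[^\w\s@#]", " ", text)
    return " ".join(spaced.split())


row_2 = ("Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! "
         "#007 #SPECTRE #JamesBond")
print(remove_punctuation(row_2))
# Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond
```

Note that this transformation also removes sentiment cues such as "!!!!!!", so some of the flagged prediction changes may reflect lost signal rather than brittleness.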

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.096 | Transform to title case | 31/324 tested samples (9.57%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.57% of the cases. We expected the predictions not to be affected by this transformation.

| Row | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @User @User I Think After Charlie Hebdo The French Did Not React As The Us Did After 9/11. But They May Do This Time Around. | negative (p = 0.50) | neutral (p = 0.56) |
| 9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | Disappointed The Knicks Vs Nets Game Got Canceled Tonight\U002C But I\U2019M Even More Hyped For Knicks Vs Heat On Friday! | positive (p = 0.56) | neutral (p = 0.39) |
| 51 | @user tom Brady did not deflate balls, but was suspended for 4 games bc he may or may not have known it was being done" | @User Tom Brady Did Not Deflate Balls, But Was Suspended For 4 Games Bc He May Or May Not Have Known It Was Being Done" | negative (p = 0.51) | neutral (p = 0.69) |
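
Row 9 contains literal escape sequences (`\u002c`, `\u2019`) in the dataset text, and the title-case output changes them to `\U002C` and `\U2019M`. Assuming the transformation behaves like Python's built-in `str.title()` (the report does not say how it is implemented), this is easy to reproduce:

```python
# Raw string: the backslash sequences are literal characters, as in the dataset.
row_9 = (r"Disappointed the Knicks vs Nets game got canceled tonight\u002c "
         r"but I\u2019m even more hyped for Knicks vs Heat on Friday!")

# str.title() capitalizes any letter that follows a non-letter, including
# the "u" after a backslash and the "c" or "m" after the digits.
print(row_9.title())
```

The printed string matches the perturbed text shown for row 9, so for samples like this one the perturbation also rewrites encoding debris, not just ordinary words.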

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.063 | Transform to lowercase | 20/318 tested samples (6.29%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.29% of the cases. We expected the predictions not to be affected by this transformation.

| Row | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @user @user i think after charlie hebdo the french did not react as the us did after 9/11. but they may do this time around. | negative (p = 0.50) | neutral (p = 0.71) |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | kingpin saudi arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on monday | neutral (p = 0.50) | negative (p = 0.54) |
| 12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | it is reality that isis are on the march in turkey and erdogan can't wait to receive them with open arms | negative (p = 0.61) | positive (p = 0.77) |
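
As a quick consistency check, the five fail rates can be recomputed from the changed/tested counts reported above and ranked. The counts are copied from the report; nothing else is assumed.

```python
# (changed predictions, tested samples) per transformation, from the report.
results = {
    "Transform to uppercase": (59, 324),
    "Add typos": (34, 308),
    "Punctuation Removal": (30, 299),
    "Transform to title case": (31, 324),
    "Transform to lowercase": (20, 318),
}

# Sort from most to least fragile.
ranked = sorted(results.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for name, (changed, tested) in ranked:
    print(f"{name}: {changed}/{tested} = {changed / tested:.2%}")
```

The recomputed percentages (18.21%, 11.04%, 10.03%, 9.57%, 6.29%) match the report, with uppercasing the most disruptive transformation and lowercasing the least.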

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
