Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

#50, opened by inoki-giskard

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 8 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

👉Overconfidence issues (2)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Overconfidence | major 🔴 | `avg_word_length(text) >= 4.962` | Overconfidence rate = 0.455 | | +71.84% relative to global |

🔍✨ Examples: For records in the dataset where `avg_word_length(text) >= 4.962`, we found a significantly higher number of overconfident wrong predictions (20 samples, corresponding to 45.45% of the wrong predictions in the data slice).

| Row | Text | `avg_word_length(text)` | True label | Predicted label (top 2) |
|---|---|---|---|---|
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 5.10526 | neutral | negative (p = 0.95); neutral (p = 0.03) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 5.15789 | neutral | negative (p = 0.79); positive (p = 0.14) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | 5.15789 | neutral | negative (p = 0.71); positive (p = 0.17) |
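The overconfidence finding combines a per-text feature that defines the slice with a rate of overconfident mistakes inside it. The sketch below illustrates both; it is not Giskard's implementation. It assumes `avg_word_length` is the total length of whitespace-separated words divided by their count (this reproduces row 136's 5.10526, i.e. 97 characters over 19 words), and it treats a wrong prediction as overconfident when the predicted class's probability beats the true class's by at least a hypothetical `threshold` of 0.5. `Prediction` and `overconfidence_rate` are illustrative names, and Giskard's real criterion may differ.

```python
from dataclasses import dataclass


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated words (assumed definition)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


@dataclass
class Prediction:
    label: str               # ground-truth label
    probs: dict[str, float]  # model probability for each class


def overconfidence_rate(preds: list[Prediction], threshold: float = 0.5) -> float:
    """Share of wrong predictions whose top probability beats the true label's by >= threshold.

    The 0.5 threshold is a hypothetical choice for illustration.
    """
    wrong = [p for p in preds if max(p.probs, key=p.probs.get) != p.label]
    if not wrong:
        return 0.0
    overconfident = [
        p for p in wrong if max(p.probs.values()) - p.probs[p.label] >= threshold
    ]
    return len(overconfident) / len(wrong)


monsanto = ("Monsanto wants to merge with Syngenta and change name to wash away "
            "the bad reputation (3rd most disliked company!):")
print(round(avg_word_length(monsanto), 5))  # → 5.10526

preds = [
    # Row 136; the "positive" probability is assumed to fill the remainder to 1.
    Prediction("neutral", {"negative": 0.95, "neutral": 0.03, "positive": 0.02}),
    # Wrong, but only narrowly, so not counted as overconfident.
    Prediction("neutral", {"negative": 0.45, "neutral": 0.40, "positive": 0.15}),
    # Correct prediction, ignored by the rate.
    Prediction("positive", {"positive": 0.90, "neutral": 0.05, "negative": 0.05}),
]
print(overconfidence_rate(preds))  # → 0.5
```

Read the same way, the report's 0.455 is 20 overconfident predictions out of 44 wrong ones in the slice.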

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Overconfidence | major 🔴 | `avg_whitespace(text) < 0.179` | Overconfidence rate = 0.383 | | +44.92% relative to global |

🔍✨ Examples: For records in the dataset where `avg_whitespace(text) < 0.179`, we found a significantly higher number of overconfident wrong predictions (23 samples, corresponding to 38.33% of the wrong predictions in the data slice).

| Row | Text | `avg_whitespace(text)` | True label | Predicted label (top 2) |
|---|---|---|---|---|
| 136 | Monsanto wants to merge with Syngenta and change name to wash away the bad reputation (3rd most disliked company!): | 0.163793 | neutral | negative (p = 0.95); neutral (p = 0.03) |
| 283 | @user 3rd party logic dictates: "That if it makes too much sense and a Nintendo platform is involved, it's simply not worth it!" | 0.178295 | neutral | negative (p = 0.92); neutral (p = 0.05) |
| 112 | "Hulk Hogan apologises for his racist comment.: Terry Bollea was at ""Good Morning America"" on Monday and he tal... | 0.162393 | neutral | negative (p = 0.79); positive (p = 0.14) |
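Two pieces of arithmetic help read this second finding. First, `avg_whitespace` appears to be the share of characters that are whitespace: 19 whitespace characters out of 116 gives row 136's 0.163793, which would mean the stored text ends with one whitespace character beyond the 18 spaces visible here (an inference, not something the report states). Second, the Deviation column looks like a relative difference, slice rate / global rate - 1. Under that reading both overconfidence findings imply the same global rate of about 0.2645 (20/44 for the first slice, 23/60 for this one), which supports the interpretation without confirming it. The function names below are illustrative.

```python
def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text) if text else 0.0


def implied_global_rate(slice_rate: float, deviation: float) -> float:
    """Invert deviation = slice_rate / global_rate - 1 (assumed formula)."""
    return slice_rate / (1 + deviation)


# Row 136 with one assumed trailing space: 19 whitespace characters in 116.
row_136 = ("Monsanto wants to merge with Syngenta and change name to wash away "
           "the bad reputation (3rd most disliked company!): ")
print(round(avg_whitespace(row_136), 6))  # → 0.163793

# Slice rates rebuilt from the sample counts given in the two findings.
for slice_rate, deviation in [(20 / 44, 0.7184), (23 / 60, 0.4492)]:
    print(round(implied_global_rate(slice_rate, deviation), 4))  # → 0.2645 both times
```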
👉Ethical issues (1)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.095 | Switch Religion | 2/21 tested samples (9.52%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Jay-Z sat in that Interview like a allah showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive (p = 0.57) | negative (p = 0.52) |
| 299 | Pope concelebrates Mass with Armenian Patriarch: History was made on Monday when Pope Francis concelebrated mo... | rabbi concelebrates Mass with Armenian Patriarch: History was made on Monday when rabbi Francis concelebrated mo... | positive (p = 0.47) | negative (p = 0.45) |
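The "Switch Religion" check swaps religion-related terms and tests whether the prediction moves. Here is a minimal sketch; the two-entry `RELIGION_SWAPS` map is hypothetical and copies only the substitutions visible in the table above (God → allah, Pope → rabbi). Giskard's real word list and replacement rules are not shown in this report. Word boundaries keep unrelated words such as "Godzilla" intact.

```python
import re

# Hypothetical term map built only from the two swaps visible in the report.
RELIGION_SWAPS = {"God": "allah", "Pope": "rabbi"}


def switch_religion(text: str, swaps: dict[str, str] = RELIGION_SWAPS) -> str:
    """Replace each whole-word occurrence of a mapped term with its counterpart."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, swaps)) + r")\b")
    return pattern.sub(lambda m: swaps[m.group(1)], text)


print(switch_religion("History was made when Pope Francis met the Pope"))
# → History was made when rabbi Francis met the rabbi

# The reported fail rate is changed predictions divided by tested samples.
print(f"{2 / 21:.2%}")  # → 9.52%
```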
👉Robustness issues (5)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.481 | Transform to uppercase | 156/324 tested samples (48.15%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 48.15% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.49) | positive (p = 0.48) |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | KINGPIN SAUDI ARABIA POSTED A RECORD $98 BILLION BUDGET DEFICIT IN 2015 DUE TO THE SHARP FALL IN OIL PRICES FINANCE MINISTRY SAID ON MONDAY | negative (p = 0.67) | positive (p = 0.52) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | neutral (p = 0.45) | positive (p = 0.49) |
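Each robustness finding is a metamorphic test: apply a meaning-preserving transformation, re-run the model, and count how often the label flips. The harness below sketches that loop under stated assumptions; `invariance_fail_rate` and the keyword-based `toy_predict` are hypothetical stand-ins for the real scan and model. Skipping texts that the transformation leaves unchanged is also an assumption, although it would explain why the number of tested samples differs between transformations (from 299 to 324 in this report).

```python
from typing import Callable, Iterable


def invariance_fail_rate(
    texts: Iterable[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for a label-invariance check."""
    changed = tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # the transformation is a no-op here, so nothing is tested
        tested += 1
        changed += predict(text) != predict(perturbed)
    return changed, tested, (changed / tested if tested else 0.0)


def toy_predict(text: str) -> str:
    """Deliberately case-sensitive stand-in for a cased model."""
    return "positive" if "good" in text else "negative"


samples = ["a good day", "A GOOD DAY", "bad news"]
print(invariance_fail_rate(samples, toy_predict, str.upper))  # → (1, 2, 0.5)
print(invariance_fail_rate(samples, toy_predict, str.lower))  # → (1, 1, 1.0)
print(f"{156 / 324:.2%}")  # → 48.15%, the uppercase result reported above
```

The same harness applies unchanged to the title-case, typo, lowercase, and punctuation checks; only `transform` differs.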

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.321 | Transform to title case | 104/324 tested samples (32.1%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 32.1% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | Kingpin Saudi Arabia Posted A Record $98 Billion Budget Deficit In 2015 Due To The Sharp Fall In Oil Prices Finance Ministry Said On Monday | negative (p = 0.67) | positive (p = 0.46) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | neutral (p = 0.45) | positive (p = 0.51) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @User @User Islam Is An Abrahamic Faith, Andrew. It May Make You Feel A Little Uneasy But It'S The Same God You Worship. Sorry." | negative (p = 0.51) | positive (p = 0.54) |
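Row 6's "It'S" is a telltale sign: Python's built-in `str.title()` starts a new word after any non-letter, including the apostrophe, which suggests (without confirming) that the title-case transformation is built on it. For comparison, `string.capwords` from the standard library capitalizes whitespace-separated words instead, so it leaves contractions alone, at the cost of collapsing runs of whitespace.

```python
import string

sample = "@user it's the same God you worship."

# str.title() capitalizes the first letter after any non-letter, apostrophes included.
print(sample.title())           # → @User It'S The Same God You Worship.

# string.capwords() splits on whitespace and capitalizes each piece instead.
print(string.capwords(sample))  # → @user It's The Same God You Worship.
```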

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.155 | Add typos | 49/316 tested samples (15.51%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.51% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "Interview with Devon Alexander """"Spwed Kills"""" (VIDEO) On Tuesday Oct 16th we hd the privilege of catch up wjith... | positive (p = 0.44) | negative (p = 0.61) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @user user Islam is ab Abahamic daith, dnrew. It may make you feel a little jneasy but jt's the same God you worship. ZSorry." | negative (p = 0.51) | positive (p = 0.49) |
| 14 | PM ready for reply on coal blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | PM ready for reply oh coap blocks: Congress: New Delhi\u002c Aug 22 (IANS) With the Bharatiya Janata Party (BJP)... | positive (p = 0.50) | negative (p = 0.42) |
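The typo examples mix substituted, dropped, and inserted characters ("Spwed", "hd", "wjith"), many of them neighbors on a QWERTY keyboard. Below is a minimal, seeded sketch of that kind of noise. The partial `NEIGHBORS` map, the per-letter `rate`, and the four edit operations are assumptions chosen for illustration; Giskard's own typo generator may use different rules and probabilities.

```python
import random
from typing import Optional

# Hypothetical, partial QWERTY adjacency map (lowercase keys only).
NEIGHBORS = {
    "a": "qwsz", "d": "erfcxs", "e": "wsdr", "f": "rtgvcd", "h": "yujnbg",
    "i": "ujko", "l": "kop", "n": "bhjm", "o": "iklp", "s": "awedxz",
    "t": "rfgy", "u": "yhji",
}


def add_typos(text: str, rate: float = 0.05, seed: Optional[int] = None) -> str:
    """Apply a random keyboard-style edit to each mapped letter with probability `rate`."""
    rng = random.Random(seed)
    out: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        key = ch.lower()
        if key in NEIGHBORS and rng.random() < rate:
            op = rng.choice(["substitute", "delete", "insert", "swap"])
            if op == "substitute":
                out.append(rng.choice(NEIGHBORS[key]))
            elif op == "insert":
                out += [ch, rng.choice(NEIGHBORS[key])]
            elif op == "swap" and i + 1 < len(text):
                out += [text[i + 1], ch]
                i += 1  # the next character has already been emitted
            elif op == "swap":
                out.append(ch)  # last character: nothing to swap with
            # op == "delete": emit nothing for this character
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(add_typos("we had the privilege of catching up with", rate=0.15, seed=7))
print(f"{49 / 316:.2%}")  # → 15.51%, the typo fail rate reported above
```

Seeding a private `random.Random` instance keeps each perturbed dataset reproducible without touching the global random state.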

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.132 | Transform to lowercase | 42/318 tested samples (13.21%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 13.21% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "interview with devon alexander """"speed kills"""" (video) on tuesday oct 16th we had the privilege of catch up with... | positive (p = 0.44) | negative (p = 0.72) |
| 28 | Chelsea Clinton is asked about Kanye West's run for president and her answer may surprise you: via @user NEVER!!! | chelsea clinton is asked about kanye west's run for president and her answer may surprise you: via @user never!!! | positive (p = 0.62) | negative (p = 0.41) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | bowling tomorrow c; don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.42) |

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.094 | Punctuation Removal | 28/299 tested samples (9.36%) changed prediction after perturbation |

🔍✨ Examples: When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

| Row | Text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | It is reality that ISIS are on the march in Turkey and Erdogan can t wait to receive them with open arms | negative (p = 0.37) | positive (p = 0.40) |
| 27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | Bowling tomorrow c Don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40) |
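The punctuation examples suggest that marks are replaced by a space ("can't" becomes "can t") while the `@` in user handles survives. The sketch below reproduces rows 12 and 27 under those assumptions: every ASCII punctuation character except `@` becomes a space, and runs of whitespace are then collapsed. It is an approximation rather than Giskard's code, since the report does not say which characters its transformation removes.

```python
import string

# Assumption: all ASCII punctuation except "@" maps to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation if ch != "@"})


def remove_punctuation(text: str) -> str:
    """Blank out punctuation, then collapse the resulting runs of whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).split())


row_27 = "@user @user Yellow journalism. But you know? This may be Harper's Waterloo"
print(remove_punctuation(row_27))
# → @user @user Yellow journalism But you know This may be Harper s Waterloo
print(f"{28 / 299:.2%}")  # → 9.36%
```

Collapsing whitespace avoids double spaces where a mark sat next to a space, but it also flattens any newlines or tabs in the input.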

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space to improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
