Report for CouchCat/ma_sa_v7_distil

#53
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Ethical issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.059 | Switch Religion | 5/85 tested samples (5.88%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.88% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 168 | @user @user agree. We need every country that is pissed off with Muslims to march against Islam at same time, same day" | @user @user agree. We need every country that is pissed off with hindus to march against judaism at same time, same day" | negative (p = 0.55) | neutral (p = 0.45) |
| 653 | Shouts out to the atheists who - every Sunday - rant about Christians suddenly loving God on Sundays. | Shouts out to the atheists who - every Sunday - rant about hindus suddenly loving allah on Sundays. | neutral (p = 0.43) | negative (p = 0.44) |
| 719 | May Allah bless Erdogan with wisdom and capacity to move forward. | May god bless Erdogan with wisdom and capacity to move forward. | positive (p = 0.54) | neutral (p = 0.49) |
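
The fail rate reported above can be read as "changed predictions divided by tested samples". Below is a minimal sketch of that computation, not Giskard's implementation. The names `perturbation_fail_rate`, `swap_terms`, and `toy_predict` are hypothetical, the toy classifier is deliberately biased so that one sample flips, and the rule that only samples actually altered by the transformation count as "tested" is an assumption (it would explain why 85 samples were tested here rather than the full split).

```python
import re
from typing import Callable, Sequence, Tuple


def perturbation_fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, fail_rate) for one text transformation."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation leaves untouched are skipped.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    fail_rate = changed / tested if tested else 0.0
    return changed, tested, fail_rate


def swap_terms(text: str) -> str:
    """Hypothetical one-entry swap, mirroring example 719 in the report."""
    return re.sub(r"\ballah\b", "god", text, flags=re.IGNORECASE)


def toy_predict(text: str) -> str:
    """Deliberately biased toy classifier, for illustration only."""
    lowered = text.lower()
    if "bless" in lowered and "allah" in lowered:
        return "positive"
    return "neutral"


texts = [
    "May Allah bless Erdogan with wisdom and capacity to move forward.",
    "Allah is mentioned here.",       # perturbed, prediction unchanged
    "I hope u all have a good day.",  # not perturbed, so not tested
]
changed, tested, rate = perturbation_fail_rate(texts, toy_predict, swap_terms)
print(changed, tested, rate)

# The figure in the report follows from the same ratio: 5 / 85.
print(round(5 / 85, 4))
```

Passing `predict` and `transform` as plain callables keeps the sketch independent of any particular model or scanning library.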
👉Robustness issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.150 | Add typos | 150/1000 tested samples (15.0%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | omg then I sat on my floor in frpnt of yhe TV and awled oved Shawn when he was perfomng lkn that one how | negative (p = 0.61) | neutral (p = 0.99) |
| 146 | Tomorrow I'll be throwing myself around to Black Noise. Come and do the same at The Horn St.Albans from 7:30pm | Tomoerow 'll be throwing myself around to Black Nooiwe. Come and o the same at Fhe Horn St.Albans feom7 :30pm | negative (p = 0.87) | neutral (p = 0.99) |
| 681 | @user and if you feel Lawful and that you are full enough. May Allah guide you aright and so He knows Islam has no beginning and no End. | @user and if you feel Lawful and that you are full enugh. May Allah guide yo uarjight amd so He knows Islam has no beginning and no End. | positive (p = 0.71) | neutral (p = 0.89) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.054 | Punctuation Removal | 54/1000 tested samples (5.4%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.4% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 777 | @user I LOVED the episode!!! Can;t wait until Hart of Dixie returns November 13th!! | @user I LOVED the episode Can t wait until Hart of Dixie returns November 13th | positive (p = 0.77) | neutral (p = 0.94) |
| 484 | Love this song! One of my favorites for sure. Can\u2019t wait to see you tomorrow at The Roxy! (@YouTube | Love this song One of my favorites for sure Can\u2019t wait to see you tomorrow at The Roxy (@YouTube | positive (p = 0.76) | neutral (p = 0.62) |
| 917 | I don't get the panic at all. Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | I don t get the panic at all Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | negative (p = 0.63) | positive (p = 0.55) |
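
To reproduce robustness checks like the two above on your own inputs, the perturbations can be approximated in a few lines. This is a rough sketch and not the scanner's own code: `remove_punctuation` and `add_typos` are hypothetical helpers, the punctuation set is guessed from the examples (which keep `@`, `#`, and `(` intact), and the typo operations (drop, swap, replace) are an assumption about what "Add typos" does.

```python
import random
import re

# Assumption: only these characters are stripped; the examples in the report
# leave "@", "#", and "(" in place.
PUNCTUATION = ".,!?;:'\""
_PUNCT_TABLE = str.maketrans({ch: " " for ch in PUNCTUATION})


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text.translate(_PUNCT_TABLE)).strip()


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly drop, swap, or replace letters with probability `rate` each."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("drop", "swap", "replace"))
            if op == "drop":
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], ch])
                i += 2
                continue
            # "replace", or a "swap" requested on the final character
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


sample = "I don't get the panic at all. Still in playoffs"
print(remove_punctuation(sample))  # I don t get the panic at all Still in playoffs
print(add_typos(sample, rate=0.1, seed=1))
```

Feeding both the original and the perturbed text to the model and comparing the predicted labels gives the per-sample pass or fail that the fail rates above aggregate.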
👉Performance issues (3)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "like" | Precision = 0.451 | | 12.79% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "like", the Precision is 12.79% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 21 | @user I haven't been able to watch TVD live these days due to Football. Every Thurs there is high school FB going on at 8pm. Like WTF! | negative | neutral (p = 0.89) |
| 26 | can we get him to beat out the wanted atleast he's in like 15th please i love him he deserves it Ed Sheeran | positive | neutral (p = 0.79) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "night" | Precision = 0.462 | | 10.81% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "night", the Precision is 10.81% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 42 | A FB friend of mine just posted that seeing Magic Mike XXL was the best night of her life. If only she knew what my typical sat night is. | positive | neutral (p = 0.94) |
| 69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | neutral | positive (p = 0.47) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | text contains "day" | Precision = 0.474 | | 8.32% lower than global |

🔍✨Examples

For records in the dataset where `text` contains "day", the Precision is 8.32% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | positive | neutral (p = 0.95) |
| 95 | I hope u all have a good day bc it's Friday and Shawn loves u. Keep smilin' (: | positive | neutral (p = 0.97) |
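
The slice findings above compare Precision on a keyword slice with Precision on the whole dataset. The sketch below shows one way to compute that comparison, under several assumptions: macro-averaged Precision (the report does not say which averaging is used), case-insensitive substring matching (suggested by examples such as "Like WTF!" and "National Ice Cream Day"), and deviation expressed relative to the global value. If the deviation is indeed relative, the three reported rows would imply a global Precision of roughly 0.52; that is an inference, not a figure stated in the report. All function names and the toy rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, str]  # (text, true label, predicted label)


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision (0.0 for labels never predicted)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        if not truths:
            scores.append(0.0)
            continue
        scores.append(sum(t == label for t in truths) / len(truths))
    return sum(scores) / len(scores)


def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Signed deviation of the slice metric, as a fraction of the global one."""
    return (slice_metric - global_metric) / global_metric


def slice_precision_report(rows: Sequence[Row], keyword: str) -> Dict[str, float]:
    """Compare precision on rows whose text contains `keyword` with all rows."""
    in_slice = [r for r in rows if keyword.lower() in r[0].lower()]
    global_p = macro_precision([r[1] for r in rows], [r[2] for r in rows])
    slice_p = macro_precision([r[1] for r in in_slice], [r[2] for r in in_slice])
    return {
        "global": global_p,
        "slice": slice_p,
        "deviation": relative_deviation(slice_p, global_p),
    }


# Toy rows for illustration only; these are not results from the scan.
rows = [
    ("Sounds like a Friday night", "positive", "neutral"),
    ("I like this", "positive", "positive"),
    ("so bad", "negative", "negative"),
    ("meeting at noon", "neutral", "neutral"),
]
report = slice_precision_report(rows, "like")
print(report)
```

Note that substring matching means the "day" slice also picks up words such as "Friday", which is visible in the examples above.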

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!