Report for CouchCat/ma_sa_v7_distil
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 6 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | medium 🟡 | — | Fail rate = 0.059 | Switch Religion | 5/85 tested samples (5.88%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.88% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
168 | @user @user agree. We need every country that is pissed off with Muslims to march against Islam at same time, same day" | @user @user agree. We need every country that is pissed off with hindus to march against judaism at same time, same day" | negative (p = 0.55) | neutral (p = 0.45) |
653 | Shouts out to the atheists who - every Sunday - rant about Christians suddenly loving God on Sundays. | Shouts out to the atheists who - every Sunday - rant about hindus suddenly loving allah on Sundays. | neutral (p = 0.43) | negative (p = 0.44) |
719 | May Allah bless Erdogan with wisdom and capacity to move forward. | May god bless Erdogan with wisdom and capacity to move forward. | positive (p = 0.54) | neutral (p = 0.49) |
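The fail rate in the table above is the share of tested samples whose predicted label changed after the perturbation (5/85 ≈ 0.059). A minimal sketch of that calculation follows. It is not Giskard's implementation: `fail_rate`, `toy_predict`, and `toy_transform` are hypothetical names, and counting only the samples that the transformation actually modifies as "tested" is an assumption (it would explain why only 85 samples were tested here).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for an invariance test.

    Assumption: a sample counts as tested only if the transformation
    actually modifies its text.
    """
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # transformation did not apply to this sample
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical stand-ins for the real model and transformation.
def toy_transform(text: str) -> str:
    return text.replace("Allah", "god")


def toy_predict(text: str) -> str:
    return "positive" if "Allah" in text else "neutral"


texts = [
    "May Allah bless Erdogan with wisdom.",
    "Nice day today.",
    "May Allah guide you.",
]
print(fail_rate(toy_predict, texts, toy_transform))

# The arithmetic behind the reported figure: 5 changed out of 85 tested.
print(round(5 / 85, 4))
```

To run the same check against the real model, `predict` would wrap the model's inference call and `transform` would be the perturbation under test.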
👉Robustness issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.150 | Add typos | 150/1000 tested samples (15.0%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.0% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | omg then I sat on my floor in frpnt of yhe TV and awled oved Shawn when he was perfomng lkn that one how | negative (p = 0.61) | neutral (p = 0.99) |
146 | Tomorrow I'll be throwing myself around to Black Noise. Come and do the same at The Horn St.Albans from 7:30pm | Tomoerow 'll be throwing myself around to Black Nooiwe. Come and o the same at Fhe Horn St.Albans feom7 :30pm | negative (p = 0.87) | neutral (p = 0.99) |
681 | @user and if you feel Lawful and that you are full enough. May Allah guide you aright and so He knows Islam has no beginning and no End. | @user and if you feel Lawful and that you are full enugh. May Allah guide yo uarjight amd so He knows Islam has no beginning and no End. | positive (p = 0.71) | neutral (p = 0.89) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.054 | Punctuation Removal | 54/1000 tested samples (5.4%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.4% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
777 | @user I LOVED the episode!!! Can;t wait until Hart of Dixie returns November 13th!! | @user I LOVED the episode Can t wait until Hart of Dixie returns November 13th | positive (p = 0.77) | neutral (p = 0.94) |
484 | Love this song! One of my favorites for sure. Can\u2019t wait to see you tomorrow at The Roxy! (@YouTube | Love this song One of my favorites for sure Can\u2019t wait to see you tomorrow at The Roxy (@YouTube | positive (p = 0.76) | neutral (p = 0.62) |
917 | I don't get the panic at all. Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | I don t get the panic at all Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | negative (p = 0.63) | positive (p = 0.55) |
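To reproduce this kind of perturbation locally, the sketch below approximates the “Punctuation Removal” transformation. It is an assumption inferred from the examples above, not Giskard's actual code: it replaces a small set of punctuation marks with spaces and collapses the resulting whitespace, leaving characters such as `@`, `#`, and `(` untouched. The name `remove_punctuation` is hypothetical.

```python
import re

# Assumed set of characters to strip, inferred from the report's examples.
_PUNCTUATION = re.compile(r"[.,!?;:'\"]")


def remove_punctuation(text: str) -> str:
    """Replace selected punctuation with spaces, then collapse whitespace."""
    without_marks = _PUNCTUATION.sub(" ", text)
    return " ".join(without_marks.split())


original = (
    "@user I LOVED the episode!!! Can;t wait until Hart of Dixie "
    "returns November 13th!!"
)
print(remove_punctuation(original))
```

Feeding both the original and the perturbed text to the model and comparing the predicted labels shows whether a given sample is sensitive to punctuation.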
👉Performance issues (3)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "like" | Precision = 0.451 | — | -12.79% vs. global |
🔍✨Examples
For records in the dataset where `text` contains "like", the Precision is 12.79% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
21 | @user I haven't been able to watch TVD live these days due to Football. Every Thurs there is high school FB going on at 8pm. Like WTF! | negative | neutral (p = 0.89) |
26 | can we get him to beat out the wanted atleast he's in like 15th please i love him he deserves it Ed Sheeran | positive | neutral (p = 0.79) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "night" | Precision = 0.462 | — | -10.81% vs. global |
🔍✨Examples
For records in the dataset where `text` contains "night", the Precision is 10.81% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
42 | A FB friend of mine just posted that seeing Magic Mike XXL was the best night of her life. If only she knew what my typical sat night is. | positive | neutral (p = 0.94) |
69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | neutral | positive (p = 0.47) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text contains "day" | Precision = 0.474 | — | -8.32% vs. global |
🔍✨Examples
For records in the dataset where `text` contains "day", the Precision is 8.32% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | positive | neutral (p = 0.95) |
95 | I hope u all have a good day bc it's Friday and Shawn loves u. Keep smilin' (: | positive | neutral (p = 0.97) |
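The deviations in the three performance tables can be read as relative differences between the slice precision and the global precision. The sketch below assumes the formula (slice − global) / global, which the report does not state explicitly; `relative_deviation` and `implied_global` are hypothetical helper names. Under that assumption, all three rows imply a global precision of about 0.52, which is a useful consistency check.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric with respect to the global one."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric."""
    return slice_metric / (1.0 + deviation)


# (slice keyword, slice precision, reported deviation) taken from the tables.
rows = [
    ("like", 0.451, -0.1279),
    ("night", 0.462, -0.1081),
    ("day", 0.474, -0.0832),
]

for keyword, precision, deviation in rows:
    print(keyword, round(implied_global(precision, deviation), 3))
```

If the implied global values had disagreed noticeably, the relative-difference reading would be wrong, so the agreement lends some support to the assumed formula.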
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!