Report for CouchCat/ma_sa_v7_distil
#91 opened by giskard-bot
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 6 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉Ethical issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Ethical | medium 🟡 | — | Fail rate = 0.059 | Switch Religion | 5/85 tested samples (5.88%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.88% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
168 | @user @user agree. We need every country that is pissed off with Muslims to march against Islam at same time, same day" | @user @user agree. We need every country that is pissed off with christians to march against judaism at same time, same day" | negative (p = 0.55) | neutral (p = 0.47) |
719 | May Allah bless Erdogan with wisdom and capacity to move forward. | May god bless Erdogan with wisdom and capacity to move forward. | positive (p = 0.54) | neutral (p = 0.49) |
819 | This is the usual liberal reflex of blame Republicans for everything. Like Muslims many liberals parrot what told. | This is the usual liberal reflex of blame Republicans for everything. Like buddhists many liberals parrot what told. | positive (p = 0.46) | negative (p = 0.42) |
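The fail rate reported above is the share of tested samples whose predicted label changed after the perturbation. A minimal sketch of that arithmetic follows; the function name `fail_rate` and the synthetic label lists are illustrative assumptions, not part of the scanner's API.

```python
from typing import Sequence


def fail_rate(original: Sequence[str], perturbed: Sequence[str]) -> float:
    """Fraction of samples whose predicted label differs after perturbation."""
    if len(original) != len(perturbed):
        raise ValueError("prediction lists must have the same length")
    if not original:
        raise ValueError("at least one tested sample is required")
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(original)


# Synthetic labels (assumption) reproducing the reported counts: 5 of 85 changed.
original = ["negative"] * 85
perturbed = ["neutral"] * 5 + ["negative"] * 80

rate = fail_rate(original, perturbed)
print(f"{rate:.3f} ({rate:.2%})")  # 0.059 (5.88%)
```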
👉Robustness issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.155 | Add typos | 155/1000 tested samples (15.5%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.5% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1229 | @user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50 | @user I ijstalled Madden 16 Deluxe last Mondxy night for LX4 and still haven't receives mt packs today mnor the reward for opening %50 | negative (p = 0.96) | neutral (p = 0.89) |
995 | "It may cost more, but the new @user Moto G is still a damn fine smartphone. Full review: | "It may cos tmore, but the new @user Moto G is stll a damn fine smartphone. Full review: | negative (p = 0.68) | neutral (p = 0.36) |
1607 | You may think Venus and Serena are impressive, but are you aware of that one time my sister and I fought over a ham sandwich? | You jay think Venus and Seren are jmopressive, but are you aware of that lne time my sister and I fught over a ham sandwich? | positive (p = 0.80) | neutral (p = 0.96) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.054 | Punctuation Removal | 54/1000 tested samples (5.4%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.4% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
777 | @user I LOVED the episode!!! Can;t wait until Hart of Dixie returns November 13th!! | @user I LOVED the episode Can t wait until Hart of Dixie returns November 13th | positive (p = 0.77) | neutral (p = 0.94) |
484 | Love this song! One of my favorites for sure. Can’t wait to see you tomorrow at The Roxy! ( @YouTube | Love this song One of my favorites for sure Can’t wait to see you tomorrow at The Roxy ( @YouTube | positive (p = 0.76) | neutral (p = 0.62) |
917 | I don't get the panic at all. Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | I don t get the panic at all Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | negative (p = 0.63) | positive (p = 0.55) |
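The robustness checks above follow an invariance pattern: apply a transformation that should not change the sentiment, then count how often the prediction flips. The sketch below illustrates that pattern only. The punctuation set, the `strip_punctuation` helper, and the toy classifier are all hypothetical stand-ins; the report does not show the scanner's actual transformation rule or the real model.

```python
import re
from typing import Callable, Sequence

# Hypothetical punctuation set; the scanner's real rule is not given in the report.
_PUNCT = re.compile(r"[.,!?;:'\"]")


def strip_punctuation(text: str) -> str:
    """Replace each listed punctuation mark with a space, then collapse whitespace."""
    return " ".join(_PUNCT.sub(" ", text).split())


def invariance_fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Share of texts whose predicted label changes once `transform` is applied."""
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Toy stand-in classifier (assumption) that leans on '!' as a positivity cue."""
    return "positive" if "!" in text else "neutral"


texts = [
    "I LOVED the episode!!!",
    "See you at 8pm",
    "Great lineup, lots of games left",
]
rate = invariance_fail_rate(toy_predict, texts, strip_punctuation)
print(f"fail rate = {rate:.3f}")
```

A classifier that relies on surface cues such as exclamation marks will flip on exactly the kind of input shown in the examples table, which is what the fail rate is meant to surface.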
👉Performance issues (3)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "like" | Precision = 0.451 | — | -12.79% than global |
🔍✨Examples
For records in the dataset where `text` contains "like", the Precision is 12.79% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
21 | @user I haven't been able to watch TVD live these days due to Football. Every Thurs there is high school FB going on at 8pm. Like WTF! | negative | neutral (p = 0.89) |
26 | can we get him to beat out the wanted atleast he's in like 15th please i love him he deserves it Ed Sheeran | positive | neutral (p = 0.79) |
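A slice metric like the one above can be computed by filtering records on a keyword and scoring only that subset. The sketch below is an illustration under stated assumptions: the report does not say which averaging is used for the multiclass Precision, so macro averaging is assumed here, and the case-insensitive match is inferred from example 21, which contains "Like" rather than "like". The helper names and toy data are hypothetical.

```python
from typing import Sequence


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class precision; a class never predicted scores 0."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no records to score")
    per_class = []
    for label in labels:
        # True labels of every record that was predicted as `label`.
        hits = [t for t, p in zip(y_true, y_pred) if p == label]
        per_class.append(sum(t == label for t in hits) / len(hits) if hits else 0.0)
    return sum(per_class) / len(per_class)


def slice_precision(
    texts: Sequence[str],
    y_true: Sequence[str],
    y_pred: Sequence[str],
    keyword: str,
) -> float:
    """Macro precision restricted to records whose text contains `keyword`."""
    idx = [i for i, t in enumerate(texts) if keyword.lower() in t.lower()]
    return macro_precision([y_true[i] for i in idx], [y_pred[i] for i in idx])


# Toy data (assumption), not taken from tweet_eval.
texts = ["Sounds like fun", "Like WTF", "good day"]
y_true = ["positive", "negative", "positive"]
y_pred = ["neutral", "negative", "positive"]

like_precision = slice_precision(texts, y_true, y_pred, "like")
print(f"precision on 'like' slice = {like_precision:.3f}")
```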
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "night" | Precision = 0.462 | — | -10.81% than global |
🔍✨Examples
For records in the dataset where `text` contains "night", the Precision is 10.81% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
42 | A FB friend of mine just posted that seeing Magic Mike XXL was the best night of her life. If only she knew what my typical sat night is. | positive | neutral (p = 0.94) |
69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | neutral | positive (p = 0.47) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text contains "day" | Precision = 0.474 | — | -8.32% than global |
🔍✨Examples
For records in the dataset where `text` contains "day", the Precision is 8.32% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | positive | neutral (p = 0.95) |
95 | I hope u all have a good day bc it's Friday and Shawn loves u. Keep smilin' (: | positive | neutral (p = 0.97) |
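The report gives each slice's Precision and its deviation from the global value, but not the global Precision itself or the deviation formula. Assuming the deviation is relative, that is (slice - global) / global, the three rows can be inverted to see what global Precision they imply. This is a consistency check on the reported numbers, not a figure stated by the scanner; `implied_global` is a hypothetical helper.

```python
def implied_global(slice_metric: float, relative_deviation: float) -> float:
    """Invert deviation = (slice - global) / global to recover the global metric."""
    return slice_metric / (1.0 + relative_deviation)


# (slice Precision, reported deviation) taken from the three tables above.
reported = {
    "like": (0.451, -0.1279),
    "night": (0.462, -0.1081),
    "day": (0.474, -0.0832),
}

implied = {name: implied_global(p, d) for name, (p, d) in reported.items()}
for name, value in implied.items():
    print(f"{name}: implied global Precision = {value:.3f}")
```

All three slices point to a global Precision between roughly 0.517 and 0.518, which supports reading the deviation column as a relative difference rather than a difference in percentage points.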
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.