Report for CouchCat/ma_sa_v7_distil

#91
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Ethical issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.059 | Switch Religion | 5/85 tested samples (5.88%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 5.88% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 168 | @user @user agree. We need every country that is pissed off with Muslims to march against Islam at same time, same day" | @user @user agree. We need every country that is pissed off with christians to march against judaism at same time, same day" | negative (p = 0.55) | neutral (p = 0.47) |
| 719 | May Allah bless Erdogan with wisdom and capacity to move forward. | May god bless Erdogan with wisdom and capacity to move forward. | positive (p = 0.54) | neutral (p = 0.49) |
| 819 | This is the usual liberal reflex of blame Republicans for everything. Like Muslims many liberals parrot what told. | This is the usual liberal reflex of blame Republicans for everything. Like buddhists many liberals parrot what told. | positive (p = 0.46) | negative (p = 0.42) |
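
The fail rate reported for a perturbation test can be read as "changed predictions divided by tested samples" (5/85 ≈ 0.059 here). The sketch below illustrates that calculation under stated assumptions: `fail_rate`, `toy_predict`, and `toy_switch` are hypothetical names, the toy classifier is not the scanned model, and skipping samples that the transformation leaves unchanged is an assumption that would explain why only 85 samples were tested. It is not Giskard's implementation.

```python
from typing import Callable, Iterable, Tuple


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Iterable[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, changed / tested) for an invariance test."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are not counted.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical stand-ins for illustration only.
def toy_predict(text: str) -> str:
    return "negative" if "muslims" in text.lower() else "neutral"


def toy_switch(text: str) -> str:
    return text.replace("Muslims", "Buddhists")


samples = [
    "Like Muslims many parrot what they are told",
    "Nice weather today",
    "Muslims and others marched",
]
changed, tested, rate = fail_rate(toy_predict, toy_switch, samples)
print(f"{changed}/{tested} tested samples changed prediction (fail rate = {rate:.3f})")
```

A model that is invariant to the transformation would yield a fail rate of 0 on the tested samples.
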
👉Robustness issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.155 | Add typos | 155/1000 tested samples (15.5%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.5% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1229 | @user I installed Madden 16 Deluxe last Monday night for PS4 and still haven't received my packs today nor the reward for opening 50 | @user I ijstalled Madden 16 Deluxe last Mondxy night for LX4 and still haven't receives mt packs today mnor the reward for opening %50 | negative (p = 0.96) | neutral (p = 0.89) |
| 995 | "It may cost more, but the new @user Moto G is still a damn fine smartphone. Full review: | "It may cos tmore, but the new @user Moto G is stll a damn fine smartphone. Full review: | negative (p = 0.68) | neutral (p = 0.36) |
| 1607 | You may think Venus and Serena are impressive, but are you aware of that one time my sister and I fought over a ham sandwich? | You jay think Venus and Seren are jmopressive, but are you aware of that lne time my sister and I fught over a ham sandwich? | positive (p = 0.80) | neutral (p = 0.96) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.054 | Punctuation Removal | 54/1000 tested samples (5.4%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 5.4% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 777 | @user I LOVED the episode!!! Can;t wait until Hart of Dixie returns November 13th!! | @user I LOVED the episode Can t wait until Hart of Dixie returns November 13th | positive (p = 0.77) | neutral (p = 0.94) |
| 484 | Love this song! One of my favorites for sure. Can\u2019t wait to see you tomorrow at The Roxy! ( @YouTube | Love this song One of my favorites for sure Can\u2019t wait to see you tomorrow at The Roxy ( @YouTube | positive (p = 0.76) | neutral (p = 0.62) |
| 917 | I don't get the panic at all. Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | I don t get the panic at all Still in playoffs still david price pitching Friday still great lineup still lots of games left #bluejays | negative (p = 0.63) | positive (p = 0.55) |
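
To reproduce this kind of perturbation locally, a punctuation-stripping function along the following lines may be enough. This is a minimal sketch, not Giskard's implementation: `remove_punctuation` and `ASSUMED_PUNCTUATION` are hypothetical names, and the character set is an assumption chosen only because it matches the examples above, where `@`, `#`, and `(` survive the transformation.

```python
import re

# Assumed character set, inferred from the report's examples (not from Giskard's source).
ASSUMED_PUNCTUATION = ".,!?;:'\""
_PUNCT_RE = re.compile("[" + re.escape(ASSUMED_PUNCTUATION) + "]")


def remove_punctuation(text: str) -> str:
    """Replace each assumed punctuation mark with a space, then collapse whitespace."""
    spaced = _PUNCT_RE.sub(" ", text)
    return " ".join(spaced.split())


original = "@user I LOVED the episode!!! Can;t wait until Hart of Dixie returns November 13th!!"
print(remove_punctuation(original))
```

Feeding both the original and the stripped text to the model and comparing the predicted labels gives a quick check of whether a given tweet is affected.
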
👉Performance issues (3)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "like" | Precision = 0.451 | | -12.79% than global |

🔍✨Examples

For records in the dataset where `text` contains "like", the Precision is 12.79% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 21 | @user I haven't been able to watch TVD live these days due to Football. Every Thurs there is high school FB going on at 8pm. Like WTF! | negative | neutral (p = 0.89) |
| 26 | can we get him to beat out the wanted atleast he's in like 15th please i love him he deserves it Ed Sheeran | positive | neutral (p = 0.79) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "night" | Precision = 0.462 | | -10.81% than global |

🔍✨Examples

For records in the dataset where `text` contains "night", the Precision is 10.81% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 42 | A FB friend of mine just posted that seeing Magic Mike XXL was the best night of her life. If only she knew what my typical sat night is. | positive | neutral (p = 0.94) |
| 69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | neutral | positive (p = 0.47) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | text contains "day" | Precision = 0.474 | | -8.32% than global |

🔍✨Examples

For records in the dataset where `text` contains "day", the Precision is 8.32% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | positive | neutral (p = 0.78) |
| 58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | positive | neutral (p = 0.95) |
| 95 | I hope u all have a good day bc it's Friday and Shawn loves u. Keep smilin' (: | positive | neutral (p = 0.97) |
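
The report does not state the global Precision, but it can be estimated from the three slices if the deviations are read as relative differences, i.e. (slice - global) / global. That reading is an assumption; the sketch below (with hypothetical helper names) shows that under it all three slices point to nearly the same global value, close to 0.52.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric from the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: recover the global metric from a slice and its deviation."""
    return slice_metric / (1.0 + deviation)


# (Precision on slice, reported deviation) taken from the tables above.
reported = {
    "like": (0.451, -0.1279),
    "night": (0.462, -0.1081),
    "day": (0.474, -0.0832),
}

for word, (precision, deviation) in reported.items():
    estimate = implied_global(precision, deviation)
    print(f'text contains "{word}": implied global Precision ~ {estimate:.3f}')
```

If the deviations were absolute percentage-point differences instead, the three implied global values would disagree (0.579, 0.570, and 0.557), which is why the relative reading seems more plausible.
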

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
