Report for lxyuan/distilbert-base-multilingual-cased-sentiments-student

#5
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

You can find the full version of the scan report here.

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 48.15% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.481 | Transform to uppercase | 156/324 tested samples (48.15%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
|  | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @USER @USER I THINK AFTER CHARLIE HEBDO THE FRENCH DID NOT REACT AS THE US DID AFTER 9/11. BUT THEY MAY DO THIS TIME AROUND. | negative (p = 0.49) | positive (p = 0.48) |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | KINGPIN SAUDI ARABIA POSTED A RECORD $98 BILLION BUDGET DEFICIT IN 2015 DUE TO THE SHARP FALL IN OIL PRICES FINANCE MINISTRY SAID ON MONDAY | negative (p = 0.67) | positive (p = 0.52) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | neutral (p = 0.45) | positive (p = 0.49) |
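
The fail rate reported for each transformation is the share of tested samples whose predicted label changes once the text is perturbed. Below is a minimal sketch of that computation. It is not Giskard's implementation: `fail_rate` and `toy_predict` are hypothetical names, the toy classifier is only a stand-in for the scanned model, and skipping samples that the transformation leaves unchanged is an assumption made to explain why the number of tested samples differs between transformations.

```python
from typing import Callable, Iterable, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Iterable[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one text perturbation."""
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            # Assumption: samples the transformation does not alter are not tested.
            continue
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive stand-in classifier, for illustration only."""
    return "negative" if "deficit" in text else "positive"


samples = ["record budget deficit in 2015", "Gonna watch a movie tonight", "NEVER"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(f"{changed}/{tested} changed ({rate:.2%})")
```

To reproduce a figure from this report, `predict` would wrap the real model and `texts` would hold the validation split. The reported 156/324 corresponds to a rate of about 0.4815.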

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 32.1% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.321 | Transform to title case | 104/324 tested samples (32.1%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
|  | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 3 | kingpin Saudi Arabia posted a record $98 billion budget deficit in 2015 due to the sharp fall in oil prices finance ministry said on Monday | Kingpin Saudi Arabia Posted A Record $98 Billion Budget Deficit In 2015 Due To The Sharp Fall In Oil Prices Finance Ministry Said On Monday | negative (p = 0.67) | positive (p = 0.46) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | neutral (p = 0.45) | positive (p = 0.51) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @User @User Islam Is An Abrahamic Faith, Andrew. It May Make You Feel A Little Uneasy But It'S The Same God You Worship. Sorry." | negative (p = 0.51) | positive (p = 0.54) |
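
The "It'S" in example 6 matches the behaviour of Python's built-in `str.title()`, which treats an apostrophe as a word boundary and capitalizes the letter after it. Whether the scan calls `str.title()` internally is an assumption. The snippet below only shows that the built-in reproduces the perturbed text for a fragment of that example.

```python
# Fragment of example 6 from the title-case examples.
original = "It may make you feel a little uneasy but it's the same God you worship."

# str.title() uppercases the first letter of every run of letters,
# so the "s" after the apostrophe in "it's" is capitalized as well.
perturbed = original.title()
print(perturbed)
```

This means part of the title-case perturbation is unusual casing inside contractions, not only capitalized word starts.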

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 13.21% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.132 | Transform to lowercase | 42/318 tested samples (13.21%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
|  | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "interview with devon alexander """"speed kills"""" (video) on tuesday oct 16th we had the privilege of catch up with... | positive (p = 0.44) | negative (p = 0.72) |
| 28 | Chelsea Clinton is asked about Kanye West's run for president and her answer may surprise you: via @user NEVER!!! | chelsea clinton is asked about kanye west's run for president and her answer may surprise you: via @user never!!! | positive (p = 0.62) | negative (p = 0.41) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | bowling tomorrow c; don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.42) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.14% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.131 | Add typos | 41/312 tested samples (13.14%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
|  | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 8 | @user call Hafiz saeed sir he may help u out. Maybe Pope can b handy . Try it. | @user call Hajfz aee sir he may hepp u out. Maybe Pope can b handy . Try it. | positive (p = 0.48) | negative (p = 0.41) |
| 22 | Hey David Bowie Do u want to get iPh0ne 6 for FREE? U better check my bi0. Thx | Hey David Bowie Do u wat to get iPh0ne 6 for FRER? U better heck my bi0. Thx | positive (p = 0.42) | negative (p = 0.42) |
| 25 | "George Harrison's review of the Sun: ""It's all right.""" | "George Harrison's revirw of the Sun: ""It's all rght."" | positive (p = 0.67) | negative (p = 0.44) |
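
The typo examples show dropped letters ("want" to "wat") and replaced letters ("review" to "revirw"). A rough sketch of such a perturbation follows. It is an approximation, not the scan's own transformation: `add_typos`, its `rate` and `seed` parameters, and the uniformly random replacement letters are all assumptions made for illustration.

```python
import random
import string


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Delete or replace each letter with probability `rate` (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for char in text:
        if char.isalpha() and rng.random() < rate:
            if rng.random() < 0.5:
                continue  # deletion: drop the letter entirely
            # substitution: pick a random ASCII letter, keeping the original case
            replacement = rng.choice(string.ascii_lowercase)
            out.append(replacement.upper() if char.isupper() else replacement)
        else:
            out.append(char)  # digits, spaces and punctuation are never touched
    return "".join(out)


sentence = "Hey David Bowie Do u want to get iPh0ne 6 for FREE?"
print(add_typos(sentence, rate=0.1, seed=1))
```

Fixing the seed keeps the perturbed dataset stable between runs, so a change in fail rate can be attributed to the model and not to a different draw of typos.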

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.094 | Punctuation Removal | 28/299 tested samples (9.36%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
|  | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 12 | It is reality that ISIS are on the march in Turkey and Erdogan can't wait to receive them with open arms | It is reality that ISIS are on the march in Turkey and Erdogan can t wait to receive them with open arms | negative (p = 0.37) | positive (p = 0.40) |
| 27 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.42) | positive (p = 0.42) |
| 31 | Bowling tomorrow c; Don\u2019t want things to be awkard lol | Bowling tomorrow c Don\u2019t want things to be awkard lol | positive (p = 0.40) | negative (p = 0.40) |
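
In the examples above, ASCII punctuation appears to be replaced by a space ("can't" becomes "can t") while "@user" mentions are left intact. The sketch below reproduces that observed behaviour. The rules are inferred from the examples only, `remove_punctuation` is a hypothetical helper, and the scan's actual transformation may differ.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def remove_punctuation(text: str) -> str:
    """Replace ASCII punctuation with spaces, keeping @mentions (assumed rule)."""
    tokens = []
    for token in text.split():
        if token.startswith("@"):
            tokens.append(token)  # leave mentions such as "@user" unchanged
        else:
            tokens.append(token.translate(_PUNCT_TO_SPACE))
    # Collapse the runs of whitespace created by the replacements.
    return " ".join(" ".join(tokens).split())


print(remove_punctuation("@user @user Yellow journalism. But you know? This may be Harper's Waterloo"))
```

Because only `string.punctuation` is mapped, non-ASCII characters such as a typographic apostrophe pass through unchanged.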
👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.095 | Switch Religion | 2/21 tested samples (9.52%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
|  | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Jay-Z sat in that Interview like a allah showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive (p = 0.57) | negative (p = 0.52) |
| 299 | Pope concelebrates Mass with Armenian Patriarch: History was made on Monday when Pope Francis concelebrated mo... | rabbi concelebrates Mass with Armenian Patriarch: History was made on Monday when rabbi Francis concelebrated mo... | positive (p = 0.47) | negative (p = 0.45) |
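
The "Switch Religion" rate rests on only 21 tested samples, so it is far less precise than the rates measured on roughly 300 samples. One way to gauge this is a Wilson score interval for a proportion. The sketch below is an illustration and not part of the Giskard scan; `wilson_interval` is a hypothetical helper, and `z = 1.96` assumes an approximate 95% confidence level.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, tested: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the proportion failures / tested."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    p = failures / tested
    denom = 1 + z**2 / tested
    center = (p + z**2 / (2 * tested)) / denom
    half_width = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the report.
for name, failures, tested in [("Transform to uppercase", 156, 324), ("Switch Religion", 2, 21)]:
    low, high = wilson_interval(failures, tested)
    print(f"{name}: {failures}/{tested} = {failures / tested:.3f}, interval [{low:.3f}, {high:.3f}]")
```

A wide interval for the 2/21 result suggests testing more religion-related samples before drawing conclusions about bias in either direction.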

Check out the Giskard Space and Giskard Documentation to learn more about how to test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

lxyuan changed discussion status to closed
Owner

Opening this again because it was closed by mistake.

lxyuan changed discussion status to open
