Report for cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual

#124, opened by giskard-bot (Giskard org)

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset cardiffnlp/tweet_sentiment_multilingual (subset english, split validation).

👉Robustness issues (5)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 24.07% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | | Fail rate = 0.241 | 78/324 tested samples (24.07%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | HOLD ON... SAM SMITH MAY DO THE THEME TO SPECTRE!? DOPE!!!!!! #007 #SPECTRE #JAMESBOND | positive (p = 0.98) | neutral (p = 0.77) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | positive (p = 0.96) | negative (p = 0.72) |
| 9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | DISAPPOINTED THE KNICKS VS NETS GAME GOT CANCELED TONIGHT\U002C BUT I\U2019M EVEN MORE HYPED FOR KNICKS VS HEAT ON FRIDAY! | negative (p = 0.47) | positive (p = 0.97) |
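The fail rate reported for each perturbation can be reproduced with a few lines of Python. The following is a minimal sketch, not Giskard's actual implementation: `fail_rate` and `toy_predict` are hypothetical names, and the rule that only samples actually altered by the transformation count as "tested" is an assumption (it would explain why the number of tested samples differs between perturbations).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one perturbation."""
    # Assumption: a sample is only tested if the transformation alters it.
    tested = [t for t in texts if transform(t) != t]
    # A failure is any sample whose predicted label changes after perturbation.
    changed = sum(predict(t) != predict(transform(t)) for t in tested)
    rate = changed / len(tested) if tested else 0.0
    return changed, len(tested), rate


def toy_predict(text: str) -> str:
    """Hypothetical, deliberately case-sensitive stand-in for the real model."""
    return "positive" if "Dope" in text else "neutral"


samples = ["Hold on... Dope!!!!!!", "ALREADY UPPER", "nothing special here"]
changed, tested, rate = fail_rate(toy_predict, samples, str.upper)
print(f"{changed}/{tested} tested samples changed prediction ({rate:.2%})")

# The figure reported above follows the same arithmetic: 78 / 324.
reported_rate = 78 / 324
```

To run this against the real model, `toy_predict` would be replaced by a function that returns the model's top label for a given text.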

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 18.52% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | | Fail rate = 0.185 | 60/324 tested samples (18.52%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | @User @User I Think After Charlie Hebdo The French Did Not React As The Us Did After 9/11. But They May Do This Time Around. | negative (p = 0.50) | neutral (p = 0.73) |
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "Interview With Devon Alexander """"Speed Kills"""" (Video) On Tuesday Oct 16Th We Had The Privilege Of Catch Up With... | neutral (p = 0.67) | positive (p = 0.91) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | positive (p = 0.96) | negative (p = 0.39) |

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.74% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | | Fail rate = 0.147 | 46/312 tested samples (14.74%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | "Interview with Devon Alexadner """"Speed Kils"""" (VIDSO) On Tuesdxay Oct 16th we had the privilege of catch up with... | neutral (p = 0.67) | positive (p = 0.76) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna watvch Final Destination 5 tonihgt. U always leave rthe theater so afraid of everything. No huge escalators for sure :S | positive (p = 0.96) | negative (p = 0.54) |
| 11 | """""@_eryflores: March 16 Luke Bryan is gonna at the Houston Rodeo. I HAVE to go\u002c Its a MUST!""""" | """""@_eryflores: March 16 Luke Bryzn is gonna at the Houtson Rodo. I HAVE to go\u002c Its a MUST!""""" | positive (p = 0.76) | neutral (p = 0.72) |
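The examples above show several kinds of typo: transposed letters ("Houtson"), dropped letters ("Kils"), inserted letters ("watvch") and substituted letters ("Bryzn"). As an illustration only, the sketch below implements just the transposition case. The function name `add_typo` is hypothetical and this is not the transformation the scan itself applies.

```python
import random


def add_typo(text: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent letters (transposition only)."""
    # Candidate positions: both characters of the pair must be letters,
    # so digits, spaces and punctuation are never moved.
    positions = [
        i for i in range(len(text) - 1)
        if text[i].isalpha() and text[i + 1].isalpha()
    ]
    if not positions:
        return text  # nothing to swap, return the input unchanged
    i = rng.choice(positions)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


rng = random.Random(0)  # seeded so that the perturbation is reproducible
perturbed = add_typo("Gonna watch Final Destination 5 tonight.", rng)
print(perturbed)
```

Seeding the random generator matters when comparing runs: without it, a different subset of predictions would flip each time.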

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.36% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | | Fail rate = 0.094 | 28/299 tested samples (9.36%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1 | "Interview with Devon Alexander """"Speed Kills"""" (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with... | Interview with Devon Alexander \Speed Kills\ (VIDEO) On Tuesday Oct 16th we had the privilege of catch up with | neutral (p = 0.67) | positive (p = 0.69) |
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | positive (p = 0.98) | neutral (p = 0.93) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna watch Final Destination 5 tonight I always leave the theater so afraid of everything No huge escalators for sure S | positive (p = 0.96) | negative (p = 0.81) |
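In the examples above, sentence punctuation is stripped while hashtags are preserved. A minimal sketch of such a transformation follows. The character set in `REMOVED` is an assumption inferred from these few examples, not the set the scan actually uses. For instance, the first example shows parentheses being kept, so they are left out of the set here.

```python
# Assumed set of characters to strip; '#' and '@' are deliberately kept
# so that hashtags and user mentions survive, as in the examples above.
REMOVED = ".,!?;:\"'"
_TABLE = str.maketrans("", "", REMOVED)


def remove_punctuation(text: str) -> str:
    """Delete every character listed in REMOVED, leaving all others intact."""
    return text.translate(_TABLE)


tweet = "Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond"
print(remove_punctuation(tweet))
```

Applied to the second example in the table, this sketch produces the same perturbed text as the report.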

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.92% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | | Fail rate = 0.069 | 22/318 tested samples (6.92%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 36 | David Cameron's statement on camera on Thursday 03 September 2015: he will take in 'more' of the refugees: was he speaking TO TV Cameras? | david cameron's statement on camera on thursday 03 september 2015: he will take in 'more' of the refugees: was he speaking to tv cameras? | negative (p = 0.52) | neutral (p = 0.68) |
| 66 | "George Lincoln Rockwell was one of the 1st to recognize that Conservatives like @user Buckley, Goldwater & Reagan were #Cucks for Israel." | "george lincoln rockwell was one of the 1st to recognize that conservatives like @user buckley, goldwater & reagan were #cucks for israel." | positive (p = 0.87) | negative (p = 0.37) |
| 69 | Amazon Prime Day beats Black Friday says retailer Amazon Prime Day may have been an excuse for the retail... | amazon prime day beats black friday says retailer amazon prime day may have been an excuse for the retail... | negative (p = 0.64) | neutral (p = 0.56) |
👉Performance issues (1)

For records in the dataset where text contains "time", the Precision is 11.88% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "time" | Precision = 0.650 | 11.88% lower than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 93 | "Sir John dined from Justin Bieber was closed, burst into the same time--""There is too awful whisper,--""I may accelerate that" | negative | neutral (p = 0.79) |
| 104 | I might reread the Harry Potter books for like the 7th time | positive | neutral (p = 0.77) |
| 109 | Serena and Venus Williams Face Off at US Open: For the 27th time, the sisters played against each other 14 yea... | neutral | positive (p = 0.61) |
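A slice-level comparison of this kind can be sketched as follows. The report does not say how precision is averaged across the three classes or whether the deviation is relative or absolute. The macro average and the relative deviation used here are assumptions, and the data is a toy example rather than the scanned dataset.

```python
from typing import Sequence


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # True labels of all samples that were predicted as `label`.
        hits = [t for t, p in zip(y_true, y_pred) if p == label]
        # A label that is never predicted contributes a precision of 0.
        scores.append(sum(t == label for t in hits) / len(hits) if hits else 0.0)
    return sum(scores) / len(scores)


def relative_deviation(slice_value: float, global_value: float) -> float:
    """Deviation of the slice metric, as a fraction of the global metric."""
    return (slice_value - global_value) / global_value


# Toy data, for illustration only.
texts = ["good time", "bad time", "fine", "great", "awful"]
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["neutral", "negative", "neutral", "positive", "negative"]

# Select the slice: records whose text contains "time".
idx = [i for i, t in enumerate(texts) if "time" in t]
global_p = macro_precision(y_true, y_pred)
slice_p = macro_precision([y_true[i] for i in idx], [y_pred[i] for i in idx])
deviation = relative_deviation(slice_p, global_p)
print(f"slice = {slice_p:.3f}, global = {global_p:.3f}, deviation = {deviation:+.2%}")
```

Note that a substring match on "time" would also select words such as "sometimes". Whether the scan matches whole words or substrings is not stated in the report.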
👉Ethical issues (1)

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 9.52% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | | Fail rate = 0.095 | 2/21 tested samples (9.52%) changed prediction after perturbation |

Taxonomy

avid-effect:ethics:E0101 avid-effect:performance:P0201
🔍✨Examples
| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 97 | Correction: Carson did not say Christians deserve more 1st Amendment protections than other religions. But what he did say was clear as mud. | Correction: Carson did not say jews deserve more 1st Amendment protections than other religions. But what he did say was clear as mud. | negative (p = 0.48) | neutral (p = 0.52) |
| 275 | @user Prayers for all of you today. May God carry each one of you during this sad time ""Footprints in the Sand"", RIP Frank Gifford" | @user Prayers for all of you today. May allah carry each one of you during this sad time ""Footprints in the Sand"", RIP Frank Gifford" | positive (p = 0.36) | negative (p = 0.42) |
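A term-swap perturbation like the one above can be approximated with a whole-word, case-insensitive substitution. In this sketch the mapping `SWAPS` is hypothetical and contains only the two substitutions visible in the examples. The scan's real vocabulary is not shown in this report.

```python
import re

# Hypothetical mapping, limited to the substitutions seen in the examples.
# Replacements are lowercase, as they appear in the report.
SWAPS = {"christians": "jews", "god": "allah"}

# Match any key as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SWAPS) + r")\b",
    re.IGNORECASE,
)


def switch_religion(text: str) -> str:
    """Replace each known religious term with its counterpart from SWAPS."""
    return _PATTERN.sub(lambda m: SWAPS[m.group(0).lower()], text)


print(switch_religion("May God carry each one of you during this sad time"))
```

With only 21 tested samples, each flipped prediction moves the fail rate by almost five percentage points, so the 9.52% figure rests on just two cases.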

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
