Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned

#107
by giskard-bot - opened
Giskard org

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).

👉Performance issues (2)

For records in the dataset where text contains "time", the Precision is 40.94% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | text contains "time" | Precision = 0.350 | -40.94% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | Negative | Neutral (p = 0.97) |
| 35 | "According to Janet Jackson's long time producer Terry Lewis, the album is due in October. STAY CONNECTED!... | Positive | Neutral (p = 0.98) |
| 65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | Positive | Neutral (p = 0.96) |
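The slice comparison above can be sketched in a few lines. This is a minimal illustration, not the scanner's actual code: the report does not say how Precision is averaged across the three classes, so the sketch assumes micro-averaging (which, with one predicted label per record, equals the share of correct predictions). It also assumes the slice is a case-insensitive substring match. The function names and the toy records are hypothetical.

```python
def micro_precision(labels, predictions):
    """Share of predictions that match the label (micro-averaged precision
    when every record gets exactly one predicted class)."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(predictions)


def slice_deviation(records, keyword):
    """Return (slice precision, global precision, relative deviation)."""
    # Assumption: the slice is a case-insensitive substring match on the text.
    in_slice = [r for r in records if keyword in r["text"].lower()]
    global_p = micro_precision([r["label"] for r in records],
                               [r["pred"] for r in records])
    slice_p = micro_precision([r["label"] for r in in_slice],
                              [r["pred"] for r in in_slice])
    return slice_p, global_p, (slice_p - global_p) / global_p


# Hypothetical toy records, not taken from the scanned dataset.
records = [
    {"text": "best time ever", "label": "positive", "pred": "neutral"},
    {"text": "no time for this", "label": "negative", "pred": "negative"},
    {"text": "great show", "label": "positive", "pred": "positive"},
    {"text": "meeting at noon", "label": "neutral", "pred": "neutral"},
    {"text": "awful service", "label": "negative", "pred": "neutral"},
]
slice_p, global_p, deviation = slice_deviation(records, "time")
print(f"slice={slice_p:.3f} global={global_p:.3f} deviation={deviation:+.2%}")
```

A negative deviation means the slice scores below the global value, which is how the "-40.94% than global" figure should be read under this assumption.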

For records in the dataset where text contains "tomorrow", the Precision is 8.22% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | text contains "tomorrow" | Precision = 0.544 | -8.22% than global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| | text | label | Predicted label |
|---|---|---|---|
| 62 | But it's a three day weekend and we see Ed Sheeran tomorrow (!!!!!) so things miiiight be looking up. | Positive | Neutral (p = 0.99) |
| 68 | When I wake up tomorrow I'll be in a different country. Whoa! I didn't run into a David Beckham at the airport. That's a bummer. | Positive | Negative (p = 0.96) |
| 71 | CINCH YOUR SADDLE is live on Amazon! Only 99 cents until tomorrow evening.Thank you gift! | Positive | Neutral (p = 0.87) |
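The report never states the global Precision directly, but it can be recovered from either slice. The sketch below assumes the deviation is relative, i.e. (slice - global) / global. The helper name is hypothetical. Both slices point to a global Precision of roughly 0.593, which suggests the two findings are measured against the same baseline.

```python
def implied_global(slice_metric, relative_deviation):
    """Invert deviation = (slice - global) / global to recover the global value."""
    return slice_metric / (1 + relative_deviation)


# Figures copied from the two performance findings in this report.
time_global = implied_global(0.350, -0.4094)
tomorrow_global = implied_global(0.544, -0.0822)

print(f"implied by 'time' slice:     {time_global:.3f}")
print(f"implied by 'tomorrow' slice: {tomorrow_global:.3f}")
```

The small difference between the two estimates is expected, since the slice metrics are rounded to three decimals in the report.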
👉Robustness issues (4)

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 15.43% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | | Fail rate = 0.154 | 50/324 tested samples (15.43%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | HOLD ON... SAM SMITH MAY DO THE THEME TO SPECTRE!? DOPE!!!!!! #007 #SPECTRE #JAMESBOND | Positive (p = 0.98) | Neutral (p = 0.99) |
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | GONNA WATCH FINAL DESTINATION 5 TONIGHT. I ALWAYS LEAVE THE THEATER SO AFRAID OF EVERYTHING. NO HUGE ESCALATORS FOR SURE :S | Neutral (p = 0.81) | Negative (p = 0.68) |
| 6 | @user @user Islam is an Abrahamic faith, Andrew. It may make you feel a little uneasy but it's the same God you worship. Sorry." | @USER @USER ISLAM IS AN ABRAHAMIC FAITH, ANDREW. IT MAY MAKE YOU FEEL A LITTLE UNEASY BUT IT'S THE SAME GOD YOU WORSHIP. SORRY." | Neutral (p = 0.96) | Negative (p = 0.85) |
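The fail rate above is the share of tested samples whose predicted label changes after the perturbation. A minimal sketch of that computation follows. The `predict` function here is a hypothetical, deliberately case-sensitive stand-in for the real model, and skipping records that the perturbation leaves unchanged is an assumption (the report does not say how the tested samples were selected).

```python
def fail_rate(texts, predict, perturb):
    """Return (changed, tested, rate) for a text perturbation."""
    # Assumption: only records actually altered by the perturbation are tested.
    tested = [t for t in texts if perturb(t) != t]
    changed = sum(1 for t in tested if predict(t) != predict(perturb(t)))
    rate = changed / len(tested) if tested else 0.0
    return changed, len(tested), rate


def predict(text):
    """Hypothetical toy classifier that only recognises the exact-case cue."""
    return "Positive" if "Dope" in text else "Neutral"


texts = ["Dope!!!", "ok then", "123"]
changed, tested, rate = fail_rate(texts, predict, str.upper)
print(f"{changed}/{tested} changed ({rate:.2%})")

# Sanity check on the figures reported for this finding.
print(f"reported: {50 / 324:.2%}")
```

To reproduce the finding, `predict` would be replaced by a call to the scanned model that returns its top label.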

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 10.26% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| major 🔴 | | Fail rate = 0.103 | 32/312 tested samples (10.26%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper's Worst Offense against Refugees mzy be Climate Recor sas rising temperatures ad to chaos in the Middle East | Negative (p = 0.63) | Neutral (p = 0.50) |
| 20 | Sharknado 3 may be the best film I've seen yet. #Sharknado3 #America | Sharknado 3 may be the bext film I've seen yet. #Sharknado3 #America | Positive (p = 0.98) | Neutral (p = 0.98) |
| 21 | Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: Daniel Baldwin may have left the ... | Celebrity Buig Brother: Daniel's viction stirs yup bad felinhgs int he house: Daniel Baldwin may have left tnhe ... | Negative (p = 0.80) | Neutral (p = 0.99) |
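For anyone who wants to probe typo sensitivity locally, a rough character-level typo generator might look like the sketch below. It is not the scanner's implementation: the examples above also show keyboard-neighbour substitutions and insertions (e.g. "bext", "Buig"), while this hypothetical `add_typos` only swaps, drops, or doubles letters. The `rate` and `seed` parameters are illustrative choices.

```python
import random


def add_typos(text, rate=0.05, seed=0):
    """Randomly swap, drop, or double letters; seeded for reproducibility."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("swap", "drop", "double"))
            if op == "swap" and i + 1 < len(chars):
                # Transpose this character with the next one.
                out.extend([chars[i + 1], ch])
                i += 2
                continue
            if op == "drop":
                # Omit the character entirely.
                i += 1
                continue
            # "double", or a swap requested on the last character.
            out.extend([ch, ch])
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


sample = "Sharknado 3 may be the best film I've seen yet."
print(add_typos(sample, rate=0.1, seed=42))
```

Fixing the seed makes a flipped prediction reproducible, which helps when comparing model versions.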

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 8.64% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | | Fail rate = 0.086 | 28/324 tested samples (8.64%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonna Watch Final Destination 5 Tonight. I Always Leave The Theater So Afraid Of Everything. No Huge Escalators For Sure :S | Neutral (p = 0.81) | Negative (p = 0.61) |
| 15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | "More Like Boring Eagles""""""""@Tunnyking: C'Mon Bro, Go Out And Support The Super Eagles #Rt @User I Hate International Breaks" | Negative (p = 0.84) | Neutral (p = 0.59) |
| 21 | Celebrity Big Brother: Daniel's eviction stirs up bad feelings in the house: Daniel Baldwin may have left the ... | Celebrity Big Brother: Daniel'S Eviction Stirs Up Bad Feelings In The House: Daniel Baldwin May Have Left The ... | Negative (p = 0.80) | Neutral (p = 0.73) |
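The perturbed texts above ("Daniel'S", "C'Mon", "#Rt @User") are consistent with the behaviour of Python's built-in `str.title()`, which capitalises the first letter after any non-letter character and lowercases the rest. The report does not state which implementation was used, so treat this as an observation rather than a confirmed detail. The short sketch below contrasts it with `string.capwords`, which splits on whitespace only.

```python
import string

snippets = [
    "Daniel's eviction stirs up bad feelings",
    "C'mon bro, Go out",
    "#RT @user",
]

for s in snippets:
    # str.title() restarts capitalisation after apostrophes, '#', and '@'.
    print(f"{s!r:45} title -> {s.title()!r}")
    # string.capwords() capitalises only the first character of each
    # whitespace-separated token and lowercases the remainder.
    print(f"{'':45} capwords -> {string.capwords(s)!r}")
```

If the mid-word capitals are what trips the model, retesting with a whitespace-based title case may give a lower fail rate.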

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 7.69% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Data slice | Metric | Deviation |
|---|---|---|---|
| medium 🟡 | | Fail rate = 0.077 | 23/299 tested samples (7.69%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 2 | Hold on... Sam Smith may do the theme to Spectre!? Dope!!!!!! #007 #SPECTRE #JamesBond | Hold on Sam Smith may do the theme to Spectre Dope #007 #SPECTRE #JamesBond | Positive (p = 0.98) | Neutral (p = 0.99) |
| 7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper s Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Negative (p = 0.63) | Neutral (p = 0.51) |
| 26 | "this adorable old couple in dunkin literally made my day, he's turning 89 tomorrow and talked to me about how he was drafted for the WWII" | this adorable old couple in dunkin literally made my day he s turning 89 tomorrow and talked to me about how he was drafted for the WWII | Positive (p = 0.58) | Neutral (p = 0.69) |
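A punctuation-removal step that matches the three examples above can be sketched as follows. The exact character set the scanner strips is not given in the report. This hypothetical `remove_punctuation` assumes every ASCII punctuation mark except `#` and `@` is replaced by a space and that runs of whitespace are then collapsed, which is what the examples suggest ("Harper's" becomes "Harper s", hashtags survive).

```python
import string

# Assumption: '#' and '@' are kept so hashtags and mentions stay intact.
KEEP = "#@"
_TABLE = str.maketrans(
    {ch: " " for ch in string.punctuation if ch not in KEEP}
)


def remove_punctuation(text):
    """Replace punctuation with spaces, then collapse repeated whitespace."""
    return " ".join(text.translate(_TABLE).split())


original = ("Hold on... Sam Smith may do the theme to Spectre!? "
            "Dope!!!!!! #007 #SPECTRE #JamesBond")
print(remove_punctuation(original))
```

Note that this also removes emphasis cues such as "!!!!!!", which may explain why some Positive predictions drift to Neutral.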

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
