Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned

#88
by giskard-bot - opened

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Robustness issues (4)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.117 | Transform to uppercase | 117/1000 tested samples (11.7%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 11.7% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "FAKE PUNT ON 4TH AND 11? WOW, JAMES FRANKLIN CAN MAKE SOME ODD DECISIONS. #PENNSTATE #MICHIGAN #PSUVSMICH" | Negative (p = 0.65) | Neutral (p = 0.97) |
| 1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'VE BEEN THINKING ABOUT IT... DOES ANYONE ELSE FIND IT DISTURBING HOW KANE MAY FACE RAPE CHARGES AND GM'S ARE CALLING ON HIS AVAILABILITY? | Negative (p = 0.55) | Neutral (p = 0.98) |
| 219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | NEBRASKA DOESN'T LAND GESELL...A TOP 100 GUY IN YOUR STATE AND YOU DON'T GET HIM. C'MON #NEBRASKETBALL | Negative (p = 0.75) | Neutral (p = 0.65) |

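
The fail rate reported above is the share of tested samples whose predicted label changes once the text is perturbed. A minimal sketch of that computation follows. The names `fail_rate` and `toy_predict` are hypothetical, and the toy classifier is only a stand-in so the snippet runs without downloading the model. The perturbed texts in this report are consistent with Python's `str.upper`, but that the scan uses exactly this function is an assumption.

```python
from typing import Callable, List, Sequence


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[List[str]], List[str]],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("need at least one text")
    original = predict(list(texts))
    perturbed = predict([transform(t) for t in texts])
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(texts)


def toy_predict(batch: List[str]) -> List[str]:
    # Hypothetical stand-in classifier: a case-sensitive keyword rule,
    # chosen only because it is visibly fragile under uppercasing.
    return ["Negative" if "odd" in text else "Neutral" for text in batch]


samples = [
    "Wow, some odd decisions.",
    "Nice game tonight.",
    "An odd call.",
    "We start off today.",
]
rate = fail_rate(samples, toy_predict, str.upper)
print(f"fail rate = {rate:.3f}")
```

To check the real model, `toy_predict` would be replaced by a function that returns one label per input text. Passing `str.title` instead of `str.upper` gives a title-case variant of the same check.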

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.081 | Punctuation Removal | 81/1000 tested samples (8.1%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | Positive (p = 0.93) | Neutral (p = 0.90) |
| 1178 | Since September 26 is Batman Day I'm having a Batman month. We start off with the greatest Batman Story ever #Batman | Since September 26 is Batman Day I m having a Batman month We start off with the greatest Batman Story ever #Batman | Positive (p = 0.66) | Neutral (p = 0.56) |
| 181 | "Kapan sih lo ngebuktiin,jan ngomong doang Susah Susah.usaha Aja blm udh nyerah,inget.if you never try you'll never know.cowok kok gentle bgt" | Kapan sih lo ngebuktiin jan ngomong doang Susah Susah usaha Aja blm udh nyerah inget if you never try you ll never know cowok kok gentle bgt | Negative (p = 0.76) | Neutral (p = 0.69) |

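
The perturbed examples above suggest that punctuation is replaced by a space, runs of whitespace are collapsed, and hashtags are kept. The sketch below reproduces that behavior on the first example. It is an approximation inferred from these three rows, not the scan's actual implementation, and `remove_punctuation` is a hypothetical helper name.

```python
import re
import string

# Assumption: "#" and "@" are preserved so hashtags and mentions survive,
# as "#Batman" does in the examples above.
KEEP = "#@"
PUNCT = "".join(c for c in string.punctuation if c not in KEEP)
_PUNCT_RUN = re.compile("[" + re.escape(PUNCT) + "]+")


def remove_punctuation(text: str) -> str:
    """Replace each run of punctuation with a space, then collapse whitespace."""
    spaced = _PUNCT_RUN.sub(" ", text)
    return " ".join(spaced.split())


original = (
    "Curtis Painter...we have a chance again! Can't believe "
    "Kerry Collins didn't throw us a pick-six tonight"
)
print(remove_punctuation(original))
```

Note that contractions such as "Can't" become "Can t", which may explain part of the prediction drift: the model sees token sequences it rarely encountered in training.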

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.076 | Transform to title case | 76/1000 tested samples (7.6%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 7.6% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake Punt On 4Th And 11? Wow, James Franklin Can Make Some Odd Decisions. #Pennstate #Michigan #Psuvsmich" | Negative (p = 0.65) | Neutral (p = 0.79) |
| 1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'Ve Been Thinking About It... Does Anyone Else Find It Disturbing How Kane May Face Rape Charges And Gm'S Are Calling On His Availability? | Negative (p = 0.55) | Neutral (p = 0.95) |
| 219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | Nebraska Doesn'T Land Gesell...A Top 100 Guy In Your State And You Don'T Get Him. C'Mon #Nebrasketball | Negative (p = 0.75) | Neutral (p = 0.89) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.075 | Add typos | 75/1000 tested samples (7.5%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 7.5% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 194 | @user @user @user that I may have an idea , you have written all Christians and God totally off not willingTOthink" | @user @user @user that I may have an idea , you have written all Christians anf Go dtotally off no willingTOthink" | Negative (p = 0.75) | Neutral (p = 0.99) |
| 320 | It was a WILD night at @user Jazz at the Bistro. Amy Schumer & cast dug our 2nd set & had a comedy jam- Epic! | It was a WILD night at @user Jazz at the Bistro. Amy Schumer & cast dug our 2nd set & had a comedy jam- pEic! | Positive (p = 0.95) | Negative (p = 0.89) |
| 770 | Milan's overthetop lipsynch was funny the 1st time but 2nd just seems like she's trying too hard #RuVealed #RuPaulsDragRace @user | Milan's overthetop lipsynch qas funny the 1st time but 2nd just seems lie she'w trying too hard #RuVealed #RuPaulsDragRace @user | Negative (p = 0.54) | Neutral (p = 0.53) |

👉Performance issues (3)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "day" | Precision = 0.555 | | -10.74% than global |

🔍✨Examples

For records in the dataset where `text` contains "day", the Precision is 10.74% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | Positive | Neutral (p = 0.99) |
| 98 | @user Dear Taimouraga, Thank you for contacting. Apologies for the late reply. Yes the Centers were open at the 4th day of Eid." | Positive | Neutral (p = 0.97) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | major 🔴 | text contains "like" | Precision = 0.558 | | -10.29% than global |

🔍✨Examples

For records in the dataset where `text` contains "like", the Precision is 10.29% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | Neutral | Negative (p = 0.76) |
| 30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | Positive | Neutral (p = 0.96) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | text contains "night" | Precision = 0.590 | | -5.11% than global |

🔍✨Examples

For records in the dataset where `text` contains "night", the Precision is 5.11% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
| 69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | Neutral | Positive (p = 0.97) |
| 72 | "We have four Premium Seats for the Zac Brown Band, for this Friday Night 8/7/15 at Fenway Park. These are... | Positive | Neutral (p = 0.99) |

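
The report does not state the global Precision, but it can be backed out from the three slices if the deviation is read as relative, i.e. (slice - global) / global. That reading is an assumption, and the helper names below are hypothetical. The sketch shows that all three slices imply roughly the same global value, which supports the relative interpretation.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative change of a slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: global = slice / (1 + deviation)."""
    return slice_metric / (1 + deviation)


# (slice precision, reported deviation) as listed in the report
reported = {
    "day": (0.555, -0.1074),
    "like": (0.558, -0.1029),
    "night": (0.590, -0.0511),
}

for word, (precision, deviation) in reported.items():
    estimate = implied_global(precision, deviation)
    print(f'text contains "{word}": implied global precision = {estimate:.3f}')
```

Each slice yields an implied global Precision of about 0.622, up to the rounding of the reported figures.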

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
