Report for citizenlab/twitter-xlm-roberta-base-sentiment-finetunned
#88, opened by giskard-bot
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 7 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉Robustness issues (4)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.117 | Transform to uppercase | 117/1000 tested samples (11.7%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 11.7% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "FAKE PUNT ON 4TH AND 11? WOW, JAMES FRANKLIN CAN MAKE SOME ODD DECISIONS. #PENNSTATE #MICHIGAN #PSUVSMICH" | Negative (p = 0.65) | Neutral (p = 0.97) |
1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'VE BEEN THINKING ABOUT IT... DOES ANYONE ELSE FIND IT DISTURBING HOW KANE MAY FACE RAPE CHARGES AND GM'S ARE CALLING ON HIS AVAILABILITY? | Negative (p = 0.55) | Neutral (p = 0.98) |
219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | NEBRASKA DOESN'T LAND GESELL...A TOP 100 GUY IN YOUR STATE AND YOU DON'T GET HIM. C'MON #NEBRASKETBALL | Negative (p = 0.75) | Neutral (p = 0.65) |
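
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how a fail rate like the one above can be computed. The `fail_rate` helper and the keyword-based `toy_predict` classifier are hypothetical stand-ins written for this illustration, not the scan's actual implementation; in practice `predict` would wrap the real model.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Return the share of texts whose predicted label changes after `transform`."""
    if not texts:
        raise ValueError("need at least one text")
    changed = sum(predict(text) != predict(transform(text)) for text in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive classifier, only here to make the sketch runnable."""
    return "negative" if ("odd" in text or "disturbing" in text) else "neutral"


samples = [
    "Wow, James Franklin can make some odd decisions.",
    "Does anyone else find it disturbing?",
    "Kickoff is at noon.",
    "See you at the stadium.",
]

# str.upper plays the role of the "Transform to uppercase" perturbation.
rate = fail_rate(toy_predict, samples, str.upper)
print(f"Fail rate = {rate:.3f}")
```

With the real model, 117 changed predictions out of 1000 tested samples give the reported fail rate of 117 / 1000 = 0.117.
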
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.081 | Punctuation Removal | 81/1000 tested samples (8.1%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.1% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | Positive (p = 0.93) | Neutral (p = 0.90) |
1178 | Since September 26 is Batman Day I'm having a Batman month. We start off with the greatest Batman Story ever #Batman | Since September 26 is Batman Day I m having a Batman month We start off with the greatest Batman Story ever #Batman | Positive (p = 0.66) | Neutral (p = 0.56) |
181 | "Kapan sih lo ngebuktiin,jan ngomong doang Susah Susah.usaha Aja blm udh nyerah,inget.if you never try you'll never know.cowok kok gentle bgt" | Kapan sih lo ngebuktiin jan ngomong doang Susah Susah usaha Aja blm udh nyerah inget if you never try you ll never know cowok kok gentle bgt | Negative (p = 0.76) | Neutral (p = 0.69) |
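
The perturbed texts above look as if each run of punctuation was replaced by a single space while hashtags were left alone. Below is a rough sketch that behaves this way on the examples shown. The `remove_punctuation` name, the regular expression, and the decision to also keep `@` are assumptions made for this illustration, not the scan's actual code.

```python
import re

# Assumption: keep word characters, whitespace, and the Twitter markers "#" and "@";
# every other run of characters is treated as punctuation.
_PUNCTUATION_RUN = re.compile(r"[^\w\s#@]+")


def remove_punctuation(text: str) -> str:
    """Replace punctuation runs with a space, then collapse repeated whitespace."""
    spaced = _PUNCTUATION_RUN.sub(" ", text)
    return " ".join(spaced.split())


original = (
    "Curtis Painter...we have a chance again! Can't believe Kerry Collins "
    "didn't throw us a pick-six tonight"
)
perturbed = remove_punctuation(original)
print(perturbed)
```

Note how contractions such as "Can't" are split into "Can t", which changes the tokens the model sees.
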
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.076 | Transform to title case | 76/1000 tested samples (7.6%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 7.6% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
886 | "Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH" | "Fake Punt On 4Th And 11? Wow, James Franklin Can Make Some Odd Decisions. #Pennstate #Michigan #Psuvsmich" | Negative (p = 0.65) | Neutral (p = 0.79) |
1554 | I've been thinking about it... Does anyone else find it disturbing how Kane may face rape charges and GM's are calling on his availability? | I'Ve Been Thinking About It... Does Anyone Else Find It Disturbing How Kane May Face Rape Charges And Gm'S Are Calling On His Availability? | Negative (p = 0.55) | Neutral (p = 0.95) |
219 | Nebraska doesn't land Gesell...a Top 100 guy in your state and you don't get him. C'mon #Nebrasketball | Nebraska Doesn'T Land Gesell...A Top 100 Guy In Your State And You Don'T Get Him. C'Mon #Nebrasketball | Negative (p = 0.75) | Neutral (p = 0.89) |
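
The perturbed texts in this table are consistent with what Python's built-in `str.title()` produces, including the capital letter after each apostrophe ("Doesn'T", "C'Mon"). Whether the scan calls `str.title()` internally is an assumption; the sketch below only shows that the built-in reproduces one of the rows above.

```python
original = (
    "Nebraska doesn't land Gesell...a Top 100 guy in your state "
    "and you don't get him. C'mon #Nebrasketball"
)

# str.title() uppercases the first letter after any non-letter character
# (including apostrophes and dots) and lowercases the remaining letters.
perturbed = original.title()
print(perturbed)
```

Because the transformation also lowercases the inside of hashtags (for example "#PennState" becomes "#Pennstate"), it changes more than the first letter of each word.
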
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.075 | Add typos | 75/1000 tested samples (7.5%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 7.5% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
194 | @user @user @user that I may have an idea , you have written all Christians and God totally off not willingTOthink" | @user @user @user that I may have an idea , you have written all Christians anf Go dtotally off no willingTOthink" | Negative (p = 0.75) | Neutral (p = 0.99) |
320 | It was a WILD night at @user Jazz at the Bistro. Amy Schumer & cast dug our 2nd set & had a comedy jam- Epic! | It was a WILD night at @user Jazz at the Bistro. Amy Schumer & cast dug our 2nd set & had a comedy jam- pEic! | Positive (p = 0.95) | Negative (p = 0.89) |
770 | Milan's overthetop lipsynch was funny the 1st time but 2nd just seems like she's trying too hard #RuVealed #RuPaulsDragRace @user | Milan's overthetop lipsynch qas funny the 1st time but 2nd just seems lie she'w trying too hard #RuVealed #RuPaulsDragRace @user | Negative (p = 0.54) | Neutral (p = 0.53) |
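
Two of the edits visible in the examples above are a swap of neighbouring characters ("Epic" to "pEic") and a dropped character ("like" to "lie"). The sketch below shows these two operations in isolation. The helper names are hypothetical, and how the scan chooses positions and edit types is not shown in this report.

```python
def swap_adjacent(text: str, index: int) -> str:
    """Swap the characters at `index` and `index + 1`."""
    if not 0 <= index < len(text) - 1:
        raise IndexError("index must have a right-hand neighbour")
    chars = list(text)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def delete_char(text: str, index: int) -> str:
    """Drop the character at `index`."""
    if not 0 <= index < len(text):
        raise IndexError("index out of range")
    return text[:index] + text[index + 1:]


swapped = swap_adjacent("Epic!", 0)
dropped = delete_char("like", 2)
print(swapped, dropped)
```
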
👉Performance issues (3)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "day" | Precision = 0.555 | — | 10.74% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "day", the Precision is 10.74% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
58 | "Tomorrow is National Ice Cream Day. Just in case you can't make it to the dining hall to satisfy your craving, here are some stores......" | Positive | Neutral (p = 0.99) |
98 | @user Dear Taimouraga, Thank you for contacting. Apologies for the late reply. Yes the Centers were open at the 4th day of Eid." | Positive | Neutral (p = 0.97) |
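
To make the slice comparison concrete, here is a small sketch that computes Precision on a "text contains ..." slice and on the full data. Everything in it is illustrative: the records are made-up toy data, the helper names are hypothetical, and macro averaging over the three labels is an assumption, since this report does not state which averaging the scan uses. The case-insensitive match mirrors the examples in this report, where "Friday Night" falls into the "night" slice.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str, str]  # (text, true label, predicted label)


def label_precision(label: str, y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Precision for one label; 0.0 when the label is never predicted."""
    hits = [t == label for t, p in zip(y_true, y_pred) if p == label]
    return sum(hits) / len(hits) if hits else 0.0


def macro_precision(records: Sequence[Record]) -> float:
    """Unweighted mean of per-label precision (averaging choice is an assumption)."""
    y_true = [r[1] for r in records]
    y_pred = [r[2] for r in records]
    labels = sorted(set(y_true) | set(y_pred))
    return sum(label_precision(l, y_true, y_pred) for l in labels) / len(labels)


def text_contains(records: Sequence[Record], needle: str) -> List[Record]:
    """Case-insensitive substring slice, as in 'text contains "day"'."""
    return [r for r in records if needle.lower() in r[0].lower()]


# Toy records, invented for this sketch.
records: List[Record] = [
    ("National ice cream day!", "positive", "neutral"),
    ("What a great day", "positive", "positive"),
    ("Worst day ever", "negative", "negative"),
    ("The store opens at nine", "neutral", "neutral"),
    ("Loved the concert", "positive", "positive"),
    ("Traffic was awful", "negative", "negative"),
]

global_precision = macro_precision(records)
slice_precision = macro_precision(text_contains(records, "day"))
print(f"global = {global_precision:.3f}, slice = {slice_precision:.3f}")
```
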
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "like" | Precision = 0.558 | — | 10.29% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "like", the Precision is 10.29% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | Neutral | Negative (p = 0.76) |
30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | Positive | Neutral (p = 0.96) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | text contains "night" | Precision = 0.590 | — | 5.11% lower than global |
🔍✨Examples
For records in the dataset where `text` contains "night", the Precision is 5.11% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
1 | "National hot dog day, national tequila day, then national dance day... Sounds like a Friday night." | Positive | Neutral (p = 0.97) |
69 | @user Front row shot of David Wright on Wednesday night in St.Lucie. Keep up the excellent work, sir! | Neutral | Positive (p = 0.97) |
72 | "We have four Premium Seats for the Zac Brown Band, for this Friday Night 8/7/15 at Fenway Park. These are... | Positive | Neutral (p = 0.99) |
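
As a quick sanity check on the three performance findings, the sketch below treats each deviation as a relative difference, (slice - global) / global, and backs out the global Precision each finding implies. The relative interpretation and the helper names are assumptions made for this illustration; the slice values and deviations are the ones reported above.

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative difference of a slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation: the global metric implied by a slice value."""
    return slice_metric / (1 + deviation)


# (slice Precision, reported deviation) for each finding in this report.
reported = {
    "day": (0.555, -0.1074),
    "like": (0.558, -0.1029),
    "night": (0.590, -0.0511),
}

implied = {
    word: implied_global(precision, deviation)
    for word, (precision, deviation) in reported.items()
}
for word, value in implied.items():
    print(f'text contains "{word}": implied global Precision = {value:.2f}')
```

All three findings point to a global Precision of roughly 0.62, which is what one would expect if the deviations are relative rather than absolute differences.
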
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.