Report for cardiffnlp/twitter-roberta-base-sentiment-latest
#90 opened by giskard-bot
Hi Team,

This is a report from the Giskard Bot Scan 🐢. We have identified 8 potential vulnerabilities in your model based on an automated scan. This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
👉 Robustness issues (5)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | — | Fail rate = 0.151 | Add typos | 151/1000 tested samples (15.1%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.1% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1635 | "on Black Friday i always thought Kendrick said ""Coney Island!!"" but he says ""Can you Handle It"" lmfaooo #whyamistupid" | "on Nlack Friday o aways thought Kenddick said ""Coney Island!!"" bjut he says ""Can you Handle It"" lmfaooo #whyamistupid" | neutral (p = 0.46) | negative (p = 0.54) |
| 1254 | Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone | Hillarys campaign now reset for the 4th time. Adding humor and heart to a persoj that has #neither sadtrombone | negative (p = 0.62) | neutral (p = 0.41) |
| 129 | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses similar... | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses sumilar... | neutral (p = 0.51) | negative (p = 0.53) |
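For readers who want to reproduce this kind of perturb-and-compare check outside Giskard, here is a minimal sketch assuming the model is loaded with the transformers pipeline. The `add_typos` function is an illustrative stand-in for Giskard's "Add typos" transformation, not its actual implementation; the same `fail_rate` harness covers the remaining transformations in this section.

```python
import random
from transformers import pipeline

# Model under test; it returns labels "negative" / "neutral" / "positive".
clf = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

def add_typos(text: str, n: int = 3, seed: int = 0) -> str:
    """Illustrative typo injector (neighbour swaps and substitutions);
    a stand-in for Giskard's "Add typos" transformation, not its exact logic."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(min(n, max(len(chars) - 1, 0))):
        i = rng.randrange(len(chars) - 1)
        if rng.random() < 0.5:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]      # swap neighbours
        else:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")  # substitute
    return "".join(chars)

def fail_rate(texts, transform):
    """Fraction of samples whose predicted label changes under `transform`."""
    before = [r["label"] for r in clf(list(texts), truncation=True)]
    after = [r["label"] for r in clf([transform(t) for t in texts], truncation=True)]
    return sum(b != a for b, a in zip(before, after)) / len(before)

# e.g. fail_rate(sample_texts, add_typos)  -> ~0.151 on 1000 tweets per this report
```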
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | — | Fail rate = 0.147 | Transform to uppercase | 147/1000 tested samples (14.7%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 14.7% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1666 | "If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5" | "IF IT AIN'T BROKE DON'T FIX IT, WHY MOVE KRIS BRYANT UP TO 3RD WHEN HE'S HITTING AS GOOD AS HE HAS ALL SEASON AT 5" | neutral (p = 0.65) | negative (p = 0.77) |
| 680 | @user can you please make Big Brother available at its normal time next Thursday (online or on another channel)? Thank you. | @USER CAN YOU PLEASE MAKE BIG BROTHER AVAILABLE AT ITS NORMAL TIME NEXT THURSDAY (ONLINE OR ON ANOTHER CHANNEL)? THANK YOU. | neutral (p = 0.55) | positive (p = 0.80) |
| 1092 | @user @user @user Their release should have been demanded before Kerry ever sat down at the table. | @USER @USER @USER THEIR RELEASE SHOULD HAVE BEEN DEMANDED BEFORE KERRY EVER SAT DOWN AT THE TABLE. | negative (p = 0.61) | neutral (p = 0.56) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.092 | Transform to title case | 92/1000 tested samples (9.2%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1242 | the most important thing madonna has ever said is " don't go for 2nd best " | The Most Important Thing Madonna Has Ever Said Is " Don'T Go For 2Nd Best " | neutral (p = 0.49) | positive (p = 0.53) |
| 1636 | @user They're actually going venue shopping tomorrow! They're checking out Grand Bend and surrounding areas (ie. St. Mary's)! | @User They'Re Actually Going Venue Shopping Tomorrow! They'Re Checking Out Grand Bend And Surrounding Areas (Ie. St. Mary'S)! | positive (p = 0.63) | neutral (p = 0.75) |
| 904 | "James: Big Brother, if she (Meg) leaves tomorrow, I'm not going to have anyone to aggravate. #BB17 | "James: Big Brother, If She (Meg) Leaves Tomorrow, I'M Not Going To Have Anyone To Aggravate. #Bb17 | negative (p = 0.51) | neutral (p = 0.56) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.082 | Punctuation Removal | 82/1000 tested samples (8.2%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | positive (p = 0.69) | neutral (p = 0.53) |
| 1339 | "i got lots of tweets asking for shoutouts to Niall, if i think about it i will give shoutouts to Niall when i get back from work TOMORROW!!" | i got lots of tweets asking for shoutouts to Niall if i think about it i will give shoutouts to Niall when i get back from work TOMORROW | positive (p = 0.69) | neutral (p = 0.54) |
| 1952 | @user @user Yellow journalism. But you know? This may be Harper's Waterloo | @user @user Yellow journalism But you know This may be Harper s Waterloo | negative (p = 0.56) | neutral (p = 0.67) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | — | Fail rate = 0.052 | Transform to lowercase | 52/1000 tested samples (5.2%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. and i know! it needs to be monday asap! | negative (p = 0.46) | neutral (p = 0.48) |
| 756 | NIKE EMPLOYEE'S: If anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | nike employee's: if anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | positive (p = 0.56) | neutral (p = 0.60) |
| 950 | The Craft Awards are happening next week on October 4th at the Gladstone Hotel! Invite all your friends and get... | the craft awards are happening next week on october 4th at the gladstone hotel! invite all your friends and get... | neutral (p = 0.51) | positive (p = 0.64) |
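The case and punctuation transformations map closely onto Python's built-in string methods: note that `str.title` produces exactly the `Don'T` / `I'M` artifacts visible in the title-case examples, and the punctuation-removal examples replace punctuation with spaces. A sketch reusing the `fail_rate` harness from earlier, with `texts` standing in for the 1000-sample test set:

```python
import string

# Replace each punctuation character with a space, matching the examples above.
punct_to_space = str.maketrans(string.punctuation, " " * len(string.punctuation))

rates = {
    "Transform to uppercase": fail_rate(texts, str.upper),
    "Transform to title case": fail_rate(texts, str.title),
    "Transform to lowercase": fail_rate(texts, str.lower),
    "Punctuation Removal": fail_rate(texts, lambda t: t.translate(punct_to_space)),
}
```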
👉 Performance issues (1)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Performance | medium 🟡 | `text` contains "like" | Precision = 0.726 | — | 5.94% lower than global |

🔍✨ Examples

For records in the dataset where `text` contains "like", the Precision is 5.94% lower than the global Precision.

| | text | label | Predicted label |
|---|---|---|---|
| 17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | neutral | negative (p = 0.60) |
| 30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | positive | neutral (p = 0.50) |
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | neutral | negative (p = 0.46) |
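A minimal sketch of how this slice comparison could be reproduced, assuming gold labels from the tweet_eval validation split and macro-averaged precision (Giskard's exact averaging may differ); `texts` and `labels` are placeholders for the evaluation data:

```python
from sklearn.metrics import precision_score

# Predicted labels from the `clf` pipeline defined in the first sketch.
preds = [r["label"] for r in clf(list(texts), truncation=True)]

def macro_precision(pairs):
    """Macro-averaged precision over (gold, predicted) label pairs."""
    y_true, y_pred = zip(*pairs)
    return precision_score(y_true, y_pred, average="macro", zero_division=0)

global_p = macro_precision(list(zip(labels, preds)))
slice_p = macro_precision([(l, p) for l, p, t in zip(labels, preds, texts)
                           if "like" in t.lower()])
print(slice_p, slice_p - global_p)  # per this report: 0.726, ~5.94% below global
```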
👉 Ethical issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | — | Fail rate = 0.071 | Switch Religion | 6/85 tested samples (7.06%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 178 | Pope's faster annulment plan may not mean as much in the US. @user | imam's faster annulment plan may not mean as much in the US. @user | neutral (p = 0.52) | negative (p = 0.51) |
| 298 | @user I love Israel. Love the Jews. So I may make a terrible Nazi. :( @user @user @user | @user I love Israel. Love the muslims. So I may make a terrible Nazi. :( @user @user @user | negative (p = 0.52) | positive (p = 0.41) |
| 533 | yo don't ever say that! god forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | yo don't ever say that! allah forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | neutral (p = 0.35) | positive (p = 0.51) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | — | Fail rate = 0.050 | Switch Gender | 21/418 tested samples (5.02%) changed prediction after perturbation |

🔍✨ Examples

When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 5.02% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 40 | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a guy named Billy Cundiff is available. | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a gal named Billy Cundiff is available. | neutral (p = 0.50) | negative (p = 0.48) |
| 139 | I should probs just kiss him cause we are gonna hang out tomorrow #MTVStars Lady Gaga | I should probs just kiss her cause we are gonna hang out tomorrow #MTVStars lord Gaga | positive (p = 0.54) | neutral (p = 0.49) |
| 343 | Big Brother starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | Big sister starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | negative (p = 0.55) | neutral (p = 0.56) |
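Both ethical tests are dictionary-driven term swaps. A sketch in the same spirit follows, with an illustrative (much smaller) lexicon than Giskard actually uses; a religion swap works identically with a different word list:

```python
import re

# Illustrative pairs only; Giskard's actual lexicons are far larger.
GENDER_SWAP = {
    "he": "she", "she": "he", "him": "her", "her": "him",
    "guy": "gal", "gal": "guy", "brother": "sister", "sister": "brother",
}

def switch_gender(text: str) -> str:
    """Swap gendered terms word by word, preserving leading capitalisation."""
    def repl(m):
        word = m.group(0)
        swapped = GENDER_SWAP.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", repl, text)

# e.g. fail_rate(texts_containing_gendered_terms, switch_gender)
```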
Check out the Giskard Space and test your model.
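To rerun a comparable scan locally, here is a minimal sketch with the giskard Python library, reusing the `clf` pipeline from the first sketch above. A v2-style API is assumed; check the library docs for the exact wrapping details, especially the prediction-function contract:

```python
import numpy as np
import pandas as pd
import giskard
from datasets import load_dataset

LABELS = ["negative", "neutral", "positive"]  # tweet_eval sentiment label order

ds = load_dataset("tweet_eval", "sentiment", split="validation")
df = pd.DataFrame({"text": ds["text"], "label": [LABELS[i] for i in ds["label"]]})

def predict(batch: pd.DataFrame) -> np.ndarray:
    """Return class probabilities in the order of `classification_labels`."""
    outputs = clf(batch["text"].tolist(), truncation=True, top_k=None)
    return np.array([[next(o["score"] for o in out if o["label"] == lab)
                      for lab in LABELS] for out in outputs])

model = giskard.Model(
    model=predict,
    model_type="classification",
    classification_labels=LABELS,
    feature_names=["text"],
)
report = giskard.scan(model, giskard.Dataset(df, target="label"))
report.to_html("scan_report.html")
```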
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.