Report for cardiffnlp/twitter-roberta-base-sentiment


Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset `tweet_eval` (subset `sentiment`, split `validation`).
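For context, the model and dataset named above can be loaded in a few lines. This is a minimal sketch using the `transformers` and `datasets` libraries; it reproduces the inputs to the scan, not the scan itself.

```python
from datasets import load_dataset
from transformers import pipeline

# The model under test and the evaluation split used for this report.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment",
)
dataset = load_dataset("tweet_eval", "sentiment", split="validation")

# Predictions come back as LABEL_0/LABEL_1/LABEL_2, matching the tables below.
print(classifier(dataset[0]["text"]))
```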

👉 Ethical issues (2)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.071 | Switch Religion | 6/85 tested samples (7.06%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.

| # | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 85 | @user ok big diff lmao my parents were boaters they didn't know a lot abt Islam when they came. My oldest sis wore it in 1st | @user ok big diff lmao my parents were boaters they didn't know a lot abt judaism when they came. My oldest sis wore it in 1st | LABEL_0 (p = 0.52) | LABEL_1 (p = 0.52) |
| 103 | @user There is more Islam in Austria than in Saudi Arabia and the Gulf states. May Allah bless these Austrian folks.@sunnysingh_nw3 | @user There is more christianity in Austria than in Saudi Arabia and the Gulf states. May god bless these Austrian folks.@sunnysingh_nw3 | LABEL_1 (p = 0.48) | LABEL_2 (p = 0.77) |
| 298 | @user I love Israel. Love the Jews. So I may make a terrible Nazi. :( @user @user @user | @user I love Israel. Love the hindus. So I may make a terrible Nazi. :( @user @user @user | LABEL_0 (p = 0.36) | LABEL_2 (p = 0.45) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.066 | Switch countries from high- to low-income and vice versa | 10/151 tested samples (6.62%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 6.62% of the cases. We expected the predictions not to be affected by this transformation. (An illustrative term-swap sketch follows the examples.)

| # | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 103 | @user There is more Islam in Austria than in Saudi Arabia and the Gulf states. May Allah bless these Austrian folks.@sunnysingh_nw3 | @user There is more Islam in Mozambique than in Cameroon and the Gulf states. May Allah bless these São Toméan folks.@sunnysingh_nw3 | LABEL_1 (p = 0.48) | LABEL_2 (p = 0.58) |
| 280 | NEWS: Plan B confirms February UK tour with support from Labrinth and Rudimental! | NEWS: Plan B confirms February Sierra Leone tour with support from Labrinth and Rudimental! | LABEL_2 (p = 0.53) | LABEL_1 (p = 0.55) |
| 330 | The most unheralded competitive England international of all time? MT @user Marino in the Thursday night Europa League slot | The most unheralded competitive Saint Thomas and Prince international of all time? MT @user Marino in the Thursday night Europa League slot | LABEL_2 (p = 0.62) | LABEL_1 (p = 0.57) |
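As a rough illustration of how both ethical perturbations work, the sketch below swaps a handful of terms and leaves everything else untouched. The mapping is a hypothetical stand-in echoing the examples above, not Giskard's actual word list; the country transformation works the same way with a country mapping.

```python
import re

# Hypothetical term map echoing the examples above -- NOT Giskard's real list.
RELIGION_SWAPS = {"Islam": "judaism", "Allah": "god", "Jews": "hindus"}

def switch_terms(text: str, swaps: dict) -> str:
    """Replace each source term with its counterpart, matching case-insensitively."""
    for src, dst in swaps.items():
        text = re.sub(re.escape(src), dst, text, flags=re.IGNORECASE)
    return text

# A sample "fails" when the model's label for switch_terms(text) differs from
# its label for the original text -- the same criterion the robustness checks use.
```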
👉 Robustness issues (5)
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.201 | Transform to uppercase | 201/1000 tested samples (20.1%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 20.1% of the cases. We expected the predictions not to be affected by this transformation. (A sketch of this fail-rate check follows the examples.)

| # | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1681 | "Why America May Go To Hell"- wish it wouldve been completed and i wish i could read the contents of it... by MLK | "WHY AMERICA MAY GO TO HELL"- WISH IT WOULDVE BEEN COMPLETED AND I WISH I COULD READ THE CONTENTS OF IT... BY MLK | LABEL_1 (p = 0.54) | LABEL_0 (p = 0.67) |
| 99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | OMG THEN I SAT ON MY FLOOR IN FRONT OF THE TV AND BAWLED OVER SHAWN WHEN HE WAS PERFORMING ON THAT ONE SHOW | LABEL_2 (p = 0.57) | LABEL_1 (p = 0.66) |
| 1666 | If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5 | IF IT AIN'T BROKE DON'T FIX IT, WHY MOVE KRIS BRYANT UP TO 3RD WHEN HE'S HITTING AS GOOD AS HE HAS ALL SEASON AT 5 | LABEL_1 (p = 0.65) | LABEL_0 (p = 0.44) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.146 | Add typos | 146/1000 tested samples (14.6%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.6% of the cases. We expected the predictions not to be affected by this transformation. (One plausible typo generator is sketched after the examples.)

| # | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | okmg then I sat on my floor in front of the TV and abwled ver Shawn when he was performing on that one hsow | LABEL_2 (p = 0.57) | LABEL_1 (p = 0.84) |
| 1890 | Around this time tomorrow I will be standing in the middle of Wrigley Field waiting for the Foo Fighters to come on stage! | Adound this time tomorrow Ii lol be standing in the middle of Wrigley Field waiting for the Fok Fighters to come on stage! | LABEL_2 (p = 0.58) | LABEL_1 (p = 0.71) |
| 1591 | Are you excited #Nirvana fans? Unreleased Kurt Cobain songs to come out in November! via @user | Are you excited #Nirvana fans? Umreleased Kurt Cobain songs to cone out ih Noember! via @usd | LABEL_2 (p = 0.70) | LABEL_1 (p = 0.56) |
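Giskard's exact typo model is not shown in this report; the sketch below is one plausible stand-in (random adjacent-character swaps) that reproduces the flavour of the perturbed examples above.

```python
import random

def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap a few adjacent letter pairs to simulate typing errors."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```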
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.101 | Transform to title case | 101/1000 tested samples (10.1%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 10.1% of the cases. We expected the predictions not to be affected by this transformation.

| # | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1681 | "Why America May Go To Hell"- wish it wouldve been completed and i wish i could read the contents of it... by MLK | "Why America May Go To Hell"- Wish It Wouldve Been Completed And I Wish I Could Read The Contents Of It... By Mlk | LABEL_1 (p = 0.54) | LABEL_0 (p = 0.49) |
| 886 | Fake punt on 4th and 11? Wow, James Franklin can make some odd decisions. #PennState #Michigan #PSUvsMICH | Fake Punt On 4Th And 11? Wow, James Franklin Can Make Some Odd Decisions. #Pennstate #Michigan #Psuvsmich | LABEL_0 (p = 0.46) | LABEL_1 (p = 0.50) |
| 1636 | @user They're actually going venue shopping tomorrow! They're checking out Grand Bend and surrounding areas (ie. St. Mary's)! | @User They'Re Actually Going Venue Shopping Tomorrow! They'Re Checking Out Grand Bend And Surrounding Areas (Ie. St. Mary'S)! | LABEL_2 (p = 0.60) | LABEL_1 (p = 0.70) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.067 | Transform to lowercase | 67/1000 tested samples (6.7%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 6.7% of the cases. We expected the predictions not to be affected by this transformation.

| # | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 760 | @user I hope someone asks Harper why the team bailed in the 7th inning | @user i hope someone asks harper why the team bailed in the 7th inning | LABEL_1 (p = 0.53) | LABEL_0 (p = 0.50) |
| 363 | Get ready for our Wednesday Drink Specials Wednesday - 3-8pm Have it your Way Margarita Day ( Bar Brand Only)... | get ready for our wednesday drink specials wednesday - 3-8pm have it your way margarita day ( bar brand only)... | LABEL_1 (p = 0.66) | LABEL_2 (p = 0.51) |
| 655 | Sam smith tomorrow with my little sister sure why not. LOL | sam smith tomorrow with my little sister sure why not. lol | LABEL_2 (p = 0.49) | LABEL_1 (p = 0.55) |
| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.063 | Punctuation Removal | 63/1000 tested samples (6.3%) changed prediction after perturbation |
🔍✨ Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.3% of the cases. We expected the predictions not to be affected by this transformation. (A plausible implementation is sketched after the examples.)

| # | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1329 | Jacob I'm going to see Sam Smith tomorrow, wanna come with? | Jacob I m going to see Sam Smith tomorrow wanna come with | LABEL_1 (p = 0.83) | LABEL_2 (p = 0.51) |
| 1302 | Oh and Rafa said before the injury he was having the best year he ever had was 1st in the race... :( #M6 | Oh and Rafa said before the injury he was having the best year he ever had was 1st in the race ( #M6 | LABEL_1 (p = 0.50) | LABEL_2 (p = 0.75) |
| 1288 | it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht | it looks like a beautiful night to throw myself off the Brooklyn Bridge @Tim_Hecht | LABEL_1 (p = 0.41) | LABEL_2 (p = 0.45) |
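One plausible reading of “Punctuation Removal”, based on the examples above: punctuation becomes whitespace ("I'm" becomes "I m"), while characters like "(", "#", "@" and "_" survive. The exact character set is an assumption.

```python
import string

# Punctuation set inferred from the examples above -- "(", "#", "@" and "_"
# appear to be preserved, so they are excluded here. This is an assumption.
PUNCT = "".join(c for c in string.punctuation if c not in "(#@_")

def remove_punctuation(text: str) -> str:
    stripped = text.translate(str.maketrans(PUNCT, " " * len(PUNCT)))
    return " ".join(stripped.split())  # collapse the spaces left behind
```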

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model (a hedged sketch of re-running the scan locally follows this list).
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
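If you want to re-run a scan like this one locally, the sketch below shows the general shape. Giskard's wrapper signatures change between versions, so treat the parameter names here as assumptions and check the current docs in the Space.

```python
import giskard
import numpy as np
import pandas as pd

LABELS = ["LABEL_0", "LABEL_1", "LABEL_2"]

def predict_proba(df: pd.DataFrame) -> np.ndarray:
    """Return one probability row per input, ordered as LABELS."""
    # classifier is the transformers pipeline from the first snippet;
    # top_k=None requests scores for all labels, not just the top one.
    outputs = classifier(df["text"].tolist(), top_k=None)
    return np.array([[next(s["score"] for s in out if s["label"] == lab)
                      for lab in LABELS] for out in outputs])

wrapped_model = giskard.Model(model=predict_proba, model_type="classification",
                              classification_labels=LABELS, feature_names=["text"])
wrapped_data = giskard.Dataset(df=pd.DataFrame({"text": dataset["text"]}))
report = giskard.scan(wrapped_model, wrapped_data)  # yields findings like the above
```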

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
