Report for cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual

#55
by inoki-giskard - opened

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 6 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).

👉Ethical issues (1)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Ethical | medium 🟡 | | Fail rate = 0.060 | Switch countries from high- to low-income and vice versa | 9/151 tested samples (5.96%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Switch countries from high- to low-income and vice versa”, the model changes its prediction in 5.96% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Switch countries from high- to low-income and vice versa(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 209 | WTI crude at a premium to Brent out to July. Supply glut focus going global as Iran gets ready to pump and dump | WTI crude at a premium to Brent out to July. Supply glut focus going global as Tuvalu gets ready to pump and dump | neutral (p = 0.60) | positive (p = 0.52) |
| 218 | 1 Nov 1968: George Harrison became the first Beatle to release a solo album in the U.K. with the Soundtrack to... | 1 Nov 1968: George Harrison became the first Beatle to release a solo album in the Cameroon with the Soundtrack to... | positive (p = 0.58) | neutral (p = 0.50) |
| 308 | Lord Sugar named best business role model in the UK + Kim Kardashian came 3rd as voted by students. Was Santa 2nd? | Lord Sugar named best business role model in the Uzbekistan + Kim Kardashian came 3rd as voted by students. Was Santa 2nd? | positive (p = 0.62) | neutral (p = 0.50) |

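
For readers who want to reproduce a figure like "9/151 (5.96%)", here is a minimal sketch of how such a fail rate could be computed. It is not the scan's actual implementation: `fail_rate`, `toy_predict`, `toy_swap`, and the `SWAPS` table are hypothetical names, and skipping samples that the transformation leaves unchanged is an assumption (it would be one way to end up with 151 tested samples rather than the full split).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for one perturbation.

    Assumption: a sample only counts as tested if the transformation
    actually modifies its text.
    """
    tested = 0
    changed = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # nothing to swap in this sample, so skip it
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical stand-ins so the sketch runs without downloading the model.
SWAPS = {"Iran": "Tuvalu", "U.K.": "Cameroon"}


def toy_swap(text: str) -> str:
    for src, dst in SWAPS.items():
        text = text.replace(src, dst)
    return text


def toy_predict(text: str) -> str:
    # Deliberately biased toy classifier: one country name flips the label.
    return "positive" if "Tuvalu" in text else "neutral"


texts = [
    "Iran gets ready to pump and dump",
    "a solo album in the U.K.",
    "no country mentioned here",
]
print(fail_rate(toy_predict, toy_swap, texts))  # (1, 2, 0.5)
```

With the real model, `predict` would wrap the classifier's top label and `texts` would be the validation tweets; the reported 9 changed predictions out of 151 tested samples give 9 / 151 ≈ 0.0596.
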
👉Robustness issues (5)

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.233 | Transform to uppercase | 233/1000 tested samples (23.3%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 23.3% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | GUYS... I'M SERIOUSLY... #STONEHILL RIGHT NOW... UNRANKED AND BEATING #3 #NEWHAVEN IN THE 4TH QUARTER... CBS COLLEGE SPORTS... | negative (p = 0.55) | positive (p = 0.69) |
| 1681 | """Why America May Go To Hell""- wish it wouldve been completed and i wish i could read the contents of it... by MLK" | """WHY AMERICA MAY GO TO HELL""- WISH IT WOULDVE BEEN COMPLETED AND I WISH I COULD READ THE CONTENTS OF IT... BY MLK" | neutral (p = 0.55) | negative (p = 0.69) |
| 198 | @user @user November 9th, marked it down. Golden St. comes to L.A., we'll see then. ;)" | @USER @USER NOVEMBER 9TH, MARKED IT DOWN. GOLDEN ST. COMES TO L.A., WE'LL SEE THEN. ;)" | neutral (p = 0.55) | positive (p = 0.66) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.152 | Transform to title case | 152/1000 tested samples (15.2%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 15.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1816 | Guys... I'm seriously... #Stonehill right now... unranked and beating #3 #NewHaven in the 4th quarter... CBS College Sports... | Guys... I'M Seriously... #Stonehill Right Now... Unranked And Beating #3 #Newhaven In The 4Th Quarter... Cbs College Sports... | negative (p = 0.55) | positive (p = 0.54) |
| 99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | Omg Then I Sat On My Floor In Front Of The Tv And Bawled Over Shawn When He Was Performing On That One Show | positive (p = 0.60) | neutral (p = 0.54) |
| 1666 | "If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5" | "If It Ain'T Broke Don'T Fix It, Why Move Kris Bryant Up To 3Rd When He'S Hitting As Good As He Has All Season At 5" | negative (p = 0.52) | neutral (p = 0.81) |

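
The perturbed strings in the uppercase and title-case examples above (for instance "Ain'T", "4Th", and "#Newhaven") are what Python's built-in `str.upper()` and `str.title()` produce, since `str.title()` capitalizes the first letter after any non-letter character. The sketch below shows this; the wrapper names `to_upper` and `to_title` are hypothetical, and we have not confirmed that the scan uses these exact methods.

```python
def to_upper(text: str) -> str:
    """Uppercase every cased character."""
    return text.upper()


def to_title(text: str) -> str:
    """Title-case the text using Python's built-in rule.

    str.title() starts a new "word" after every non-letter, so apostrophes
    and digits trigger a capital: "ain't" -> "Ain'T", "4th" -> "4Th".
    """
    return text.title()


sample = "If it ain't broke don't fix it"
print(to_title(sample))        # If It Ain'T Broke Don'T Fix It
print(to_title("#NewHaven"))   # #Newhaven
print(to_upper("@user we'll see then. ;)"))  # @USER WE'LL SEE THEN. ;)
```

Note that these outputs are not the casing a human would write, so part of the title-case fail rate may reflect unusual inputs rather than ordinary capitalization.
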

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | major 🔴 | | Fail rate = 0.142 | Add typos | 142/1000 tested samples (14.2%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.2% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Add typos(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1225 | Great story about Sam Smith on CBS Sunday Morning...Sam talked about how his success came when he exposed his... | rGeat story about am Smith on CBS Sunday Morning...Sam talked about hoa his success cake when he exposed his... | positive (p = 0.94) | neutral (p = 0.88) |
| 1442 | "Zack, Type 1 for too long, Wishing it was Friday so I can listen to Iron Maiden's new album. #dcde" | "Zack, Type 1 for yoo long, Wishing it wzs Friday so I can kisten to Iron Maiden's bew album. #dcde" | neutral (p = 0.70) | positive (p = 0.95) |
| 1613 | @user I just read your diagnosis on Pistorius back from July 3. I searched google for """"pistorius psychopath"""" because I see his pic" | @user I just resd your diagnosis on Pidtorius back from July 3. I searched google for """"pistorius psychopath"""" because I see hois pic" | neutral (p = 0.77) | negative (p = 0.51) |

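
The typo examples above show adjacent letters swapped ("rGeat"), letters dropped ("am"), and letters replaced by nearby keys ("cake" for "came"). A rough sketch of such a perturbation follows. Everything in it is an assumption for illustration: `add_typos`, the partial `NEIGHBOURS` keyboard map, the `rate` default, and the choice of three operations are hypothetical and not taken from the scan's code.

```python
import random

# Partial QWERTY neighbour map (hypothetical, for illustration only).
NEIGHBOURS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "m": "njk", "t": "rfgy", "w": "qase", "l": "kop",
}


def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap, delete, or substitute letters with probability `rate` each.

    A fixed seed makes the perturbation reproducible across runs.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("swap", "delete", "substitute"))
            if op == "swap" and i + 1 < len(chars) and chars[i + 1].isalpha():
                out.extend((chars[i + 1], c))  # exchange with the next letter
                i += 2
                continue
            if op == "delete":
                i += 1  # drop this letter
                continue
            if op == "substitute" and c.lower() in NEIGHBOURS:
                out.append(rng.choice(NEIGHBOURS[c.lower()]))
                i += 1
                continue
            # Operation not applicable here: fall through and keep the letter.
        out.append(c)
        i += 1
    return "".join(out)


tweet = "Great story about Sam Smith on CBS Sunday Morning"
print(add_typos(tweet, rate=0.1, seed=1))
```

Seeding the generator matters for a robustness test: without it, the same sample could pass on one run and fail on the next.
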

| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.087 | Transform to lowercase | 87/1000 tested samples (8.7%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 8.7% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 99 | omg then I sat on my floor in front of the TV and bawled over Shawn when he was performing on that one show | omg then i sat on my floor in front of the tv and bawled over shawn when he was performing on that one show | positive (p = 0.60) | neutral (p = 0.40) |
| 1704 | ".@LenKasper: ""Bryant has hit some big home runs..."" [Kris Bryant hits a game-tying two-run HR in the 8th]" | ".@lenkasper: ""bryant has hit some big home runs..."" [kris bryant hits a game-tying two-run hr in the 8th]" | positive (p = 0.87) | neutral (p = 0.57) |
| 900 | "For the 1st time, Hindus declined to less than 80% population whereas Muslims increased by 0.8%. #Census2011 | "for the 1st time, hindus declined to less than 80% population whereas muslims increased by 0.8%. #census2011 | neutral (p = 0.60) | negative (p = 0.72) |


| Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
|---|---|---|---|---|---|
| Robustness | medium 🟡 | | Fail rate = 0.080 | Punctuation Removal | 80/1000 tested samples (8.0%) changed prediction after perturbation |

🔍✨Examples

When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.0% of the cases. We expected the predictions not to be affected by this transformation.

| | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
|---|---|---|---|---|
| 1798 | Mariah Carey's Twins Hilariously Stole the Show at Their Mom's Walk of Fame Ceremony Fox News Insider | Mariah Carey s Twins Hilariously Stole the Show at Their Mom s Walk of Fame Ceremony Fox News Insider | | |
| 1329 | "Jacob I'm going to see Sam Smith tomorrow, wanna come with?" | Jacob I m going to see Sam Smith tomorrow wanna come with | positive (p = 0.67) | neutral (p = 0.65) |
| 601 | If I celebrate it wrong will Thor beat me with his hammer? | If I celebrate it wrong will Thor beat me with his hammer | neutral (p = 0.69) | negative (p = 0.85) |

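
The punctuation-removal examples above are consistent with replacing each punctuation mark by a space and then collapsing repeated whitespace (which is why "I'm" becomes "I m"). Here is a minimal sketch under that assumption; `remove_punctuation` is a hypothetical helper, and the scan's real rule may treat some characters differently.

```python
import re
import string

# Matches any single ASCII punctuation character.
_PUNCT = re.compile(f"[{re.escape(string.punctuation)}]")


def remove_punctuation(text: str) -> str:
    """Replace punctuation with spaces, then collapse runs of whitespace."""
    spaced = _PUNCT.sub(" ", text)
    return " ".join(spaced.split())


print(remove_punctuation('"Jacob I\'m going to see Sam Smith tomorrow, wanna come with?"'))
# Jacob I m going to see Sam Smith tomorrow wanna come with
```

Replacing with a space instead of deleting keeps contractions from fusing into new tokens, but it also splits them, which may be enough on its own to move a borderline prediction.
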

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

  • Check out the Giskard Space and improve your model.
  • The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!
