Report for CouchCat/ma_sa_v7_distil
Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊
We have identified 4 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tyqiangz/multilingual-sentiments (subset english, split validation).
👉Robustness issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | major 🔴 | — | Fail rate = 0.141 | Add typos | 44/311 tested samples (14.15%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 14.15% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
4 | Gonna watch Final Destination 5 tonight. I always leave the theater so afraid of everything. No huge escalators for sure :S | Gonnaw atch Final Destination tonight. I alwqys leave the theater so afrzid of everything. No huge escalatosr or sjre :S | positive (p = 0.69) | neutral (p = 0.97) |
7 | Harper's Worst Offense against Refugees may be Climate Record as rising temperatures add to chaos in the Middle East | Harper's orsf Offense against Refugees may be Climate Rrcord as rising temperatures add to chaos in tje Midxle East | negative (p = 0.90) | neutral (p = 0.57) |
9 | Disappointed the Knicks vs Nets game got canceled tonight\u002c but I\u2019m even more hyped for Knicks vs Heat on Friday! | Duisappointed the Knicks vs Nets game got cqnceled tonight\u002c but I\u201(m ven more hyped for Knicks vs Heat on Fridsay! | negative (p = 0.73) | neutral (p = 0.98) |
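The fail rate above can be reproduced in spirit with a few lines of Python. This is a minimal sketch, not the scanner's actual implementation: `fail_rate`, `swap_last_two`, and `toy_predict` are hypothetical names, the typo function is a simplified stand-in for “Add typos”, and skipping samples that the perturbation leaves unchanged is an assumption about how "tested samples" are counted.

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    perturb: Callable[[str], str],
) -> Tuple[int, int, float]:
    """Count how many predictions change once the input is perturbed.

    Assumption: samples that the perturbation leaves unchanged
    are not counted as tested.
    """
    changed = 0
    tested = 0
    for text in texts:
        perturbed = perturb(text)
        if perturbed == text:
            continue  # nothing was perturbed, so there is nothing to compare
        tested += 1
        if predict(text) != predict(perturbed):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


def swap_last_two(text: str) -> str:
    """Hypothetical stand-in for "Add typos": swap the last two
    characters of every word with at least four characters."""
    words = text.split(" ")
    return " ".join(w[:-2] + w[-1] + w[-2] if len(w) >= 4 else w for w in words)


def toy_predict(text: str) -> str:
    """Hypothetical keyword classifier, only here to make the sketch runnable."""
    return "positive" if "good" in text.lower() else "neutral"


result = fail_rate(toy_predict, ["This is good", "Fine day", "ok"], swap_last_two)
print(result)
```

To run the same check against the real model, `toy_predict` would be replaced by a function that returns the model's top label for a given text.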
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.064 | Punctuation Removal | 19/299 tested samples (6.35%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 6.35% of the cases. We expected the predictions not to be affected by this transformation.
Index | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
15 | "More like boring eagles""""""""@Tunnyking: C'mon bro, Go out and support the Super Eagles #RT @user I hate international breaks" | More like boring eagles@Tunnyking C mon bro Go out and support the Super Eagles #RT @user I hate international breaks | positive (p = 0.75) | neutral (p = 0.50) |
16 | "The BAGRANGI new Pic,Of SALMAN khan That VERY FAMOUS IN PAK CENEMA'S at the 1st day of EID that pic,made 1.5 milion Rs Lolywood/Bolywood" | The BAGRANGI new Pic Of SALMAN khan That VERY FAMOUS IN PAK CENEMA S at the 1st day of EID that pic made 1 5 milion Rs Lolywood Bolywood | negative (p = 0.48) | neutral (p = 0.40) |
29 | Monday at Town Ballroom: RICHIE HAWTIN with LOCO DICE. Dude is so awesome. Tix still avail at | Monday at Town Ballroom RICHIE HAWTIN with LOCO DICE Dude is so awesome Tix still avail at | positive (p = 0.56) | neutral (p = 0.68) |
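For readers who want to try a similar perturbation locally, here is a rough sketch that reproduces the perturbed text of rows 16 and 29 above. The character set and the name `remove_punctuation` are assumptions inferred from those two rows, not the scanner's own code. It leaves `#` and `@` alone (as the examples do) and does not handle the quote characters seen in row 15.

```python
import re

# Assumption: only these characters are treated as punctuation.
_PUNCTUATION = re.compile(r"[.,:;!?'/]")


def remove_punctuation(text: str) -> str:
    """Replace each punctuation character with a space, then collapse
    runs of whitespace into single spaces and trim both ends."""
    without_punctuation = _PUNCTUATION.sub(" ", text)
    return " ".join(without_punctuation.split())


sample = (
    "Monday at Town Ballroom: RICHIE HAWTIN with LOCO DICE. "
    "Dude is so awesome. Tix still avail at"
)
print(remove_punctuation(sample))
```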
👉Performance issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "time" | Precision = 0.400 | — | -19.50% than global |
🔍✨Examples
For records in the dataset where `text` contains "time", the Precision is 19.5% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
0 | @user @user I think after Charlie Hebdo the French did NOT react as the US did after 9/11. But they may do this time around. | negative | neutral (p = 0.70) |
35 | "According to Janet Jackson's long time producer Terry Lewis, the album is due in October. STAY CONNECTED!... | positive | neutral (p = 0.86) |
65 | Jay-Z sat in that Interview like a God showing that he was truly ahead of his time while the other niggas flirting with Foxy Brown | positive | negative (p = 0.44) |
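A slice comparison like this one can be sketched as follows. The report does not say how Precision is averaged across the three classes, so macro averaging is an assumption here, as are the function names and the toy data. The deviation is computed relative to the global value.

```python
from typing import List, Sequence, Tuple


def macro_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-label precision.

    Assumption: a label that is never predicted contributes 0.0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        # True labels of all samples that were predicted as `label`.
        truths = [t for t, p in zip(y_true, y_pred) if p == label]
        if not truths:
            scores.append(0.0)
        else:
            scores.append(sum(t == label for t in truths) / len(truths))
    return sum(scores) / len(scores)


def slice_deviation(
    texts: Sequence[str],
    y_true: Sequence[str],
    y_pred: Sequence[str],
    keyword: str,
) -> Tuple[float, float, float]:
    """Return (slice precision, global precision, relative deviation)
    for the rows whose text contains `keyword`."""
    rows = [i for i, text in enumerate(texts) if keyword in text.lower()]
    slice_p = macro_precision([y_true[i] for i in rows], [y_pred[i] for i in rows])
    global_p = macro_precision(y_true, y_pred)
    return slice_p, global_p, (slice_p - global_p) / global_p


# Toy data, invented for illustration only.
texts = ["good time", "bad time", "nice", "awful"]
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "positive", "positive", "negative"]

slice_p, global_p, deviation = slice_deviation(texts, y_true, y_pred, "time")
print(f"slice={slice_p:.3f} global={global_p:.3f} deviation={deviation:+.2%}")
```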
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | major 🔴 | text contains "going" | Precision = 0.444 | — | -10.56% than global |
🔍✨Examples
For records in the dataset where `text` contains "going", the Precision is 10.56% lower than the global Precision.
Index | text | label | Predicted label |
---|---|---|---|
60 | Btw fuck Durant for going to the OKlahoma game Saturday!! You went to Texas!!! #LonghornForLife | negative | neutral (p = 0.98) |
126 | im going to b so pissed if ikon doesn't debut on sept 15th can YG STOP PULLING A FRANK OCEAN ON US | negative | neutral (p = 0.94) |
131 | @user digi was on the 18th but i didn't go but im going to slaybells | positive | neutral (p = 0.98) |
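The report does not state the global Precision, but the two performance findings can be cross-checked against each other. Assuming the deviation is relative to the global value (the report does not define it), each slice implies a global Precision, and the two estimates should agree. The helper name `implied_global` is hypothetical.

```python
def implied_global(slice_metric: float, relative_deviation: float) -> float:
    """Invert slice = global * (1 + deviation) to recover the global metric.

    Assumption: the reported deviation is relative to the global value.
    """
    return slice_metric / (1.0 + relative_deviation)


# Values taken from the two performance rows of this report.
from_time = implied_global(0.400, -0.1950)
from_going = implied_global(0.444, -0.1056)

print(f"{from_time:.3f} {from_going:.3f}")
```

Both estimates land just under 0.50 and differ by less than 0.001, which is consistent with the relative reading of the deviation column.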
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
💡 What's Next?
- Check out the Giskard Space and improve your model.
- The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.
🙌 Big Thanks!
We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!