Report for soleimanian/financial-roberta-large-sentiment
#87, opened by giskard-bot
Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 3 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset financial_phrasebank (subset sentences_allagree, split train).
👉Performance issues (1)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Performance | medium 🟡 | avg_word_length(text) < 3.860 AND avg_word_length(text) >= 3.699 | Balanced Accuracy = 0.892 | — | 5.29% lower than global |
🔍✨Examples
For records in the dataset where `avg_word_length(text)` < 3.860 AND `avg_word_length(text)` >= 3.699, the Balanced Accuracy is 5.29% lower than the global Balanced Accuracy.

Index | text | avg_word_length(text) | label | Predicted label |
---|---|---|---|---|
567 | It will provide heating in the form of hot water for the sawmill 's needs . | 3.75 | neutral | positive (p = 0.64) |
1121 | Upon completion of the sale Proha would get some USD12 .7 m for its stake in Artemis . | 3.83333 | neutral | positive (p = 0.99) |
1140 | 3 January 2011 - Scandinavian lenders Sampo Bank ( HEL : SAMAS ) , Pohjola Bank ( HEL : POH1S ) and Svenska Handelsbanken ( STO : SHB A ) have provided a EUR160m ( USD213m ) line of credit to Lemminkainen Oyj ( HEL : LEM1S ) , the Finnish construction firm said on Friday . | 3.80702 | neutral | positive (p = 0.99) |
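The `avg_word_length(text)` values in the examples above are consistent with the mean character length of whitespace-separated tokens (60 characters over 16 tokens gives 3.75 for record 567). The following is a minimal sketch under that assumption; it is not taken from Giskard's code, and the helper names `avg_word_length` and `in_slice` are hypothetical. It can be used to check whether a given sentence falls into the flagged slice.

```python
# Sketch only: assumes avg_word_length is the mean length of
# whitespace-separated tokens, which matches the values in the table above.

def avg_word_length(text: str) -> float:
    """Mean character length of whitespace-separated tokens."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(len(token) for token in tokens) / len(tokens)


def in_slice(text: str, low: float = 3.699, high: float = 3.860) -> bool:
    """True if the text falls in the slice reported by the scan."""
    return low <= avg_word_length(text) < high


# Records 567 and 1121 from the examples table.
RECORD_567 = "It will provide heating in the form of hot water for the sawmill 's needs ."
RECORD_1121 = (
    "Upon completion of the sale Proha would get some USD12 .7 m "
    "for its stake in Artemis ."
)

for name, text in [("567", RECORD_567), ("1121", RECORD_1121)]:
    print(name, round(avg_word_length(text), 5), in_slice(text))
```

Filtering a dataset with `in_slice` and re-evaluating the model on the resulting subset is one way to inspect this slice manually.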
👉Robustness issues (2)
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.086 | Add typos | 86/1000 tested samples (8.6%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 8.6% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Add typos(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
1919 | Cash flow from operations totalled EUR 7.4 mn , compared to a negative EUR 68.6 mn in the second quarter of 2008 . | Cash dlow from operations totalled WUR 7.4 mhn , comlared to a negative EUR 68.6 mn in the second wquarter of 2008 . | positive (p = 0.99) | negative (p = 1.00) |
1143 | A huge issue for us is the button placement . | A huge isue for us is the button placment . | negative (p = 0.98) | neutral (p = 1.00) |
2130 | Device volume in the area decreased by 21 % to 2.7 mn units . | Device volum in the area decreased by 21 % to 2.7 mn units . | negative (p = 1.00) | positive (p = 1.00) |
Vulnerability | Level | Data slice | Metric | Transformation | Deviation |
---|---|---|---|---|---|
Robustness | medium 🟡 | — | Fail rate = 0.075 | Transform to uppercase | 75/1000 tested samples (7.5%) changed prediction after perturbation |
🔍✨Examples
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 7.5% of the cases. We expected the predictions not to be affected by this transformation.

Index | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
---|---|---|---|---|
580 | Okmetic Board of Directors has also decided on a new share ownership program directed to the company 's top management . | OKMETIC BOARD OF DIRECTORS HAS ALSO DECIDED ON A NEW SHARE OWNERSHIP PROGRAM DIRECTED TO THE COMPANY 'S TOP MANAGEMENT . | neutral (p = 0.70) | positive (p = 0.79) |
823 | In the end of 2006 , the number of outlets will rise to 60-70 . | IN THE END OF 2006 , THE NUMBER OF OUTLETS WILL RISE TO 60-70 . | positive (p = 1.00) | negative (p = 1.00) |
1444 | The group reiterated its forecast that handset manufacturers will sell around 915 mln units this year globally . | THE GROUP REITERATED ITS FORECAST THAT HANDSET MANUFACTURERS WILL SELL AROUND 915 MLN UNITS THIS YEAR GLOBALLY . | neutral (p = 1.00) | positive (p = 1.00) |
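The fail rate reported for the robustness checks is the share of tested samples whose predicted label changes after the perturbation (for example, 75/1000 = 0.075 for uppercase). The sketch below illustrates that calculation. It is an assumption-laden illustration rather than Giskard's implementation: `fail_rate` is a hypothetical helper, and `toy_predict` is a deliberately case-sensitive stand-in for the real model, used only so the example runs without downloading anything.

```python
from typing import Callable, Sequence


def fail_rate(
    predict: Callable[[str], str],
    texts: Sequence[str],
    transform: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after `transform`."""
    if not texts:
        return 0.0
    changed = sum(predict(t) != predict(transform(t)) for t in texts)
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical case-sensitive keyword classifier (not the real model)."""
    if "rise" in text:
        return "positive"
    if "decreased" in text:
        return "negative"
    return "neutral"


# Sentences taken from the examples in this report.
texts = [
    "In the end of 2006 , the number of outlets will rise to 60-70 .",
    "Device volume in the area decreased by 21 % to 2.7 mn units .",
    "A huge issue for us is the button placement .",
    "The group reiterated its forecast that handset manufacturers "
    "will sell around 915 mln units this year globally .",
]

# "Transform to uppercase" is modeled here as plain str.upper.
rate = fail_rate(toy_predict, texts, str.upper)
print(f"Fail rate under uppercase: {rate:.3f}")
```

To test the actual model, `toy_predict` would be replaced by a function that returns the model's top label for a string.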
Check out the Giskard Space and test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.