inoki-giskard/scan-report-temp · Report for distilbert-base-uncased-finetuned-sst-2-english

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 13 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Performance issues (12)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` >= 50.500 AND `text_length(text)` < 61.500	Precision = 0.759	—	-15.50% than global

🔍✨Examples

For records in the dataset where `text_length(text)` >= 50.500 AND `text_length(text)` < 61.500, the Precision is 15.5% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	61	NEGATIVE	POSITIVE (p = 1.00)
171	rarely has leukemia looked so shimmering and benign .	54	NEGATIVE	POSITIVE (p = 0.98)
183	the lower your expectations , the more you 'll enjoy it .	58	NEGATIVE	POSITIVE (p = 1.00)
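
The slice condition and the metric above can be sketched as follows. This is not Giskard's implementation. It assumes `text_length` is the plain character count of the raw string and that POSITIVE is the positive class for Precision. The reported lengths appear to include one trailing space that the rendered text does not show (row 171 has 53 visible characters but a reported length of 54), so the sample string adds it back; that is an inference from the table, not something the report states.

```python
from typing import Sequence


def text_length(text: str) -> int:
    """Character count of the raw string (assumed definition of the feature)."""
    return len(text)


def in_slice(text: str, low: float = 50.5, high: float = 61.5) -> bool:
    """Slice condition from the report: low <= text_length(text) < high."""
    return low <= text_length(text) < high


def precision(labels: Sequence[str], preds: Sequence[str],
              positive: str = "POSITIVE") -> float:
    """TP / (TP + FP), treating `positive` as the positive class (assumption)."""
    tp = sum(1 for y, p in zip(labels, preds) if p == positive and y == positive)
    fp = sum(1 for y, p in zip(labels, preds) if p == positive and y != positive)
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Row 171 from the table above; the trailing space is an assumption.
row_171 = "rarely has leukemia looked so shimmering and benign . "
```

All three examples in the table are NEGATIVE records predicted as POSITIVE, that is, false positives, which is the kind of error that lowers Precision.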

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_whitespace(text)` >= 0.174 AND `avg_whitespace(text)` < 0.177	Recall = 0.815	—	-12.40% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` >= 0.174 AND `avg_whitespace(text)` < 0.177, the Recall is 12.4% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	0.175824	NEGATIVE	POSITIVE (p = 0.86)
87	jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters	0.177083	POSITIVE	NEGATIVE (p = 1.00)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	0.174603	POSITIVE	NEGATIVE (p = 0.96)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_word_length(text)` < 4.743 AND `avg_word_length(text)` >= 4.645	Recall = 0.815	—	-12.40% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` < 4.743 AND `avg_word_length(text)` >= 4.645, the Recall is 12.4% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	4.6875	NEGATIVE	POSITIVE (p = 0.86)
87	jaglom ... put ( s ) the audience in the privileged position of eavesdropping on his characters	4.64706	POSITIVE	NEGATIVE (p = 1.00)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	4.72727	POSITIVE	NEGATIVE (p = 0.96)
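
The `avg_whitespace` and `avg_word_length` slices report the same Recall and the same example rows. A minimal sketch of why that might happen, assuming `avg_whitespace` is the share of whitespace characters and `avg_word_length` is the mean length of whitespace-separated tokens (definitions inferred from the numbers in the tables, not taken from Giskard's source). It also assumes each raw string ends with a single trailing space, which the rendered text does not show.

```python
def avg_whitespace(text: str) -> float:
    """Fraction of characters that are whitespace (assumed definition)."""
    return sum(ch.isspace() for ch in text) / len(text)


def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (assumed definition)."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)


# Row 87 from the table above; the trailing space is an assumption.
row_87 = ("jaglom ... put ( s ) the audience in the privileged position "
          "of eavesdropping on his characters ")

ws = avg_whitespace(row_87)    # 17 whitespace characters out of 96
wl = avg_word_length(row_87)   # 79 non-space characters over 17 tokens
```

Under these assumptions every token is followed by exactly one space, so `avg_word_length = 1 / avg_whitespace - 1`. The two features then rank records in exactly opposite order, and a slice on one maps onto a slice on the other, which would explain the paired findings.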

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` >= 73.500 AND `text_length(text)` < 82.500	Recall = 0.826	—	-11.19% than global

🔍✨Examples

For records in the dataset where `text_length(text)` >= 73.500 AND `text_length(text)` < 82.500, the Recall is 11.19% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	76	POSITIVE	NEGATIVE (p = 1.00)
123	turns potentially forgettable formula into something strangely diverting .	75	POSITIVE	NEGATIVE (p = 0.99)
142	what better message than ` love thyself ' could young women of any size receive ?	82	POSITIVE	NEGATIVE (p = 0.99)
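
The Deviation column appears to be relative to the global metric rather than an absolute difference: read that way, every Recall row in this report implies a global Recall near 0.930. A short sketch of the arithmetic, with hypothetical helper names:

```python
def relative_deviation(slice_metric: float, global_metric: float) -> float:
    """Relative change of the slice metric with respect to the global metric."""
    return (slice_metric - global_metric) / global_metric


def implied_global(slice_metric: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global metric."""
    return slice_metric / (1.0 + deviation)


# Recall rows from this report: (slice Recall, reported deviation).
recall_rows = [(0.815, -0.1240), (0.826, -0.1119), (0.864, -0.0715)]
implied = [implied_global(m, d) for m, d in recall_rows]
```

The recovered values are only approximate because the report rounds both the metric and the deviation.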

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` >= 0.182 AND `avg_whitespace(text)` < 0.185	Recall = 0.864	—	-7.15% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` >= 0.182 AND `avg_whitespace(text)` < 0.185, the Recall is 7.15% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
273	minority report is exactly what the title indicates , a report .	0.184615	POSITIVE	NEGATIVE (p = 0.86)
324	you 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an errant tear .	0.182927	POSITIVE	NEGATIVE (p = 0.95)
356	jason x is positively anti-darwinian : nine sequels and 400 years later , the teens are none the wiser and jason still kills on auto-pilot .	0.184397	NEGATIVE	POSITIVE (p = 0.97)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` < 4.483 AND `avg_word_length(text)` >= 4.396	Recall = 0.864	—	-7.15% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` < 4.483 AND `avg_word_length(text)` >= 4.396, the Recall is 7.15% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
273	minority report is exactly what the title indicates , a report .	4.41667	POSITIVE	NEGATIVE (p = 0.86)
324	you 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an errant tear .	4.46667	POSITIVE	NEGATIVE (p = 0.95)
356	jason x is positively anti-darwinian : nine sequels and 400 years later , the teens are none the wiser and jason still kills on auto-pilot .	4.42308	NEGATIVE	POSITIVE (p = 0.97)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` >= 165.500 AND `text_length(text)` < 179.500	Recall = 0.871	—	-6.37% than global

🔍✨Examples

For records in the dataset where `text_length(text)` >= 165.500 AND `text_length(text)` < 179.500, the Recall is 6.37% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
158	by getting myself wrapped up in the visuals and eccentricities of many of the characters , i found myself confused when it came time to get to the heart of the movie .	168	NEGATIVE	POSITIVE (p = 0.99)
266	a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors .	179	POSITIVE	NEGATIVE (p = 0.99)
282	while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer	166	POSITIVE	NEGATIVE (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` >= 0.205 AND `avg_whitespace(text)` < 0.213	Recall = 0.875	—	-5.93% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` >= 0.205 AND `avg_whitespace(text)` < 0.213, the Recall is 5.93% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	0.210526	POSITIVE	NEGATIVE (p = 1.00)
183	the lower your expectations , the more you 'll enjoy it .	0.206897	NEGATIVE	POSITIVE (p = 1.00)
501	harrison 's flowers puts its heart in the right place , but its brains are in no particular place at all .	0.205607	POSITIVE	NEGATIVE (p = 0.99)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` < 3.867 AND `avg_word_length(text)` >= 3.696	Recall = 0.875	—	-5.93% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` < 3.867 AND `avg_word_length(text)` >= 3.696, the Recall is 5.93% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	3.75	POSITIVE	NEGATIVE (p = 1.00)
183	the lower your expectations , the more you 'll enjoy it .	3.83333	NEGATIVE	POSITIVE (p = 1.00)
501	harrison 's flowers puts its heart in the right place , but its brains are in no particular place at all .	3.86364	POSITIVE	NEGATIVE (p = 0.99)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` >= 151.500 AND `text_length(text)` < 165.500	Recall = 0.875	—	-5.93% than global

🔍✨Examples

For records in the dataset where `text_length(text)` >= 151.500 AND `text_length(text)` < 165.500, the Recall is 5.93% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
324	you 'll gasp appalled and laugh outraged and possibly , watching the spectacle of a promising young lad treading desperately in a nasty sea , shed an errant tear .	164	POSITIVE	NEGATIVE (p = 0.95)
673	drops you into a dizzying , volatile , pressure-cooker of a situation that quickly snowballs out of control , while focusing on the what much more than the why .	162	POSITIVE	NEGATIVE (p = 0.94)
692	sustains its dreamlike glide through a succession of cheesy coincidences and voluptuous cheap effects , not the least of which is rebecca romijn-stamos .	154	NEGATIVE	POSITIVE (p = 0.94)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.168 AND `avg_whitespace(text)` >= 0.164	Accuracy = 0.859	—	-5.62% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.168 AND `avg_whitespace(text)` >= 0.164, the Accuracy is 5.62% lower than the global Accuracy.

	text	avg_whitespace(text)	label	Predicted `label`
171	rarely has leukemia looked so shimmering and benign .	0.166667	NEGATIVE	POSITIVE (p = 0.98)
184	though perry and hurley make inspiring efforts to breathe life into the disjointed , haphazard script by jay scherick and david ronn , neither the actors nor director reginald hudlin can make it more than fitfully entertaining .	0.165939	NEGATIVE	POSITIVE (p = 0.66)
266	a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors .	0.167598	POSITIVE	NEGATIVE (p = 0.99)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 4.935 AND `avg_word_length(text)` < 5.113	Accuracy = 0.859	—	-5.62% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.935 AND `avg_word_length(text)` < 5.113, the Accuracy is 5.62% lower than the global Accuracy.

	text	avg_word_length(text)	label	Predicted `label`
171	rarely has leukemia looked so shimmering and benign .	5	NEGATIVE	POSITIVE (p = 0.98)
184	though perry and hurley make inspiring efforts to breathe life into the disjointed , haphazard script by jay scherick and david ronn , neither the actors nor director reginald hudlin can make it more than fitfully entertaining .	5.02632	NEGATIVE	POSITIVE (p = 0.66)
266	a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors .	4.96667	POSITIVE	NEGATIVE (p = 0.99)

👉Robustness issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	major 🔴	—	Fail rate = 0.130	Add typos	104/800 tested samples (13.0%) changed prediction after perturbation

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.0% of the cases. We expected the predictions not to be affected by this transformation.

	text	Add typos(text)	Original prediction	Prediction after perturbation
13	we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity .	we root for ( clara and paul ) , even like them , htough perhaps it 's an emotiom closer to pity .	POSITIVE (p = 0.96)	NEGATIVE (p = 0.99)
16	the emotions are raw and will strike a nerve with anyone who 's ever had family trauma .	the ekotions are raw andw ill strike a nerve with anyone wgo 's ever had family trauma .	POSITIVE (p = 1.00)	NEGATIVE (p = 0.60)
22	holden caulfield did it better .	holdsn caulfkeld did t better .	POSITIVE (p = 0.99)	NEGATIVE (p = 1.00)
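
The fail rate above is the share of tested samples whose predicted label changes after perturbation. The sketch below illustrates the idea only. The `add_typo` helper is hypothetical and swaps one pair of adjacent characters, whereas the examples in the table also show substitutions and deletions. The `predict` argument stands in for any function that maps a text to a label.

```python
import random
from typing import Callable, Sequence


def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position (one simple typo)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def fail_rate(predict: Callable[[str], str], texts: Sequence[str],
              perturb: Callable[[str], str]) -> float:
    """Fraction of texts whose predicted label changes after perturbation."""
    changed = sum(predict(t) != predict(perturb(t)) for t in texts)
    return changed / len(texts)


# Figure from the report: 104 of 800 tested samples changed prediction.
REPORTED_FAIL_RATE = 104 / 800
```

To try it against a real classifier, pass a wrapper around the model as `predict` and `lambda t: add_typo(t, random.Random(0))` as `perturb`.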

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.