macadeliccc/piccolo-8x7b

Text Generation · text-generation-inference · Inference Endpoints


macadeliccc committed on Jan 17

Commit 4dc6ca7 · 1 parent: 1802d6f

Update README.md

Files changed (1)

README.md +2 -71

README.md CHANGED

@@ -50,74 +50,5 @@ The model is capable of quality code, math, and logical reasoning. Try whatever
 # Evaluations
-|Model|AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
-|---|---:|---:|---:|---:|---:|
-|piccolo-math-2x7b|43.89|74.98|63.96|44.99|56.96|
-AGIEval
-|Task|Version|Metric|Value| |Stderr|
-|---|---:|---|---:|---|---:|
-|agieval_aqua_rat|0|acc|24.41|±|2.70|
-| | |acc_norm|24.80|±|2.72|
-|agieval_logiqa_en|0|acc|35.79|±|1.88|
-| | |acc_norm|36.71|±|1.89|
-|agieval_lsat_ar|0|acc|23.48|±|2.80|
-| | |acc_norm|23.91|±|2.82|
-|agieval_lsat_lr|0|acc|49.22|±|2.22|
-| | |acc_norm|50.00|±|2.22|
-|agieval_lsat_rc|0|acc|63.94|±|2.93|
-| | |acc_norm|64.31|±|2.93|
-|agieval_sat_en|0|acc|77.18|±|2.93|
-| | |acc_norm|76.70|±|2.95|
-|agieval_sat_en_without_passage|0|acc|45.15|±|3.48|
-| | |acc_norm|44.66|±|3.47|
-|agieval_sat_math|0|acc|33.64|±|3.19|
-| | |acc_norm|30.00|±|3.10|
-Average: 43.89%
-GPT4All
-|Task|Version|Metric|Value| |Stderr|
-|---|---:|---|---:|---|---:|
-|arc_challenge|0|acc|61.86|±|1.42|
-| | |acc_norm|62.88|±|1.41|
-|arc_easy|0|acc|84.34|±|0.75|
-| | |acc_norm|80.47|±|0.81|
-|boolq|1|acc|86.88|±|0.59|
-|hellaswag|0|acc|68.56|±|0.46|
-| | |acc_norm|85.16|±|0.35|
-|openbookqa|0|acc|37.00|±|2.16|
-| | |acc_norm|47.80|±|2.24|
-|piqa|0|acc|82.21|±|0.89|
-| | |acc_norm|83.68|±|0.86|
-|winogrande|0|acc|77.98|±|1.16|
-Average: 74.98%
-TruthfulQA
-|Task|Version|Metric|Value| |Stderr|
-|---|---:|---|---:|---|---:|
-|truthfulqa_mc|1|mc1|47.37|±|1.75|
-| | |mc2|63.96|±|1.57|
-Average: 63.96%
-Bigbench
-|Task|Version|Metric|Value| |Stderr|
-|---|---:|---|---:|---|---:|
-|bigbench_causal_judgement|0|multiple_choice_grade|55.26|±|3.62|
-|bigbench_date_understanding|0|multiple_choice_grade|63.14|±|2.51|
-|bigbench_disambiguation_qa|0|multiple_choice_grade|42.64|±|3.08|
-|bigbench_geometric_shapes|0|multiple_choice_grade|22.84|±|2.22|
-| | |exact_str_match|3.34|±|0.95|
-|bigbench_logical_deduction_five_objects|0|multiple_choice_grade|36.60|±|2.16|
-|bigbench_logical_deduction_seven_objects|0|multiple_choice_grade|25.57|±|1.65|
-|bigbench_logical_deduction_three_objects|0|multiple_choice_grade|56.00|±|2.87|
-|bigbench_movie_recommendation|0|multiple_choice_grade|42.40|±|2.21|
-|bigbench_navigate|0|multiple_choice_grade|54.70|±|1.57|
-|bigbench_reasoning_about_colored_objects|0|multiple_choice_grade|62.90|±|1.08|
-|bigbench_ruin_names|0|multiple_choice_grade|53.35|±|2.36|
-|bigbench_salient_translation_error_detection|0|multiple_choice_grade|24.35|±|1.36|
-|bigbench_snarks|0|multiple_choice_grade|62.43|±|3.61|
-|bigbench_sports_understanding|0|multiple_choice_grade|70.28|±|1.46|
-|bigbench_temporal_sequences|0|multiple_choice_grade|41.30|±|1.56|
-|bigbench_tracking_shuffled_objects_five_objects|0|multiple_choice_grade|22.32|±|1.18|
-|bigbench_tracking_shuffled_objects_seven_objects|0|multiple_choice_grade|17.77|±|0.91|
-|bigbench_tracking_shuffled_objects_three_objects|0|multiple_choice_grade|56.00|±|2.87|
-Average: 44.99%
-Average score: 56.96%
-Elapsed time: 01:51:53


+TODO
+
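The category averages in the removed tables appear to follow a simple rule that can be checked against the per-task rows. Each task contributes acc_norm when it reports one and acc otherwise, TruthfulQA uses mc2, Bigbench uses multiple_choice_grade, and the overall score is the mean of the four category averages. This sketch is an inference from the published numbers, not the evaluation suite's own code. The helper names (`category_average`, `mean_2dp`) are hypothetical, and rounding half-up to two decimals is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def mean_2dp(values: list[Decimal]) -> Decimal:
    """Arithmetic mean, rounded half-up to two decimal places (assumed rounding rule)."""
    return (sum(values) / len(values)).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


def category_average(tasks: dict[str, dict[str, float]], preferred: tuple[str, ...]) -> Decimal:
    """Average one category, taking the first metric in `preferred` that each task reports."""
    picked = []
    for name, metrics in tasks.items():
        key = next((k for k in preferred if k in metrics), None)
        if key is None:
            raise KeyError(f"{name} reports none of {preferred}")
        # str() yields the value as printed in the table, so Decimal sees it exactly.
        picked.append(Decimal(str(metrics[key])))
    return mean_2dp(picked)


# Per-task values copied from the removed tables.
agieval = {
    "agieval_aqua_rat": {"acc": 24.41, "acc_norm": 24.80},
    "agieval_logiqa_en": {"acc": 35.79, "acc_norm": 36.71},
    "agieval_lsat_ar": {"acc": 23.48, "acc_norm": 23.91},
    "agieval_lsat_lr": {"acc": 49.22, "acc_norm": 50.00},
    "agieval_lsat_rc": {"acc": 63.94, "acc_norm": 64.31},
    "agieval_sat_en": {"acc": 77.18, "acc_norm": 76.70},
    "agieval_sat_en_without_passage": {"acc": 45.15, "acc_norm": 44.66},
    "agieval_sat_math": {"acc": 33.64, "acc_norm": 30.00},
}
gpt4all = {
    "arc_challenge": {"acc": 61.86, "acc_norm": 62.88},
    "arc_easy": {"acc": 84.34, "acc_norm": 80.47},
    "boolq": {"acc": 86.88},
    "hellaswag": {"acc": 68.56, "acc_norm": 85.16},
    "openbookqa": {"acc": 37.00, "acc_norm": 47.80},
    "piqa": {"acc": 82.21, "acc_norm": 83.68},
    "winogrande": {"acc": 77.98},
}
truthfulqa = {"truthfulqa_mc": {"mc1": 47.37, "mc2": 63.96}}
bigbench = {
    "bigbench_causal_judgement": {"multiple_choice_grade": 55.26},
    "bigbench_date_understanding": {"multiple_choice_grade": 63.14},
    "bigbench_disambiguation_qa": {"multiple_choice_grade": 42.64},
    "bigbench_geometric_shapes": {"multiple_choice_grade": 22.84, "exact_str_match": 3.34},
    "bigbench_logical_deduction_five_objects": {"multiple_choice_grade": 36.60},
    "bigbench_logical_deduction_seven_objects": {"multiple_choice_grade": 25.57},
    "bigbench_logical_deduction_three_objects": {"multiple_choice_grade": 56.00},
    "bigbench_movie_recommendation": {"multiple_choice_grade": 42.40},
    "bigbench_navigate": {"multiple_choice_grade": 54.70},
    "bigbench_reasoning_about_colored_objects": {"multiple_choice_grade": 62.90},
    "bigbench_ruin_names": {"multiple_choice_grade": 53.35},
    "bigbench_salient_translation_error_detection": {"multiple_choice_grade": 24.35},
    "bigbench_snarks": {"multiple_choice_grade": 62.43},
    "bigbench_sports_understanding": {"multiple_choice_grade": 70.28},
    "bigbench_temporal_sequences": {"multiple_choice_grade": 41.30},
    "bigbench_tracking_shuffled_objects_five_objects": {"multiple_choice_grade": 22.32},
    "bigbench_tracking_shuffled_objects_seven_objects": {"multiple_choice_grade": 17.77},
    "bigbench_tracking_shuffled_objects_three_objects": {"multiple_choice_grade": 56.00},
}

averages = {
    "AGIEval": category_average(agieval, ("acc_norm", "acc")),
    "GPT4All": category_average(gpt4all, ("acc_norm", "acc")),
    "TruthfulQA": category_average(truthfulqa, ("mc2",)),
    "Bigbench": category_average(bigbench, ("multiple_choice_grade",)),
}
# Overall score: mean of the already-rounded category averages.
overall = mean_2dp(list(averages.values()))

for name, value in averages.items():
    print(f"{name}: {value}%")
print(f"Average score: {overall}%")
```

Under these assumptions the script reproduces 43.89, 74.98, 63.96, 44.99 and 56.96. Averaging the unrounded category means instead gives about 56.954, which would round to 56.95, so the reported overall score is consistent with averaging the already-rounded category scores.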