Quantization made by Richard Erkhov.

piccolo-math-2x7b - GGUF

Model creator: https://huggingface.co/macadeliccc/
Original model: https://huggingface.co/macadeliccc/piccolo-math-2x7b/

Name	Quant method	Size
piccolo-math-2x7b.Q2_K.gguf	Q2_K	4.43GB
piccolo-math-2x7b.IQ3_XS.gguf	IQ3_XS	4.95GB
piccolo-math-2x7b.IQ3_S.gguf	IQ3_S	5.22GB
piccolo-math-2x7b.Q3_K_S.gguf	Q3_K_S	5.2GB
piccolo-math-2x7b.IQ3_M.gguf	IQ3_M	5.35GB
piccolo-math-2x7b.Q3_K.gguf	Q3_K	5.78GB
piccolo-math-2x7b.Q3_K_M.gguf	Q3_K_M	5.78GB
piccolo-math-2x7b.Q3_K_L.gguf	Q3_K_L	6.27GB
piccolo-math-2x7b.IQ4_XS.gguf	IQ4_XS	6.5GB
piccolo-math-2x7b.Q4_0.gguf	Q4_0	6.78GB
piccolo-math-2x7b.IQ4_NL.gguf	IQ4_NL	6.85GB
piccolo-math-2x7b.Q4_K_S.gguf	Q4_K_S	6.84GB
piccolo-math-2x7b.Q4_K.gguf	Q4_K	7.25GB
piccolo-math-2x7b.Q4_K_M.gguf	Q4_K_M	7.25GB
piccolo-math-2x7b.Q4_1.gguf	Q4_1	7.52GB
piccolo-math-2x7b.Q5_0.gguf	Q5_0	8.26GB
piccolo-math-2x7b.Q5_K_S.gguf	Q5_K_S	8.26GB
piccolo-math-2x7b.Q5_K.gguf	Q5_K	8.51GB
piccolo-math-2x7b.Q5_K_M.gguf	Q5_K_M	8.51GB
piccolo-math-2x7b.Q5_1.gguf	Q5_1	9.01GB
piccolo-math-2x7b.Q6_K.gguf	Q6_K	9.84GB
piccolo-math-2x7b.Q8_0.gguf	Q8_0	12.75GB

Original model description:

license: mit model-index: - name: piccolo-math-2x7b results: - task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics: - type: acc_norm value: 69.11 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics: - type: acc_norm value: 87.27 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics: - type: acc value: 63.69 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics: - type: mc2 value: 63.86 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics: - type: acc value: 79.87 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 70.13 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=macadeliccc/piccolo-math-2x7b name: Open LLM Leaderboard

Piccolo-math-2x7b

In loving memory of my dog Klaus (Piccolo)

~ Piccolo (Italian): the little one ~

$piccolo.png$

Code Example

Inference and Evaluation colab available here

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.
    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

model_id = "macadeliccc/piccolo-math-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")

The model is capable of quality code, math, and logical reasoning. Try whatever questions you think of.

Evaluations

Model	AGIEval	GPT4All	TruthfulQA	Bigbench	Average
piccolo-math-2x7b	43.89	74.98	63.96	44.99	56.96

EQ Bench

Benchmark Complete:

2024-01-24 00:00:40
Time taken: 183.3 mins
Prompt Format: Mistral
Model: macadeliccc/piccolo-math-2x7b
Score (v2): 70.74
Parseable: 167.0

Batch completed Time taken: 183.3 mins

AGIEval

Task	Version	Metric	Value		Stderr
agieval_aqua_rat	0	acc	24.41	±	2.70
		acc_norm	24.80	±	2.72
agieval_logiqa_en	0	acc	35.79	±	1.88
		acc_norm	36.71	±	1.89
agieval_lsat_ar	0	acc	23.48	±	2.80
		acc_norm	23.91	±	2.82
agieval_lsat_lr	0	acc	49.22	±	2.22
		acc_norm	50.00	±	2.22
agieval_lsat_rc	0	acc	63.94	±	2.93
		acc_norm	64.31	±	2.93
agieval_sat_en	0	acc	77.18	±	2.93
		acc_norm	76.70	±	2.95
agieval_sat_en_without_passage	0	acc	45.15	±	3.48
		acc_norm	44.66	±	3.47
agieval_sat_math	0	acc	33.64	±	3.19
		acc_norm	30.00	±	3.10

Average: 43.89%

GPT4All

Task	Version	Metric	Value		Stderr
arc_challenge	0	acc	61.86	±	1.42
		acc_norm	62.88	±	1.41
arc_easy	0	acc	84.34	±	0.75
		acc_norm	80.47	±	0.81
boolq	1	acc	86.88	±	0.59
hellaswag	0	acc	68.56	±	0.46
		acc_norm	85.16	±	0.35
openbookqa	0	acc	37.00	±	2.16
		acc_norm	47.80	±	2.24
piqa	0	acc	82.21	±	0.89
		acc_norm	83.68	±	0.86
winogrande	0	acc	77.98	±	1.16

Average: 74.98%

TruthfulQA

Task	Version	Metric	Value		Stderr
truthfulqa_mc	1	mc1	47.37	±	1.75
		mc2	63.96	±	1.57

Average: 63.96%

Bigbench

Task	Version	Metric	Value		Stderr
bigbench_causal_judgement	0	multiple_choice_grade	55.26	±	3.62
bigbench_date_understanding	0	multiple_choice_grade	63.14	±	2.51
bigbench_disambiguation_qa	0	multiple_choice_grade	42.64	±	3.08
bigbench_geometric_shapes	0	multiple_choice_grade	22.84	±	2.22
		exact_str_match	3.34	±	0.95
bigbench_logical_deduction_five_objects	0	multiple_choice_grade	36.60	±	2.16
bigbench_logical_deduction_seven_objects	0	multiple_choice_grade	25.57	±	1.65
bigbench_logical_deduction_three_objects	0	multiple_choice_grade	56.00	±	2.87
bigbench_movie_recommendation	0	multiple_choice_grade	42.40	±	2.21
bigbench_navigate	0	multiple_choice_grade	54.70	±	1.57
bigbench_reasoning_about_colored_objects	0	multiple_choice_grade	62.90	±	1.08
bigbench_ruin_names	0	multiple_choice_grade	53.35	±	2.36
bigbench_salient_translation_error_detection	0	multiple_choice_grade	24.35	±	1.36
bigbench_snarks	0	multiple_choice_grade	62.43	±	3.61
bigbench_sports_understanding	0	multiple_choice_grade	70.28	±	1.46
bigbench_temporal_sequences	0	multiple_choice_grade	41.30	±	1.56
bigbench_tracking_shuffled_objects_five_objects	0	multiple_choice_grade	22.32	±	1.18
bigbench_tracking_shuffled_objects_seven_objects	0	multiple_choice_grade	17.77	±	0.91
bigbench_tracking_shuffled_objects_three_objects	0	multiple_choice_grade	56.00	±	2.87

Average: 44.99%

Average score: 56.96%

Elapsed time: 01:51:53

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	72.32
AI2 Reasoning Challenge (25-Shot)	69.11
HellaSwag (10-Shot)	87.27
MMLU (5-Shot)	63.69
TruthfulQA (0-shot)	63.86
Winogrande (5-shot)	79.87
GSM8k (5-shot)	70.13