Piccolo-math-2x7b

In loving memory of my dog Klaus (Piccolo)

~ Piccolo (Italian): the little one ~

[Image: piccolo.png]

Code Example

An inference and evaluation Colab notebook is available here.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/piccolo-math-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit loading requires the bitsandbytes package and a CUDA GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.

    Args:
        prompt (str): Prompt for the model.

    Returns:
        str: The generated response from the model.
    """
    # Tokenize the prompt and move it to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    # Decode the generated tokens, dropping special tokens from the output.
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")
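The load_in_4bit=True shortcut above relies on bitsandbytes, and newer transformers releases prefer an explicit quantization config. A minimal sketch of an equivalent load, assuming recent transformers plus bitsandbytes on a CUDA machine:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Alternative 4-bit load via an explicit quantization config
# (assumption: newer transformers + bitsandbytes on a CUDA GPU).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "macadeliccc/piccolo-math-2x7b",
    quantization_config=bnb_config,
    device_map="auto",
)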

The model is capable of quality code generation, math, and logical reasoning. Try whatever questions come to mind.
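For example, a quick check of the math claim can reuse generate_response from the snippet above (the word problem is just an illustration):

# Illustrative math prompt; any reasoning question works the same way.
math_prompt = "A train travels 120 km in 1.5 hours. What is its average speed in km/h?"
print(generate_response(math_prompt))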

Evaluations

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|--------:|--------:|-----------:|---------:|--------:|
| piccolo-math-2x7b | 43.89 | 74.98 | 63.96 | 44.99 | 56.96 |

EQ Bench

Benchmark Complete:

  • 2024-01-24 00:00:40
  • Time taken: 183.3 mins
  • Prompt Format: Mistral (see the chat-template sketch after this list)
  • Model: macadeliccc/piccolo-math-2x7b
  • Score (v2): 70.74
  • Parseable: 167.0

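The run above used the Mistral instruct prompt format. As a minimal sketch, assuming the tokenizer ships a Mistral-style chat template (typical for Mistral-based merges), the same [INST] ... [/INST] wrapping can be produced with apply_chat_template instead of hand-written tags:

# Sketch: build a Mistral-format prompt from a chat message
# (assumption: the tokenizer provides a chat template).
messages = [{"role": "user", "content": "Solve 12 * 17 step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))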

AGIEval

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|-------:|
| agieval_aqua_rat | 0 | acc | 24.41 | ± 2.70 |
| | | acc_norm | 24.80 | ± 2.72 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± 1.88 |
| | | acc_norm | 36.71 | ± 1.89 |
| agieval_lsat_ar | 0 | acc | 23.48 | ± 2.80 |
| | | acc_norm | 23.91 | ± 2.82 |
| agieval_lsat_lr | 0 | acc | 49.22 | ± 2.22 |
| | | acc_norm | 50.00 | ± 2.22 |
| agieval_lsat_rc | 0 | acc | 63.94 | ± 2.93 |
| | | acc_norm | 64.31 | ± 2.93 |
| agieval_sat_en | 0 | acc | 77.18 | ± 2.93 |
| | | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage | 0 | acc | 45.15 | ± 3.48 |
| | | acc_norm | 44.66 | ± 3.47 |
| agieval_sat_math | 0 | acc | 33.64 | ± 3.19 |
| | | acc_norm | 30.00 | ± 3.10 |

Average: 43.89%

GPT4All

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|-------:|
| arc_challenge | 0 | acc | 61.86 | ± 1.42 |
| | | acc_norm | 62.88 | ± 1.41 |
| arc_easy | 0 | acc | 84.34 | ± 0.75 |
| | | acc_norm | 80.47 | ± 0.81 |
| boolq | 1 | acc | 86.88 | ± 0.59 |
| hellaswag | 0 | acc | 68.56 | ± 0.46 |
| | | acc_norm | 85.16 | ± 0.35 |
| openbookqa | 0 | acc | 37.00 | ± 2.16 |
| | | acc_norm | 47.80 | ± 2.24 |
| piqa | 0 | acc | 82.21 | ± 0.89 |
| | | acc_norm | 83.68 | ± 0.86 |
| winogrande | 0 | acc | 77.98 | ± 1.16 |

Average: 74.98%

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|-------:|
| truthfulqa_mc | 1 | mc1 | 47.37 | ± 1.75 |
| | | mc2 | 63.96 | ± 1.57 |

Average: 63.96%

Bigbench

| Task | Version | Metric | Value | Stderr |
|------|--------:|--------|------:|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 55.26 | ± 3.62 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 63.14 | ± 2.51 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 42.64 | ± 3.08 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 22.84 | ± 2.22 |
| | | exact_str_match | 3.34 | ± 0.95 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 36.60 | ± 2.16 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 25.57 | ± 1.65 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 42.40 | ± 2.21 |
| bigbench_navigate | 0 | multiple_choice_grade | 54.70 | ± 1.57 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 62.90 | ± 1.08 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 53.35 | ± 2.36 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 24.35 | ± 1.36 |
| bigbench_snarks | 0 | multiple_choice_grade | 62.43 | ± 3.61 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 70.28 | ± 1.46 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 41.30 | ± 1.56 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.32 | ± 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 17.77 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 56.00 | ± 2.87 |

Average: 44.99%

Average score: 56.96%

Elapsed time: 01:51:53

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|--------|------:|
| Avg. | 72.32 |
| AI2 Reasoning Challenge (25-Shot) | 69.11 |
| HellaSwag (10-Shot) | 87.27 |
| MMLU (5-Shot) | 63.69 |
| TruthfulQA (0-shot) | 63.86 |
| Winogrande (5-shot) | 79.87 |
| GSM8k (5-shot) | 70.13 |