Edit model card

Piccolo-math-2x7b

In loving memory of my dog Klaus (Piccolo)

~ Piccolo (Italian): the little one ~

piccolo.png

Code Example

Inference and Evaluation colab available here

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.
    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

model_id = "macadeliccc/piccolo-math-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")

The model is capable of quality code, math, and logical reasoning. Try whatever questions you think of.

Evaluations

Model AGIEval GPT4All TruthfulQA Bigbench Average
piccolo-math-2x7b 43.89 74.98 63.96 44.99 56.96

EQ Bench

Benchmark Complete:

  • 2024-01-24 00:00:40
  • Time taken: 183.3 mins
  • Prompt Format: Mistral
  • Model: macadeliccc/piccolo-math-2x7b
  • Score (v2): 70.74
  • Parseable: 167.0

Batch completed Time taken: 183.3 mins

AGIEval

Task Version Metric Value Stderr
agieval_aqua_rat 0 acc 24.41 ± 2.70
acc_norm 24.80 ± 2.72
agieval_logiqa_en 0 acc 35.79 ± 1.88
acc_norm 36.71 ± 1.89
agieval_lsat_ar 0 acc 23.48 ± 2.80
acc_norm 23.91 ± 2.82
agieval_lsat_lr 0 acc 49.22 ± 2.22
acc_norm 50.00 ± 2.22
agieval_lsat_rc 0 acc 63.94 ± 2.93
acc_norm 64.31 ± 2.93
agieval_sat_en 0 acc 77.18 ± 2.93
acc_norm 76.70 ± 2.95
agieval_sat_en_without_passage 0 acc 45.15 ± 3.48
acc_norm 44.66 ± 3.47
agieval_sat_math 0 acc 33.64 ± 3.19
acc_norm 30.00 ± 3.10

Average: 43.89%

GPT4All

Task Version Metric Value Stderr
arc_challenge 0 acc 61.86 ± 1.42
acc_norm 62.88 ± 1.41
arc_easy 0 acc 84.34 ± 0.75
acc_norm 80.47 ± 0.81
boolq 1 acc 86.88 ± 0.59
hellaswag 0 acc 68.56 ± 0.46
acc_norm 85.16 ± 0.35
openbookqa 0 acc 37.00 ± 2.16
acc_norm 47.80 ± 2.24
piqa 0 acc 82.21 ± 0.89
acc_norm 83.68 ± 0.86
winogrande 0 acc 77.98 ± 1.16

Average: 74.98%

TruthfulQA

Task Version Metric Value Stderr
truthfulqa_mc 1 mc1 47.37 ± 1.75
mc2 63.96 ± 1.57

Average: 63.96%

Bigbench

Task Version Metric Value Stderr
bigbench_causal_judgement 0 multiple_choice_grade 55.26 ± 3.62
bigbench_date_understanding 0 multiple_choice_grade 63.14 ± 2.51
bigbench_disambiguation_qa 0 multiple_choice_grade 42.64 ± 3.08
bigbench_geometric_shapes 0 multiple_choice_grade 22.84 ± 2.22
exact_str_match 3.34 ± 0.95
bigbench_logical_deduction_five_objects 0 multiple_choice_grade 36.60 ± 2.16
bigbench_logical_deduction_seven_objects 0 multiple_choice_grade 25.57 ± 1.65
bigbench_logical_deduction_three_objects 0 multiple_choice_grade 56.00 ± 2.87
bigbench_movie_recommendation 0 multiple_choice_grade 42.40 ± 2.21
bigbench_navigate 0 multiple_choice_grade 54.70 ± 1.57
bigbench_reasoning_about_colored_objects 0 multiple_choice_grade 62.90 ± 1.08
bigbench_ruin_names 0 multiple_choice_grade 53.35 ± 2.36
bigbench_salient_translation_error_detection 0 multiple_choice_grade 24.35 ± 1.36
bigbench_snarks 0 multiple_choice_grade 62.43 ± 3.61
bigbench_sports_understanding 0 multiple_choice_grade 70.28 ± 1.46
bigbench_temporal_sequences 0 multiple_choice_grade 41.30 ± 1.56
bigbench_tracking_shuffled_objects_five_objects 0 multiple_choice_grade 22.32 ± 1.18
bigbench_tracking_shuffled_objects_seven_objects 0 multiple_choice_grade 17.77 ± 0.91
bigbench_tracking_shuffled_objects_three_objects 0 multiple_choice_grade 56.00 ± 2.87

Average: 44.99%

Average score: 56.96%

Elapsed time: 01:51:53

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 72.32
AI2 Reasoning Challenge (25-Shot) 69.11
HellaSwag (10-Shot) 87.27
MMLU (5-Shot) 63.69
TruthfulQA (0-shot) 63.86
Winogrande (5-shot) 79.87
GSM8k (5-shot) 70.13
Downloads last month
2,930
Safetensors
Model size
12.9B params
Tensor type
BF16
·

Collection including macadeliccc/piccolo-math-2x7b

Evaluation results