---
language:
  - en
pipeline_tag: text-generation
library_name: transformers
tags:
  - cerebras
  - LLM
inference: false
---

Instruction-tuned Cerebras GPT 111M

The smallest of the Cerebras-GPT models, with only 111M parameters, instruction fine-tuned.

Model Description

An instruction fine-tuned version of cerebras/Cerebras-GPT-111M.

Evaluation

The model has been evaluated on Hugging Face's Open LLM Leaderboard; see the leaderboard for more details: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard. The instruction fine-tuned model improves on the Cerebras base model by about 5.7% (average score):

| Model | Average | ARC (25-shot) | HellaSwag (10-shot) | MMLU (5-shot) | TruthfulQA (0-shot) |
|---|---|---|---|---|---|
| SebastianSchramm/Cerebras-GPT-111M-instruction | 31.6 | 24.3 | 26.2 | 26.5 | 49.5 |
| cerebras/Cerebras-GPT-111M | 29.9 | 20.0 | 26.7 | 26.7 | 46.3 |
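As a quick sanity check, the ~5.7% figure quoted above matches the relative difference between the two average scores in the table:

```python
# Relative improvement of the instruction-tuned model's average score
# over the cerebras/Cerebras-GPT-111M base model (values from the table above).
instruct_avg = 31.6
base_avg = 29.9

improvement_pct = (instruct_avg - base_avg) / base_avg * 100
print(f"{improvement_pct:.1f}%")  # ~5.7%
```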

Training data

The model was fine-tuned on the following datasets: alpaca_gpt4_data (data generated by GPT-4 using Alpaca prompts for fine-tuning LLMs) and alpaca_data_cleaned.

Prompt template

Fine-tuning was performed with the prompt template from Stanford Alpaca:

PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}

Usage

For best results at inference time, it is recommended to format inputs according to the prompt template above.
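One way to follow that recommendation is a small helper that fills in the Alpaca templates from the section above; `format_prompt` is an illustrative name, not part of this repository:

```python
# Illustrative helper (not part of this repository) that fills in the
# Stanford Alpaca prompt templates shown above.
PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}

def format_prompt(instruction: str, input: str = "") -> str:
    """Return the full prompt string expected by the model."""
    if input:
        return PROMPT_DICT["prompt_input"].format(instruction=instruction, input=input)
    return PROMPT_DICT["prompt_no_input"].format(instruction=instruction)

prompt = format_prompt("Summarize the following text.", "Cerebras-GPT is a family of open LLMs.")
print(prompt)
```

The resulting string can then be passed as-is to a transformers text-generation pipeline loaded with this model.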

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 25.37 |
| ARC (25-shot) | 24.4 |
| HellaSwag (10-shot) | 26.05 |
| MMLU (5-shot) | 25.87 |
| TruthfulQA (0-shot) | 49.46 |
| Winogrande (5-shot) | 51.62 |
| GSM8K (5-shot) | 0.0 |
| DROP (3-shot) | 0.17 |
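For reference, the Avg. row is the mean of the seven benchmark scores listed above:

```python
# Mean of the seven Open LLM Leaderboard benchmark scores from the table above
# (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, DROP).
scores = [24.4, 26.05, 25.87, 49.46, 51.62, 0.0, 0.17]
avg = sum(scores) / len(scores)
print(f"{avg:.2f}")  # 25.37
```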