---
language:
  - en
license: apache-2.0
library_name: transformers
tags:
  - transformers
datasets:
  - mwitiderrick/OpenPlatypus
base_model: vihangd/shearedplats-2.7b-v2
inference: true
model_type: llama
prompt_template: |
  ### Instruction:
  {prompt}
  ### Response:
created_by: mwitiderrick
pipeline_tag: text-generation
model-index:
  - name: mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
    results:
      - task:
          type: text-generation
        dataset:
          name: hellaswag
          type: hellaswag
        metrics:
          - type: acc
            value: 0.5283
            name: hellaswag (0-Shot)
      - task:
          type: text-generation
        dataset:
          name: winogrande
          type: winogrande
        metrics:
          - type: acc
            value: 0.6464
            name: winogrande (0-Shot)
      - task:
          type: text-generation
        dataset:
          name: arc_challenge
          type: arc_challenge
        metrics:
          - type: acc
            value: 0.3652
            name: arc_challenge (0-Shot)
        source:
          url: >-
            https://huggingface.co/mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: shearedplats-2.7b-v2-instruct-v0.1 model card
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 40.19
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 70.08
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 28.12
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 41.23
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 65.04
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 2.12
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
          name: Open LLM Leaderboard

---

# ShearedPlats-2.7b-v2 Instruct

This is a [ShearedPlats-2.7b-v2](https://huggingface.co/vihangd/shearedplats-2.7b-v2) model that has been fine-tuned for 2 epochs on the Open-Platypus dataset.

The modified version of the dataset can be found [here](https://huggingface.co/datasets/mwitiderrick/OpenPlatypus).
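
The dataset can be loaded for inspection with the `datasets` library (a minimal sketch; the `train` split name is an assumption, check the dataset card):

```python
from datasets import load_dataset

# Load the modified Open-Platypus dataset used for fine-tuning
# (the "train" split is an assumption here)
ds = load_dataset("mwitiderrick/OpenPlatypus", split="train")
print(ds[0])
```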

## Prompt Template

```
### Instruction:

{query}

### Response:
<Leave a new line for the model to respond>
```
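
The template is plain text with a placeholder for the query, so it can be filled in with a small helper like the one below (an illustrative sketch; `format_prompt` is not part of this repo, and it mirrors the single-newline form used in the Usage code):

```python
def format_prompt(query: str) -> str:
    # Instruction header, the user's query, then a response header
    # followed by a newline for the model to complete.
    return f"### Instruction:\n{query}\n### Response:\n"
```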

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1")
model = AutoModelForCausalLM.from_pretrained("mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1")

query = "Provide step-by-step instructions for making a sweet chicken bugger"

# Wrap the query in the prompt template and generate up to 350 tokens
text_gen = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=350)
output = text_gen(f"### Instruction:\n{query}\n### Response:\n")
print(output[0]['generated_text'])
"""
### Instruction:
Provide step-by-step instructions for making a sweet chicken bugger
### Response:
Step 1: Prepare the ingredients

You will need a mixture of ground chicken, breadcrumbs, butter, Worcestershire sauce, garlic powder, onion powder, salt, and pepper.

Step 2: Form the bugger

Take a piece of chicken breast meat and use a sharp knife to cut it into small cubes. Place the cubes in a bowl and add the remaining ingredients: breadcrumbs, butter, Worcestershire sauce, garlic powder, onion powder, salt, and pepper. Mix the ingredients together until they are well combined.

Step 3: Shape the bugger

Take a piece of the bugger mixture and form it into a ball. Place the ball on a plate or in a bag and refrigerate it for 30 minutes.

Step 4: Cook the bugger

Heat a grill pan or grill to medium-high heat. Take the bugger out of the refrigerator and place it on the grill. Cook the bugger for 5-7 minutes on each side, or until it is cooked through.

Step 5: Serve and enjoy!

Once the bugger is cooked, serve it hot and enjoy!

Note: You can also use a sweet chicken bugger mix to make sweet chicken buggers. Simply follow the instructions above, but use the sweet chicken bugger mix instead of the ground chicken.

Enjoy your sweet chicken buggers!
"""

## Evals

|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml   |none  |     0|acc     |0.5283|±  |0.0050|
|         |       |none  |     0|acc_norm|0.7068|±  |0.0045|


|  Groups  |Version|Filter|n-shot|  Metric   | Value |   |Stderr|
|----------|-------|------|-----:|-----------|------:|---|-----:|
|truthfulqa|N/A    |none  |     0|acc        | 0.3411|±  |0.0016|
|          |       |none  |     0|bleu_max   |19.4174|±  |0.6888|
|          |       |none  |     0|bleu_acc   | 0.3378|±  |0.0166|
|          |       |none  |     0|bleu_diff  |-4.4165|±  |0.6611|
|          |       |none  |     0|rouge1_max |43.6923|±  |0.8239|
|          |       |none  |     0|rouge1_acc | 0.3305|±  |0.0165|
|          |       |none  |     0|rouge1_diff|-6.4023|±  |0.7680|
|          |       |none  |     0|rouge2_max |28.4074|±  |0.8883|
|          |       |none  |     0|rouge2_acc | 0.2827|±  |0.0158|
|          |       |none  |     0|rouge2_diff|-6.7716|±  |0.8844|
|          |       |none  |     0|rougeL_max |40.2657|±  |0.8218|
|          |       |none  |     0|rougeL_acc | 0.3023|±  |0.0161|
|          |       |none  |     0|rougeL_diff|-6.5447|±  |0.7706|

|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
|----------|-------|------|-----:|------|-----:|---|-----:|
|winogrande|Yaml   |none  |     0|acc   |0.6464|±  |0.0134|

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |     0|acc     |0.3652|±  |0.0141|
|             |       |none  |     0|acc_norm|0.3908|±  |0.0143|
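
The tables above follow the output format of EleutherAI's lm-evaluation-harness. The 0-shot numbers should be reproducible with something like the following (a sketch assuming lm-evaluation-harness v0.4+ and its `lm_eval.simple_evaluate` entry point; exact task names can differ between harness versions):

```python
import lm_eval

# Evaluate the instruct model on the 0-shot tasks reported above
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1",
    tasks=["hellaswag", "winogrande", "arc_challenge", "truthfulqa"],
    num_fewshot=0,
)
print(results["results"])
```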

## Open LLM Leaderboard Evaluation Results

Detailed results can be found on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1).

|Metric                           |Value|
|---------------------------------|----:|
|Avg.                             |41.13|
|AI2 Reasoning Challenge (25-Shot)|40.19|
|HellaSwag (10-Shot)              |70.08|
|MMLU (5-Shot)                    |28.12|
|TruthfulQA (0-shot)              |41.23|
|Winogrande (5-shot)              |65.04|
|GSM8k (5-shot)                   | 2.12|
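
As a quick sanity check, the reported average is the unweighted mean of the six benchmark scores:

```python
scores = [40.19, 70.08, 28.12, 41.23, 65.04, 2.12]
print(round(sum(scores) / len(scores), 2))  # 41.13
```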