metadata

license: mit
widget:
  - text: |
      USER: Cinder, What adventures do you think lie ahead?<END>
model-index:
  - name: distillgpt2Cinder
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 24.49
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 27.24
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 24.97
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 43.96
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 50.12
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 0.23
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder
          name: Open LLM Leaderboard

This is the original distilgpt2 run of Cinder, with multi-character chat. It can be a little glitchy at times. Every statement ends with the sequence <END>

For multi-character chat, it is recommended to use <END>\nUSER: as the stop token, so that generation halts before the model starts writing the next user turn.

USER: Hi, What is your name? Tell me about yourself.

CINDER: "I'm Cinder," a humanoid in my humanoid form. I was first introduced into the vastness of space during my lifetime.

Today, I'm a humanoid robot, and I'm designed for cosmic exploration, which is a subject of much speculation.

However, I'm designed for space exploration and scientific exploration.
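The prompt format and stop sequence described above can be sketched in Python as follows. This is a minimal illustration, not the author's own inference code: the helper names (`build_prompt`, `truncate_at_stop`, `generate_reply`) are hypothetical, the newline between turns is an assumption inferred from the recommended stop token, and the model id is taken from the leaderboard URL in the metadata and assumed to be the Hub repository name.

```python
END_MARKER = "<END>"            # end sequence for each statement (from the card)
STOP_SEQUENCE = "<END>\nUSER:"  # recommended stop token for multi-character chat


def build_prompt(turns):
    """Join (speaker, text) pairs into one prompt string.

    Assumption: each turn is "SPEAKER: text<END>" followed by a newline.
    """
    return "".join(f"{speaker}: {text}{END_MARKER}\n" for speaker, text in turns)


def truncate_at_stop(generated, stop=STOP_SEQUENCE):
    """Cut generated text at the first occurrence of the stop sequence."""
    idx = generated.find(stop)
    return generated if idx == -1 else generated[:idx]


def generate_reply(turns, model_id="Josephgflowers/distillgpt2Cinder",
                   max_new_tokens=100):
    """Generate a reply with the transformers text-generation pipeline.

    The model id is an assumption based on the leaderboard URL.
    """
    from transformers import pipeline  # imported lazily; only needed here

    generator = pipeline("text-generation", model=model_id)
    output = generator(
        build_prompt(turns),
        max_new_tokens=max_new_tokens,
        do_sample=True,
        return_full_text=False,  # return only the newly generated text
    )
    return truncate_at_stop(output[0]["generated_text"])
```

Calling `generate_reply([("USER", "Hi, What is your name? Tell me about yourself.")])` would then return the model's text up to, but not including, the point where it begins a new USER turn.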

Open LLM Leaderboard Evaluation Results

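Detailed results can be found on the Open LLM Leaderboard: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/distillgpt2Cinder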

| Metric | Value |
|---|---|
| Avg. | 28.50 |
| AI2 Reasoning Challenge (25-Shot) | 24.49 |
| HellaSwag (10-Shot) | 27.24 |
| MMLU (5-Shot) | 24.97 |
| TruthfulQA (0-shot) | 43.96 |
| Winogrande (5-shot) | 50.12 |
| GSM8k (5-shot) | 0.23 |
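
As a quick sanity check, the reported average can be reproduced from the six benchmark scores, assuming "Avg." is their unweighted arithmetic mean (the card does not say so explicitly, but the numbers agree):

```python
# Benchmark scores copied from the results table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.49,
    "HellaSwag (10-Shot)": 27.24,
    "MMLU (5-Shot)": 24.97,
    "TruthfulQA (0-shot)": 43.96,
    "Winogrande (5-shot)": 50.12,
    "GSM8k (5-shot)": 0.23,
}

# Assumption: "Avg." is the plain mean of the six values.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")
```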