---
pipeline_tag: text-generation
inference: false
license: apache-2.0
model-index:
  - name: ibm/PowerMoE-3b
    results:
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: ARC
        metrics:
          - name: accuracy-norm
            type: accuracy-norm
            value: 58.1
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: BoolQ
        metrics:
          - name: accuracy
            type: accuracy
            value: 65
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: Hellaswag
        metrics:
          - name: accuracy-norm
            type: accuracy-norm
            value: 71.5
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: OpenBookQA
        metrics:
          - name: accuracy-norm
            type: accuracy-norm
            value: 41
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: PIQA
        metrics:
          - name: accuracy-norm
            type: accuracy-norm
            value: 79.1
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: Winogrande
        metrics:
          - name: accuracy-norm
            type: accuracy-norm
            value: 65
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: MMLU (5 shot)
        metrics:
          - name: accuracy
            type: accuracy
            value: 42.8
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: GSM8k (5 shot)
        metrics:
          - name: accuracy
            type: accuracy
            value: 25.9
            verified: false
      - task:
          type: text-generation
        dataset:
          type: lm-eval-harness
          name: MATH (4 shot)
        metrics:
          - name: accuracy
            type: accuracy
            value: 14.8
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode-eval
          name: HumanEval
        metrics:
          - name: pass@1
            type: pass@1
            value: 20.1
            verified: false
      - task:
          type: text-generation
        dataset:
          type: bigcode-eval
          name: MBPP
        metrics:
          - name: pass@1
            type: pass@1
            value: 32.4
            verified: false
base_model:
  - ibm/PowerMoE-3b
---

## Model Summary

PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters for each token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359

This is a GGUF quantized version.

## Usage

Requires a recent build of llama.cpp to run.

### Generation

This is a simple example of how to use the PowerMoE GGUF model:

```shell
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
```
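A slightly fuller invocation can be sketched as follows. This assumes the same quantized file name as above; `-n`, `-c`, and `--temp` are standard llama.cpp sampling and context options, and `llama-server` is the bundled OpenAI-compatible HTTP server:

```shell
# Generate up to 256 tokens with a 4096-token context window
# and a moderate sampling temperature.
./llama-cli -m PowerMoE4x800M_q3km.gguf \
  -p "How about a snack?" \
  -n 256 -c 4096 --temp 0.7

# Alternatively, serve the model over HTTP for API-style access.
./llama-server -m PowerMoE4x800M_q3km.gguf --port 8080
```

Adjust the GGUF filename to match the quantization variant you downloaded.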