---
pipeline_tag: text-generation
inference: false
license: apache-2.0
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 58.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 65
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 71.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 41
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 79.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 65
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 42.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 25.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 14.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 20.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 32.4
      verified: false
base_model:
- ibm/PowerMoE-3b
---
# Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning-rate scheduler. It sparsely activates 800M parameters per token and is trained on a mix of open-source and proprietary datasets. PowerMoE-3B shows promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice tasks, code generation, and math reasoning. Paper: https://arxiv.org/abs/2408.13359
This repository provides a GGUF-quantized version of the model.
## Usage
Requires the latest llama.cpp to run.
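For reference, a minimal build sketch, assuming the standard CMake workflow described in the llama.cpp README:

```bash
# Clone and build llama.cpp (CMake is the standard build path).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# The llama-cli and llama-server binaries end up in build/bin/.
```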
### Generation
A simple example of running the PowerMoE GGUF with llama-cli:
```bash
./llama-cli -m PowerMoE4x800M_q3km.gguf -p "How about a snack?"
```
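Beyond one-shot generation, the same GGUF file can be served over an OpenAI-compatible HTTP endpoint with llama-server, which ships with llama.cpp. A minimal sketch, assuming the model file sits in the current directory and port 8080 is free:

```bash
# Start an OpenAI-compatible HTTP server on port 8080.
./llama-server -m PowerMoE4x800M_q3km.gguf --port 8080

# In another shell: query the completions endpoint.
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "How about a snack?", "max_tokens": 64}'
```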