RichardErkhov committed on
Commit
40e330f
1 Parent(s): 0ca96b6

uploaded readme

Files changed (1)
  1. README.md +172 -0
README.md ADDED
@@ -0,0 +1,172 @@
+ Quantization made by Richard Erkhov.
+
+ [GitHub](https://github.com/RichardErkhov)
+
+ [Discord](https://discord.gg/pvy7H8DZMG)
+
+ [Request more models](https://github.com/RichardErkhov/quant_request)
+
+
+ PowerMoE-3b - bnb 8bits
+ - Model creator: https://huggingface.co/ibm/
+ - Original model: https://huggingface.co/ibm/PowerMoE-3b/
+
+
+
+
+ Original model description:
+ ---
+ pipeline_tag: text-generation
+ inference: false
+ license: apache-2.0
+ library_name: transformers
+ model-index:
+ - name: ibm/PowerMoE-3b
+   results:
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: ARC
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 58.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: BoolQ
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 65.0
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: Hellaswag
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 71.5
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: OpenBookQA
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 41.0
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: PIQA
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 79.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: Winogrande
+     metrics:
+     - name: accuracy-norm
+       type: accuracy-norm
+       value: 65.0
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: MMLU (5 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 42.8
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: GSM8k (5 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 25.9
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: lm-eval-harness
+       name: math (4 shot)
+     metrics:
+     - name: accuracy
+       type: accuracy
+       value: 14.8
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: bigcode-eval
+       name: humaneval
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 20.1
+       verified: false
+   - task:
+       type: text-generation
+     dataset:
+       type: bigcode-eval
+       name: MBPP
+     metrics:
+     - name: pass@1
+       type: pass@1
+       value: 32.4
+       verified: false
+ ---
+
+ ## Model Summary
+ PowerMoE-3B is a 3B sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and proprietary datasets. PowerMoE-3B has shown promising results compared to dense models with 2x the active parameters across various benchmarks, including natural language multiple-choice, code generation, and math reasoning.
+ Paper: https://arxiv.org/abs/2408.13359
+
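To make the parameter counts in the Model Summary concrete, the short sketch below works out what fraction of the model is active per token and how large a dense model with 2x the active parameters would be. It treats "3B" and "800M" as the nominal round figures quoted in the summary, which is an assumption; the exact parameter counts are not given here.

```python
# Nominal figures quoted in the Model Summary (assumed round numbers).
TOTAL_PARAMS = 3e9      # "3B" total parameters
ACTIVE_PARAMS = 800e6   # "800M" parameters activated per token

# Share of the model that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Size of a dense comparison model with 2x the active parameters.
dense_comparison_params = 2 * ACTIVE_PARAMS

print(f"Active per token: {active_fraction:.1%}")                      # 26.7%
print(f"Dense comparison size: {dense_comparison_params / 1e9:.1f}B")  # 1.6B
```

Under these nominal figures, roughly a quarter of the weights are used for any given token, and the dense baselines referred to in the summary would be around 1.6B parameters.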
+ ## Usage
+ Note: Requires installing Hugging Face `transformers` from source.
+
+ ### Generation
+ This is a simple example of how to use the **PowerMoE-3b** model.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ device = "cuda" # or "cpu"
+ model_path = "ibm/PowerMoE-3b"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ # drop device_map if running on CPU
+ model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
+ model.eval()
+ # change input text as desired
+ prompt = "Write a code to find the maximum value in a list of numbers."
+ # tokenize the text
+ input_tokens = tokenizer(prompt, return_tensors="pt")
+ # transfer tokenized inputs to the device
+ for i in input_tokens:
+     input_tokens[i] = input_tokens[i].to(device)
+ # generate output tokens
+ output = model.generate(**input_tokens, max_new_tokens=100)
+ # decode output tokens into text
+ output = tokenizer.batch_decode(output)
+ # loop over the batch to print, in this example the batch size is 1
+ for i in output:
+     print(i)
+ ```
+
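The generation example in the README says to drop `device_map` when running on CPU. A minimal sketch of one way to handle both cases in a single code path is shown below. The helper names `load_kwargs` and `generate_text` are hypothetical and not part of the README or of `transformers`. The sketch loads the original `ibm/PowerMoE-3b` checkpoint, as the README does, and assumes a `transformers` build that supports this model.

```python
from typing import Any, Dict


def load_kwargs(device: str) -> Dict[str, Any]:
    """Keyword arguments for from_pretrained; device_map is dropped on CPU."""
    return {} if device == "cpu" else {"device_map": device}


def generate_text(prompt: str, model_path: str = "ibm/PowerMoE-3b",
                  device: str = "cpu", max_new_tokens: int = 100) -> str:
    """Hypothetical wrapper around the README's generation steps."""
    # Heavy imports stay local so load_kwargs can be used without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, **load_kwargs(device))
    model.eval()

    # Tokenize and move all input tensors to the target device in one call.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Gradients are not needed for inference.
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Batch size is 1 here, so return the single decoded sequence.
    return tokenizer.batch_decode(output)[0]
```

Calling `generate_text("Write a code to find the maximum value in a list of numbers.", device="cuda")` would then mirror the README example, while `device="cpu"` loads the model without `device_map`.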