---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
model-index:
- name: ibm/PowerMoE-3b
  results:
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: ARC
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 54.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: BoolQ
    metrics:
    - name: accuracy
      type: accuracy
      value: 67.1
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Hellaswag
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 70.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: OpenBookQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 38.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: PIQA
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 77.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: Winogrande
    metrics:
    - name: accuracy-norm
      type: accuracy-norm
      value: 64.6
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: MMLU
    metrics:
    - name: accuracy
      type: accuracy
      value: 33.4
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: GSM8k (5 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 27.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: lm-eval-harness
      name: math (4 shot)
    metrics:
    - name: accuracy
      type: accuracy
      value: 10
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 15.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode-eval
      name: MBPP
    metrics:
    - name: pass@1
      type: pass@1
      value: 24
      verified: false
---

## Model Summary
PowerMoE-3B is a 3B-parameter sparse Mixture-of-Experts (sMoE) language model trained with the Power learning rate scheduler. It sparsely activates 800M parameters for each token. It is trained on a mix of open-source and synthetic datasets with permissive licenses. Across a range of benchmarks, including natural-language multiple-choice tasks, code generation, and math reasoning, PowerMoE-3B shows promising results compared to dense models with 2x the active parameters.
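
To make the total-vs-active parameter distinction concrete, you can check the full parameter count directly after loading the checkpoint. This is a minimal sketch, not part of the original card; it assumes the model loads with a recent transformers release, and note that the 800M active-per-token figure comes from the MoE routing setup, not from this total count.

```python
from transformers import AutoModelForCausalLM

# load the checkpoint (CPU is fine just for inspecting its size)
model = AutoModelForCausalLM.from_pretrained("ibm/PowerMoE-3b")

# total parameters across all experts (~3B); the router activates
# only ~800M of these for any single token
total = model.num_parameters()
print(f"total parameters: {total / 1e9:.2f}B")
```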

## Usage

### Generation
This is a simple example of how to use the **PowerMoE-3b** model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # or "cpu"
model_path = "ibm/PowerMoE-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    {"role": "user", "content": "Write a code to find the maximum value in a list of numbers."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# tokenize the text
input_tokens = tokenizer(chat, return_tensors="pt")
# transfer tokenized inputs to the device
for i in input_tokens:
    input_tokens[i] = input_tokens[i].to(device)
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print; in this example the batch size is 1
for i in output:
    print(i)
```
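
For quick experiments, the same generation can also be run through the `pipeline` API. This alternative is a minimal sketch, not part of the original card, and assumes the default generation settings are acceptable:

```python
from transformers import pipeline

# build a text-generation pipeline; drop device_map if running on CPU
generator = pipeline(
    "text-generation",
    model="ibm/PowerMoE-3b",
    device_map="cuda",
)

prompt = "Write a code to find the maximum value in a list of numbers."
result = generator(prompt, max_new_tokens=100)
print(result[0]["generated_text"])
```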