Text Generation · Transformers · Safetensors · 5 languages · mixtral · Mixture of Experts · sharegpt · axolotl · conversational · text-generation-inference
MaziyarPanahi committed
Commit 32dec96
1 Parent(s): 150f286

Create README.md (#1)


- Create README.md (18960db34d92bea19077bd656a32774afeaaa9aa)

Files changed (1)
  1. README.md +56 -0
README.md ADDED
@@ -0,0 +1,56 @@
---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.2
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---

<img src="./Goku-8x22b-v0.1.webp" alt="Goku 8x22B v0.1 Logo" width="500" style="margin-left: auto; margin-right: auto; display: block;"/>

# Goku-8x22B-v0.2 (Goku 141b-A35b)

A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.2](https://huggingface.co/v2ray/Mixtral-8x22B-v0.2) model, trained on the following datasets:

- teknium/OpenHermes-2.5
- WizardLM/WizardLM_evol_instruct_V2_196k
- microsoft/orca-math-word-problems-200k

This model has a total of 141b parameters, of which only 35b are active per token. The major difference in this version is that the model was trained on more datasets and with a sequence length of 8192, which allows it to generate longer and more coherent responses.
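
As a quick sanity check of those figures, the repository config can be inspected without downloading the weights. This is only a sketch: it assumes the config follows the standard Mixtral layout, where the expert counts and the context window are exposed as `num_local_experts`, `num_experts_per_tok`, and `max_position_embeddings`.

```python
from transformers import AutoConfig

# Assumes a Mixtral-style config; the field names below are standard for that
# architecture, not something stated explicitly in this model card.
config = AutoConfig.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
print(config.num_local_experts, config.num_experts_per_tok)  # for Mixtral-8x22B: 8 experts, 2 active
print(config.max_position_embeddings)                        # context window
```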

## How to use it

**Use a pipeline as a high-level helper:**
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```
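
A rough usage sketch follows; the prompt and generation settings here are illustrative placeholders, not values from the model card:

```python
# Illustrative call: prompt and sampling parameters are placeholders.
output = pipe(
    "Explain what a mixture-of-experts layer does, in one paragraph.",
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```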

**Load model directly:**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
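
Note that plain `from_pretrained` as above loads the checkpoint in full precision on a single device, which is impractical for a model of this size. Below is a minimal sketch of a more typical setup; the bfloat16 dtype, `device_map="auto"` (which requires `accelerate`), and the prompt are assumptions, not settings taken from the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MaziyarPanahi/Goku-8x22B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: bfloat16 weights, sharded across available GPUs by accelerate.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder prompt; adapt it to whatever prompt format you use with this model.
prompt = "Solve: a train travels 120 km in 1.5 hours. What is its average speed?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```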