---
license: apache-2.0
language:
- fr
- it
- de
- es
- en
tags:
- moe
- mixtral
- sharegpt
- axolotl
library_name: transformers
base_model: v2ray/Mixtral-8x22B-v0.1
inference: false
model_creator: MaziyarPanahi
model_name: Goku-8x22B-v0.2
pipeline_tag: text-generation
quantized_by: MaziyarPanahi
datasets:
- MaziyarPanahi/WizardLM_evol_instruct_V2_196k
- microsoft/orca-math-word-problems-200k
- teknium/OpenHermes-2.5
---

Goku 8x22B v0.1 Logo

# Goku-8x22B-v0.2 (Goku 141b-A35b)

A fine-tuned version of the [v2ray/Mixtral-8x22B-v0.1](https://huggingface.co/v2ray/Mixtral-8x22B-v0.1) model on the following datasets:

- teknium/OpenHermes-2.5
- WizardLM/WizardLM_evol_instruct_V2_196k
- microsoft/orca-math-word-problems-200k

This model has 141b parameters in total, of which only 35b are active per token. The major difference in this version is that the model was trained on more datasets and with an `8192` sequence length, which lets it generate longer and more coherent responses.

## How to use it

**Use a pipeline as a high-level helper:**

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="MaziyarPanahi/Goku-8x22B-v0.2")
```

**Load model directly:**

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/Goku-8x22B-v0.2")
```
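
The snippets above load the model in full precision, which a 141b-parameter MoE will not fit in on a single GPU. The sketch below is one possible way to load it in half precision sharded across available GPUs and to build a prompt with the tokenizer's chat template; the chat template, prompt, and sampling parameters are assumptions for illustration, not settings confirmed by this model card.

```python
# Minimal sketch, not an official recipe: assumes enough combined GPU memory
# for the 141b MoE in bfloat16 and that the tokenizer ships a chat template.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "MaziyarPanahi/Goku-8x22B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    device_map="auto",           # shard layers across the available devices
)

# Format a chat-style prompt with the tokenizer's template (assumed present).
messages = [
    {"role": "user", "content": "Explain mixture-of-experts models in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings are illustrative defaults, not values from the model card.
outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```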