Edit model card

Air-Striker-Mixtral-8x7B-Instruct-ZLoss

Experimental model, trained using config and Transformers/Axolotl forks provided by Doctor-Shotgun

Model was fine-tuned from Mixtral-8x7B-v0.1 with airoboros-3.2 dataset, for 4 epochs, ChatML prompt format at 8K context length.

Additionally, model was then merged with Mixtral-8x7B-Instruct-v0.1:


This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the linear merge method.

Models Merged

The following models were included in the merge:

  • mistralai/Mixtral-8x7B-Instruct-v0.1
  • LoneStriker/Air-Striker-Mixtral-8x7B-ZLoss

Configuration

The following YAML configuration was used to produce this model:

models:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    parameters:
      weight: 0.5
  - model: LoneStriker/Air-Striker-Mixtral-8x7B-ZLoss
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
Downloads last month
377
GGUF
Model size
46.7B params
Architecture
llama

2-bit

3-bit

4-bit

5-bit

6-bit

Inference Examples
Inference API (serverless) has been turned off for this model.

Dataset used to train LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-GGUF