This ablation underperforms the tried-and-true augmxnt/shisa-gamma-7b-v1. If you're looking for a Mistral 7B-based model, you should probably go with that one instead.
Performance
Measured using a fork of Lightblue's Shaberi benchmark framework:
Model | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
---|---|---|---|---|---|
gpt-4-turbo-2024-04-09 | 8.75 | 8.78 | 8.74 | 9.18 | 8.31 |
gpt-4o-2024-05-13 | 8.72 | 8.88 | 8.69 | 9.15 | 8.16 |
gemini-1.5-pro | 8.58 | 8.58 | 8.93 | 9.20 | 7.61 |
claude-3-opus-20240229 | 8.55 | 8.64 | 8.58 | 8.75 | 8.23 |
CohereForAI/c4ai-command-r-plus | 7.69 | 7.50 | 7.43 | 9.05 | 6.79 |
shisa-ai/shisa-v1-llama3-70b | 7.30 | 7.34 | 7.67 | 8.15 | 6.04 |
gpt-3.5-turbo-0125 | 7.17 | 7.24 | 6.98 | 7.64 | 6.82 |
shisa-ai/shisa-v1-llama3-70b.2e5 | 7.17 | 7.16 | 7.45 | 7.98 | 6.09 |
karakuri-ai/karakuri-lm-8x7b-chat-v0.1 | 7.00 | 7.18 | 6.30 | 7.98 | 6.55 |
karakuri-ai/karakuri-lm-70b-chat-v0.1 | 6.84 | 6.86 | 6.43 | 7.85 | 6.23 |
lightblue/ao-karasu-72B | 6.81 | 7.19 | 6.54 | 7.25 | 6.27 |
shisa-ai/shisa-v1-llama3-8b | 6.59 | 6.67 | 6.95 | 7.05 | 5.68 |
microsoft/Phi-3-medium-128k-instruct | 6.48 | 7.10 | 5.92 | 6.84 | 6.04 |
shisa-ai/shisa-swallowmx-13a47b-v1 | 6.17 | 6.48 | 6.07 | 7.11 | 5.03 |
lightblue/suzume-llama-3-8B-japanese | 5.96 | 6.68 | 4.96 | 6.68 | 5.53 |
augmxnt/shisa-gamma-7b-v1 | 5.82 | 5.96 | 5.02 | 6.85 | 5.47 |
shisa-ai/shisa-v1-phi3-14b | 5.77 | 6.28 | 5.26 | 6.55 | 5.01 |
shisa-ai/shisa-v1-gemma-8b | 5.64 | 6.50 | 5.42 | 5.10 | 5.55 |
Rakuten/RakutenAI-7B-chat | 5.58 | 5.92 | 4.60 | 6.58 | 5.24 |
lightblue/qarasu-14B-chat-plus-unleashed | 5.20 | 5.58 | 4.74 | 5.46 | 5.01 |
shisa-ai/shisa-v1-mistral0.3-7b | 5.11 | 5.64 | 6.10 | 3.83 | 4.86 |
cyberagent/calm2-7b-chat | 4.76 | 4.90 | 3.58 | 5.75 | 4.81 |
mistralai/Mistral-7B-Instruct-v0.2 | 4.69 | 5.78 | 4.65 | 3.80 | 4.53 |
shisa-ai/shisa-v1-yi1.5-9b | 4.63 | 5.98 | 4.28 | 3.26 | 5.00 |
augmxnt/shisa-7b-v1 | 4.50 | 4.63 | 3.95 | 4.89 | 4.53 |
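The Average column appears to be the unweighted mean of the four benchmark scores. That is an inference from the numbers, not something the card states. The sketch below recomputes it for three rows copied from the table and breaks down where this model differs from augmxnt/shisa-gamma-7b-v1. The names `rows`, `mean_score`, and `deltas` are illustrative only.

```python
# Scores copied from the table above.
# Format: model -> (reported average, [ELYZA-tasks-100, MT-Bench, Rakuda, Tengu-Bench])
BENCHMARKS = ["ELYZA-tasks-100", "MT-Bench", "Rakuda", "Tengu-Bench"]
rows = {
    "augmxnt/shisa-gamma-7b-v1": (5.82, [5.96, 5.02, 6.85, 5.47]),
    "shisa-ai/shisa-v1-mistral0.3-7b": (5.11, [5.64, 6.10, 3.83, 4.86]),
    "mistralai/Mistral-7B-Instruct-v0.2": (4.69, [5.78, 4.65, 3.80, 4.53]),
}


def mean_score(scores):
    """Unweighted mean of the per-benchmark scores (assumed aggregation)."""
    return sum(scores) / len(scores)


for model, (reported, scores) in rows.items():
    print(f"{model}: reported {reported:.2f}, recomputed {mean_score(scores):.4f}")

# Per-benchmark difference: this model minus shisa-gamma-7b-v1.
this_model = rows["shisa-ai/shisa-v1-mistral0.3-7b"][1]
gamma = rows["augmxnt/shisa-gamma-7b-v1"][1]
deltas = {name: a - b for name, a, b in zip(BENCHMARKS, this_model, gamma)}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

On these numbers the gap to shisa-gamma-7b-v1 comes mostly from Rakuda, while MT-Bench is the one benchmark where this model scores higher.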
See axolotl config
axolotl version: 0.4.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
chat_template: inst
datasets:
  - path: augmxnt/ultra-orca-boros-en-ja-v1
    type: sharegpt
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/mistral
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
use_wandb: true
wandb_project: shisa-v2
wandb_entity: augmxnt
wandb_name: shisa-v1-mistral0.3-7b
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: paged_adamw_8bit
lr_scheduler: linear
learning_rate: 8e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 2
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: zero3_bf16.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
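The config combines `learning_rate: 8e-6`, `lr_scheduler: linear`, and `warmup_steps: 100`. A minimal sketch of what that schedule usually looks like (linear warmup from zero, then linear decay to zero) follows. Two things here are assumptions rather than facts from the card: that the trainer uses exactly this shape, and the total step count of 666, which is estimated from roughly 222 optimizer steps per epoch over 3 epochs. The function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, base_lr=8e-6, warmup_steps=100, total_steps=666):
    """Learning rate at a given optimizer step.

    base_lr and warmup_steps come from the config above.
    total_steps=666 is an ASSUMPTION (about 222 steps/epoch * 3 epochs).
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 50, 100, 333, 666):
    print(f"step {step:>3}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the peak rate of 8e-6 is reached at step 100, a little under halfway through the first epoch.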
outputs/mistral
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the augmxnt/ultra-orca-boros-en-ja-v1 dataset (the dataset listed in the axolotl config). It achieves the following results on the evaluation set:
- Loss: 0.3791
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 3
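The total batch sizes in the list follow from the per-device values. As a quick check, the arithmetic can be sketched like this, using only numbers from the list above (the variable names are illustrative):

```python
# Values taken from the hyperparameter list above.
train_batch_size = 2             # per-device micro batch
eval_batch_size = 2              # per-device eval batch
num_devices = 8
gradient_accumulation_steps = 4

# Sequences consumed per optimizer step across all GPUs.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 64
print(total_eval_batch_size)   # 16
```

With `sample_packing: true` and `sequence_len: 8192` in the config, each of those 64 sequences can hold several packed samples, so the number of training examples per step is higher than 64.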
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.8564 | 0.0045 | 1 | 0.7107 |
0.6131 | 0.5023 | 111 | 0.4259 |
0.6077 | 1.0045 | 222 | 0.3715 |
0.4173 | 1.4932 | 333 | 0.3617 |
0.3812 | 1.9955 | 444 | 0.3468 |
0.2408 | 2.4842 | 555 | 0.3791 |
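Validation loss bottoms out before the final evaluation while training loss keeps falling. One reading, which the card itself does not state, is that the model starts to overfit during the third epoch. The sketch below picks out the evaluation step with the lowest validation loss from the rows above. The `results` list and `best` variable are illustrative names.

```python
# Rows copied from the training results table:
# (training_loss, epoch, step, validation_loss)
results = [
    (0.8564, 0.0045, 1, 0.7107),
    (0.6131, 0.5023, 111, 0.4259),
    (0.6077, 1.0045, 222, 0.3715),
    (0.4173, 1.4932, 333, 0.3617),
    (0.3812, 1.9955, 444, 0.3468),
    (0.2408, 2.4842, 555, 0.3791),
]

# Evaluation point with the lowest validation loss.
best = min(results, key=lambda row: row[3])
final = results[-1]

print(f"best:  step {best[2]} (epoch {best[1]}), validation loss {best[3]}")
print(f"final: step {final[2]} (epoch {final[1]}), validation loss {final[3]}")
print(f"validation loss change from best to final: {final[3] - best[3]:+.4f}")
```

The reported evaluation loss of 0.3791 matches the last row, not the step-444 minimum of 0.3468.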
Framework versions
- Transformers 4.40.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
Model tree for shisa-ai/shisa-v1-mistral0.3-7b
- Base model: mistralai/Mistral-7B-v0.3
- Finetuned: mistralai/Mistral-7B-Instruct-v0.3
- Finetuned: this model