See axolotl config
axolotl version: 0.4.1
```yaml
base_model: mistralai/Mistral-7B-Instruct-v0.1
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
chat_template: chatml
datasets:
- path: Howard881010/gas-2_week-mixed-mixed-fact
type: alpaca
train_on_split: train
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./finetune/outputs/gas-2_week-mixed-mixed-fact
adapter: qlora
lora_model_dir:
sequence_len: 1000
sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: finetune
wandb_entity:
wandb_watch:
wandb_name: gas-2_week-mixed-mixed-fact
wandb_log_model:
gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
eval_sample_packing: false
warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
# For finetune
seed: 42
```
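The adapter settings above describe a QLoRA run: the base model is loaded in 4-bit and LoRA adapters (r=32, alpha=16, dropout=0.05) are attached to all linear projection layers (`lora_target_linear: true`). A rough PEFT equivalent, shown only as a sketch; axolotl builds this configuration internally, and the exact `target_modules` list is an assumption for Mistral-style models:

```python
from peft import LoraConfig

# Sketch of the LoRA settings from the YAML above.
# target_modules is an assumption: `lora_target_linear: true` targets every
# linear projection in the decoder blocks; for Mistral-7B those are commonly
# the modules listed below.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```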
# finetune/outputs/gas-2_week-mixed-mixed-fact
This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) on the Howard881010/gas-2_week-mixed-mixed-fact dataset. It achieves the following results on the evaluation set:
- Loss: 0.7440
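A minimal usage sketch, assuming the LoRA adapter produced in the config's `output_dir` (`./finetune/outputs/gas-2_week-mixed-mixed-fact`, or this repository once pushed to the Hub) is loaded on top of the 4-bit base model. The adapter path and the prompt are illustrative, not part of the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.1"
# Assumed adapter location: the output_dir from the axolotl config above.
adapter_path = "./finetune/outputs/gas-2_week-mixed-mixed-fact"

# Load the base model in 4-bit, matching the load_in_4bit setting used for training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_path)

prompt = "Summarize the recent trend in the data."  # illustrative prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```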
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- total_eval_batch_size: 8
- optimizer: paged AdamW (32-bit) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 10
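The effective batch size of 16 follows from micro_batch_size (1) × gradient_accumulation_steps (2) × num_devices (8). For reference, a rough `transformers.TrainingArguments` equivalent of these settings, shown only as a sketch; axolotl assembles the actual trainer arguments from the YAML config, and `bf16=True` is an assumption standing in for `bf16: auto`:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="./finetune/outputs/gas-2_week-mixed-mixed-fact",
    per_device_train_batch_size=1,   # micro_batch_size
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 1 x 2 x 8 GPUs = 16 effective
    num_train_epochs=10,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="paged_adamw_32bit",
    weight_decay=0.0,
    bf16=True,                       # assumption: bf16: auto resolved to bf16
    gradient_checkpointing=True,
    logging_steps=1,
    seed=42,
)
```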
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.9232 | 0.0163 | 1 | 0.5171 |
0.6942 | 0.2602 | 16 | 0.4129 |
0.6473 | 0.5203 | 32 | 0.3971 |
0.494 | 0.7805 | 48 | 0.3909 |
0.4503 | 1.0407 | 64 | 0.4059 |
0.4591 | 1.3008 | 80 | 0.4198 |
0.3291 | 1.5610 | 96 | 0.4314 |
0.2746 | 1.8211 | 112 | 0.4386 |
0.1558 | 2.0813 | 128 | 0.4951 |
0.152 | 2.3415 | 144 | 0.4898 |
0.1461 | 2.6016 | 160 | 0.5046 |
0.1392 | 2.8618 | 176 | 0.5184 |
0.0897 | 3.1220 | 192 | 0.5488 |
0.082 | 3.3821 | 208 | 0.5447 |
0.0757 | 3.6423 | 224 | 0.5675 |
0.0682 | 3.9024 | 240 | 0.5540 |
0.0329 | 4.1626 | 256 | 0.6219 |
0.0266 | 4.4228 | 272 | 0.6008 |
0.0334 | 4.6829 | 288 | 0.6219 |
0.0212 | 4.9431 | 304 | 0.6420 |
0.0088 | 5.2033 | 320 | 0.6808 |
0.0072 | 5.4634 | 336 | 0.6965 |
0.0047 | 5.7236 | 352 | 0.7108 |
0.0038 | 5.9837 | 368 | 0.7144 |
0.0043 | 6.2439 | 384 | 0.7254 |
0.0016 | 6.5041 | 400 | 0.7297 |
0.0006 | 6.7642 | 416 | 0.7357 |
0.0007 | 7.0244 | 432 | 0.7389 |
0.0012 | 7.2846 | 448 | 0.7410 |
0.0007 | 7.5447 | 464 | 0.7425 |
0.0008 | 7.8049 | 480 | 0.7435 |
0.0013 | 8.0650 | 496 | 0.7441 |
0.0007 | 8.3252 | 512 | 0.7437 |
0.0006 | 8.5854 | 528 | 0.7443 |
0.0006 | 8.8455 | 544 | 0.7444 |
0.0012 | 9.1057 | 560 | 0.7441 |
0.0008 | 9.3659 | 576 | 0.7441 |
0.0006 | 9.6260 | 592 | 0.7445 |
0.0008 | 9.8862 | 608 | 0.7440 |
### Framework versions
- PEFT 0.11.1
- Transformers 4.40.2
- PyTorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1