
Built with Axolotl

The following axolotl config was used (axolotl version: 0.4.1); a rough PEFT equivalent of its LoRA settings is sketched after the config.

base_model: mistralai/Mistral-7B-Instruct-v0.2
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: chatml
datasets:
  - path: Howard881010/gas
    type: alpaca
    train_on_split: train
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./finetune/outputs/gas

adapter: qlora
lora_model_dir:

sequence_len: 1200
sample_packing: false
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: finetune
wandb_entity:
wandb_watch:
wandb_name: gas
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: 
flash_attention: true
eval_sample_packing: false

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
# For finetune
seed: 42
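
For readers more familiar with PEFT than Axolotl, the LoRA settings above map roughly onto the following `LoraConfig`. This is a sketch, not part of the original card: `lora_target_linear: true` makes Axolotl target all linear projections, so the explicit Mistral module names below are an assumption.

```python
from peft import LoraConfig

# Rough PEFT equivalent of the LoRA hyperparameters in the config above.
# The target module names are an assumption: "lora_target_linear: true" in
# Axolotl targets every linear projection, which for Mistral corresponds to
# the attention and MLP projections listed here.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```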


finetune/outputs/gas

This model is a QLoRA adapter fine-tuned from mistralai/Mistral-7B-Instruct-v0.2 on the Howard881010/gas dataset (see the config above). It achieves the following results on the evaluation set:

  • Loss: 0.0030

Model description

More information needed

Intended uses & limitations

More information needed
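
Pending proper documentation, the sketch below shows one plausible way to load this adapter for inference with Transformers and PEFT, mirroring the 4-bit QLoRA setup used for training. The repo id Rose-STL-Lab/gas and the alpaca-style prompt are assumptions based on the config and the model tree at the end of this card, not verified usage instructions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "Rose-STL-Lab/gas"  # assumed adapter repo id (see model tree below)

# 4-bit quantization mirrors the load_in_4bit: true setting used for QLoRA training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# The training dataset type is alpaca, so an instruction/response prompt is a reasonable guess.
prompt = "### Instruction:\n<your instruction here>\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```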

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (the effective batch sizes are derived in the sketch after this list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • total_eval_batch_size: 8
  • optimizer: paged AdamW 32-bit (paged_adamw_32bit) with betas=(0.9, 0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 10
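
The total batch sizes above follow directly from the per-device settings; a minimal arithmetic check:

```python
# Effective batch sizes implied by the hyperparameters listed above.
micro_batch_size = 1               # per-device train/eval batch size
gradient_accumulation_steps = 2
num_devices = 8

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 16 8
```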

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2507        | 0.0022 | 1    | 0.9644          |
| 0.66          | 0.2508 | 113  | 0.5023          |
| 0.4941        | 0.5017 | 226  | 0.3771          |
| 0.2988        | 0.7525 | 339  | 0.2865          |
| 0.1595        | 1.0033 | 452  | 0.2219          |
| 0.0708        | 1.2542 | 565  | 0.1954          |
| 0.0752        | 1.5050 | 678  | 0.1740          |
| 0.1355        | 1.7558 | 791  | 0.1599          |
| 0.0907        | 2.0067 | 904  | 0.1532          |
| 0.0531        | 2.2575 | 1017 | 0.1488          |
| 0.074         | 2.5083 | 1130 | 0.1466          |
| 0.0556        | 2.7592 | 1243 | 0.1444          |
| 0.0504        | 3.0100 | 1356 | 0.1387          |
| 0.0536        | 3.2608 | 1469 | 0.1365          |
| 0.0259        | 3.5117 | 1582 | 0.1309          |
| 0.054         | 3.7625 | 1695 | 0.1223          |
| 0.0229        | 4.0133 | 1808 | 0.1133          |
| 0.0224        | 4.2642 | 1921 | 0.1059          |
| 0.0503        | 4.5150 | 2034 | 0.0921          |
| 0.0173        | 4.7658 | 2147 | 0.0732          |
| 0.0076        | 5.0166 | 2260 | 0.0531          |
| 0.0089        | 5.2675 | 2373 | 0.0362          |
| 0.0062        | 5.5183 | 2486 | 0.0240          |
| 0.0071        | 5.7691 | 2599 | 0.0155          |
| 0.0029        | 6.0200 | 2712 | 0.0082          |
| 0.002         | 6.2708 | 2825 | 0.0060          |
| 0.0021        | 6.5216 | 2938 | 0.0049          |
| 0.0011        | 6.7725 | 3051 | 0.0039          |
| 0.0005        | 7.0233 | 3164 | 0.0034          |
| 0.0001        | 7.2741 | 3277 | 0.0033          |
| 0.0001        | 7.5250 | 3390 | 0.0032          |
| 0.0004        | 7.7758 | 3503 | 0.0031          |
| 0.0005        | 8.0266 | 3616 | 0.0031          |
| 0.0008        | 8.2775 | 3729 | 0.0030          |
| 0.0002        | 8.5283 | 3842 | 0.0030          |
| 0.0002        | 8.7791 | 3955 | 0.0030          |
| 0.0004        | 9.0300 | 4068 | 0.0030          |
| 0.0006        | 9.2808 | 4181 | 0.0030          |
| 0.0005        | 9.5316 | 4294 | 0.0030          |
| 0.0006        | 9.7825 | 4407 | 0.0030          |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.43.1
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Model tree for Rose-STL-Lab/gas: this model is an adapter for mistralai/Mistral-7B-Instruct-v0.2.