See axolotl config
axolotl version: 0.4.0
# use google/gemma-7b if you have access
base_model: google/gemma-7b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
# huggingface repo
datasets:
  - path: ./dataset/data1.jsonl
    type: input_output
val_set_size: 0.1
output_dir: ./gemma-python
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: true
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 3
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
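As a small worked example of the adapter settings in the config above, the sketch below computes the LoRA scaling factor implied by `lora_r` and `lora_alpha`. It assumes the standard LoRA convention in which the low-rank update is multiplied by `alpha / r`; the config does not say whether a variant such as rank-stabilized scaling was used, so treat this as illustrative. The helper name `lora_scaling` is hypothetical.

```python
# LoRA settings copied from the axolotl config above.
lora_r = 32
lora_alpha = 16


def lora_scaling(alpha: float, r: int) -> float:
    """Return the standard LoRA scale factor, alpha / r (assumed convention)."""
    return alpha / r


scale = lora_scaling(lora_alpha, lora_r)
print(f"LoRA scaling factor: {scale}")
```

With these values the adapter update is scaled by 0.5 before being added to the frozen base weights.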
gemma-python
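This model is a fine-tuned version of google/gemma-7b, trained with QLoRA on a local dataset (./dataset/data1.jsonl in the config above; no dataset name was recorded in the card metadata). It achieves the following results on the evaluation set: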
- Loss: 2.1143
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 3
- total_train_batch_size: 24
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- num_epochs: 10
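The totals in the list above follow from the per-device values, and the cosine schedule can be sketched from the reported warmup steps. The snippet below is a minimal illustration, not the trainer's own code: the function names are hypothetical, the total of 80 optimizer steps is taken from the last row of the training results, and the schedule formula assumes the common linear-warmup-then-half-cosine shape.

```python
import math

# Values copied from the hyperparameter list above.
micro_batch_size = 2
gradient_accumulation_steps = 3
num_devices = 4
peak_lr = 2e-4
warmup_steps = 2
total_steps = 80  # assumption: final step reported in the training results


def total_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Samples contributing to one optimizer (or evaluation) step."""
    return per_device * accum * devices


def cosine_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup to `peak`, then half-cosine decay towards zero (assumed shape)."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


train_total = total_batch_size(micro_batch_size, gradient_accumulation_steps, num_devices)
eval_total = total_batch_size(micro_batch_size, 1, num_devices)  # no accumulation at eval
print(train_total, eval_total)
print(cosine_lr(41, peak_lr, warmup_steps, total_steps))
```

The two totals match the reported `total_train_batch_size` of 24 and `total_eval_batch_size` of 8. Under the assumed schedule, the learning rate reaches its peak at step 2 and is at half the peak around step 41.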
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
19.0016 | 0.12 | 1 | 18.6992 |
19.4686 | 0.25 | 2 | 16.2578 |
11.468 | 0.5 | 4 | 8.2891 |
7.5305 | 0.75 | 6 | 5.8847 |
5.7572 | 1.0 | 8 | 4.3635 |
4.3903 | 1.25 | 10 | 3.2849 |
2.9497 | 1.5 | 12 | 2.8539 |
2.8738 | 1.75 | 14 | 2.6203 |
2.7298 | 2.0 | 16 | 2.4534 |
2.4284 | 2.25 | 18 | 2.3077 |
2.394 | 2.5 | 20 | 2.1876 |
2.069 | 2.75 | 22 | 2.1294 |
1.9355 | 3.0 | 24 | 2.1048 |
1.9635 | 3.25 | 26 | 2.0707 |
2.092 | 3.5 | 28 | 2.0596 |
1.9675 | 3.75 | 30 | 2.0287 |
1.9693 | 4.0 | 32 | 2.0220 |
2.0198 | 4.25 | 34 | 2.0124 |
1.9357 | 4.5 | 36 | 1.9946 |
1.8147 | 4.75 | 38 | 1.9979 |
1.9084 | 5.0 | 40 | 1.9751 |
1.6678 | 5.25 | 42 | 2.0049 |
1.7639 | 5.5 | 44 | 1.9885 |
1.7475 | 5.75 | 46 | 1.9777 |
1.4848 | 6.0 | 48 | 1.9939 |
1.3065 | 6.25 | 50 | 2.0264 |
1.4792 | 6.5 | 52 | 2.0125 |
1.4233 | 6.75 | 54 | 2.0204 |
1.2534 | 7.0 | 56 | 2.0318 |
1.2409 | 7.25 | 58 | 2.0445 |
1.4309 | 7.5 | 60 | 2.0641 |
1.1622 | 7.75 | 62 | 2.0633 |
1.228 | 8.0 | 64 | 2.0930 |
1.3076 | 8.25 | 66 | 2.1077 |
1.2323 | 8.5 | 68 | 2.1060 |
1.1635 | 8.75 | 70 | 2.1039 |
1.261 | 9.0 | 72 | 2.1068 |
1.0122 | 9.25 | 74 | 2.1110 |
1.218 | 9.5 | 76 | 2.1180 |
1.1022 | 9.75 | 78 | 2.1226 |
1.2072 | 10.0 | 80 | 2.1143 |
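To make the trend in the table easier to read, the sketch below picks out the epoch with the lowest validation loss. It is only an illustration: the dictionary holds just the end-of-epoch rows copied from the table, and the variable names are hypothetical.

```python
# End-of-epoch validation losses copied from the table above (epoch -> loss).
epoch_val_loss = {
    1: 4.3635,
    2: 2.4534,
    3: 2.1048,
    4: 2.0220,
    5: 1.9751,
    6: 1.9939,
    7: 2.0318,
    8: 2.0930,
    9: 2.1068,
    10: 2.1143,
}

# Epoch with the lowest validation loss, and how far the final epoch drifted from it.
best_epoch = min(epoch_val_loss, key=epoch_val_loss.get)
final_epoch = max(epoch_val_loss)
gap = epoch_val_loss[final_epoch] - epoch_val_loss[best_epoch]

print(f"best epoch: {best_epoch} (val loss {epoch_val_loss[best_epoch]:.4f})")
print(f"final epoch: {final_epoch} (val loss {epoch_val_loss[final_epoch]:.4f}, +{gap:.4f})")
```

Validation loss bottoms out at epoch 5 (1.9751) and rises to 2.1143 by epoch 10 while training loss keeps falling, which suggests the later epochs overfit. The config leaves `early_stopping_patience` unset and uses `saves_per_epoch: 1`, so an earlier checkpoint may generalize better than the final one.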
Framework versions
- PEFT 0.9.0
- Transformers 4.38.2
- Pytorch 2.2.1
- Datasets 2.18.0
- Tokenizers 0.15.0
Model tree for dvdmrs09/gemma-py2
Base model
google/gemma-7b