See axolotl config
axolotl version: 0.4.1
```yaml
adapter: lora
base_model: unsloth/gemma-2b-it
batch_size: 8
bf16: true
chat_template: tokenizer_default_fallback_alpaca
datasets:
- data_files:
  - 88cfea977fe74782_train_data.json
  ds_type: json
  format: custom
  path: /workspace/input_data/88cfea977fe74782_train_data.json
  type:
    field_instruction: smiles
    field_output: molt5
    format: '{instruction}'
    no_input_format: '{instruction}'
    system_format: '{system}'
    system_prompt: ''
evals_per_epoch: 1
flash_attention: true
gpu_memory_limit: 80GiB
gradient_checkpointing: true
group_by_length: true
hub_model_id: willtensora/f0d6caa9-89a9-4666-9a6d-c8cda2015281
hub_strategy: checkpoint
learning_rate: 0.0002
logging_steps: 10
lora_alpha: 256
lora_dropout: 0.1
lora_r: 128
lora_target_linear: true
lr_scheduler: cosine
micro_batch_size: 1
model_type: AutoModelForCausalLM
num_epochs: 100
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resize_token_embeddings_to_32x: false
sample_packing: false
saves_per_epoch: 2
sequence_len: 2048
tokenizer_type: GemmaTokenizerFast
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.1
wandb_entity: ''
wandb_mode: online
wandb_project: Gradients-On-Demand
wandb_run: your_name
wandb_runid: default
warmup_ratio: 0.05
xformers_attention: true
```
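For reference, the LoRA settings above (r=128, alpha=256, dropout=0.1, all linear layers targeted) correspond roughly to the following PEFT configuration. This is a sketch rather than the exact object axolotl builds internally; in particular, `target_modules="all-linear"` is an assumption about how `lora_target_linear: true` is applied.

```python
from peft import LoraConfig

# Sketch of the adapter settings from the config above.
# "all-linear" is an assumed mapping of lora_target_linear: true.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.1,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```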
f0d6caa9-89a9-4666-9a6d-c8cda2015281
This model is a fine-tuned version of unsloth/gemma-2b-it on the dataset listed in the config above (88cfea977fe74782_train_data.json). It achieves the following results on the evaluation set:
- Loss: 1.9427
Model description
More information needed
Intended uses & limitations
More information needed
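The config suggests the model maps SMILES strings (`field_instruction: smiles`) to MolT5-style outputs (`field_output: molt5`). A minimal inference sketch follows; it assumes the adapter at willtensora/f0d6caa9-89a9-4666-9a6d-c8cda2015281 is accessible and that the prompt is the raw SMILES string, matching the `format: '{instruction}'` template in the config.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/gemma-2b-it"
adapter_id = "willtensora/f0d6caa9-89a9-4666-9a6d-c8cda2015281"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Hypothetical example input: a raw SMILES string (ethanol).
prompt = "CCO"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```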
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 8
- optimizer: AdamW (8-bit, bitsandbytes) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 12
- num_epochs: 100
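For readers who want to approximate this run outside axolotl, the hyperparameters above map roughly onto Hugging Face `TrainingArguments` as sketched below. Axolotl drives training through its own config, so this is only an illustrative equivalent, not the exact arguments used.

```python
from transformers import TrainingArguments

# Rough, illustrative mapping of the hyperparameters listed above.
args = TrainingArguments(
    output_dir="miner_id_24",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=100,
    lr_scheduler_type="cosine",
    warmup_steps=12,
    optim="adamw_bnb_8bit",
    bf16=True,
    gradient_checkpointing=True,
    group_by_length=True,
    logging_steps=10,
    seed=42,
)
```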
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 0.05 | 1 | 3.5172 |
1.0375 | 1.0 | 20 | 0.9186 |
0.5932 | 2.0 | 40 | 0.9190 |
0.4433 | 3.0 | 60 | 0.9756 |
0.3115 | 4.0 | 80 | 0.9780 |
0.2432 | 5.0 | 100 | 1.0348 |
0.219 | 6.0 | 120 | 1.1386 |
0.1868 | 7.0 | 140 | 1.0399 |
0.1624 | 8.0 | 160 | 1.2174 |
0.2109 | 9.0 | 180 | 1.1489 |
0.1223 | 10.0 | 200 | 1.2047 |
0.1149 | 11.0 | 220 | 1.2123 |
0.114 | 12.0 | 240 | 1.2854 |
0.0914 | 13.0 | 260 | 1.3633 |
0.0823 | 14.0 | 280 | 1.2355 |
0.0901 | 15.0 | 300 | 1.2453 |
0.093 | 16.0 | 320 | 1.3146 |
0.077 | 17.0 | 340 | 1.4159 |
0.0797 | 18.0 | 360 | 1.3376 |
0.0839 | 19.0 | 380 | 1.4419 |
0.0506 | 20.0 | 400 | 1.3841 |
0.0582 | 21.0 | 420 | 1.3847 |
0.0644 | 22.0 | 440 | 1.3697 |
0.0524 | 23.0 | 460 | 1.4068 |
0.0602 | 24.0 | 480 | 1.3840 |
0.0597 | 25.0 | 500 | 1.4276 |
0.0371 | 26.0 | 520 | 1.5041 |
0.0448 | 27.0 | 540 | 1.4607 |
0.0494 | 28.0 | 560 | 1.4608 |
0.042 | 29.0 | 580 | 1.5975 |
0.0334 | 30.0 | 600 | 1.4700 |
0.0403 | 31.0 | 620 | 1.5470 |
0.043 | 32.0 | 640 | 1.5968 |
0.0349 | 33.0 | 660 | 1.5662 |
0.0412 | 34.0 | 680 | 1.6331 |
0.0263 | 35.0 | 700 | 1.6191 |
0.0249 | 36.0 | 720 | 1.6646 |
0.0365 | 37.0 | 740 | 1.4995 |
0.0176 | 38.0 | 760 | 1.7255 |
0.0426 | 39.0 | 780 | 1.5561 |
0.0174 | 40.0 | 800 | 1.6246 |
0.0259 | 41.0 | 820 | 1.7055 |
0.0182 | 42.0 | 840 | 1.6314 |
0.013 | 43.0 | 860 | 1.5924 |
0.0194 | 44.0 | 880 | 1.7000 |
0.0194 | 45.0 | 900 | 1.6371 |
0.0171 | 46.0 | 920 | 1.7760 |
0.0094 | 47.0 | 940 | 1.7117 |
0.0061 | 48.0 | 960 | 1.7486 |
0.004 | 49.0 | 980 | 1.7964 |
0.003 | 50.0 | 1000 | 1.8029 |
0.0047 | 51.0 | 1020 | 1.7653 |
0.0033 | 52.0 | 1040 | 1.7602 |
0.0028 | 53.0 | 1060 | 1.7846 |
0.0091 | 54.0 | 1080 | 1.7363 |
0.0009 | 55.0 | 1100 | 1.7427 |
0.0005 | 56.0 | 1120 | 1.7763 |
0.0003 | 57.0 | 1140 | 1.8004 |
0.0004 | 58.0 | 1160 | 1.8191 |
0.0004 | 59.0 | 1180 | 1.8343 |
0.0004 | 60.0 | 1200 | 1.8433 |
0.0002 | 61.0 | 1220 | 1.8534 |
0.0003 | 62.0 | 1240 | 1.8619 |
0.0003 | 63.0 | 1260 | 1.8702 |
0.0002 | 64.0 | 1280 | 1.8774 |
0.0002 | 65.0 | 1300 | 1.8829 |
0.0003 | 66.0 | 1320 | 1.8894 |
0.0003 | 67.0 | 1340 | 1.8937 |
0.0001 | 68.0 | 1360 | 1.8985 |
0.0001 | 69.0 | 1380 | 1.9014 |
0.0003 | 70.0 | 1400 | 1.9057 |
0.0 | 71.0 | 1420 | 1.9103 |
0.0001 | 72.0 | 1440 | 1.9126 |
0.0003 | 73.0 | 1460 | 1.9165 |
0.0002 | 74.0 | 1480 | 1.9191 |
0.0002 | 75.0 | 1500 | 1.9210 |
0.0003 | 76.0 | 1520 | 1.9238 |
0.0001 | 77.0 | 1540 | 1.9273 |
0.0002 | 78.0 | 1560 | 1.9279 |
0.0002 | 79.0 | 1580 | 1.9301 |
0.0002 | 80.0 | 1600 | 1.9313 |
0.0003 | 81.0 | 1620 | 1.9321 |
0.0001 | 82.0 | 1640 | 1.9346 |
0.0 | 83.0 | 1660 | 1.9355 |
0.0004 | 84.0 | 1680 | 1.9356 |
0.0 | 85.0 | 1700 | 1.9385 |
0.0003 | 86.0 | 1720 | 1.9385 |
0.0001 | 87.0 | 1740 | 1.9396 |
0.0002 | 88.0 | 1760 | 1.9398 |
0.0001 | 89.0 | 1780 | 1.9407 |
0.0001 | 90.0 | 1800 | 1.9418 |
0.0002 | 91.0 | 1820 | 1.9418 |
0.0002 | 92.0 | 1840 | 1.9414 |
0.0003 | 93.0 | 1860 | 1.9418 |
0.0 | 94.0 | 1880 | 1.9427 |
0.0002 | 95.0 | 1900 | 1.9436 |
0.0003 | 96.0 | 1920 | 1.9425 |
0.0002 | 97.0 | 1940 | 1.9429 |
0.0003 | 98.0 | 1960 | 1.9430 |
0.0001 | 99.0 | 1980 | 1.9433 |
0.0002 | 100.0 | 2000 | 1.9427 |
Framework versions
- PEFT 0.13.2
- Transformers 4.46.0
- Pytorch 2.5.0+cu124
- Datasets 3.0.1
- Tokenizers 0.20.1