ketchup123 committed
Commit 928ec75
1 Parent(s): c0a3b12

End of training

README.md CHANGED
@@ -7,18 +7,16 @@ tags:
 - sft
 - generated_from_trainer
 model-index:
-- name: llama-3.1-8B-instruct-gsm8k-mine
+- name: llama-3.1-8b-instruct-gsm8k-mine
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# llama-3.1-8B-instruct-gsm8k-mine
+# llama-3.1-8b-instruct-gsm8k-mine
 
 This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 1.2546
 
 ## Model description
 
@@ -37,25 +35,21 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
+- learning_rate: 0.0001
 - train_batch_size: 4
-- eval_batch_size: 4
-- seed: 42
+- eval_batch_size: 8
+- seed: 3407
 - distributed_type: multi-GPU
 - num_devices: 8
-- gradient_accumulation_steps: 8
-- total_train_batch_size: 256
-- total_eval_batch_size: 32
+- total_train_batch_size: 32
+- total_eval_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.03
-- num_epochs: 3
+- lr_scheduler_warmup_steps: 64
+- num_epochs: 6
 
 ### Training results
 
-| Training Loss | Epoch | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 1.2563 | 1.7094 | 50 | 1.2546 |
 
 
 ### Framework versions
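
For readers comparing the two runs: a minimal sketch of how the updated hyperparameters could be expressed with transformers' `TrainingArguments`. Only the values visible in the diff above come from this commit; `output_dir` and the rest of the training wiring are placeholder assumptions.

```python
# Sketch only: maps the new model-card hyperparameters onto
# transformers.TrainingArguments. output_dir is a placeholder;
# the actual training script is not part of this commit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-gsm8k-mine",  # placeholder
    learning_rate=1e-4,              # was 5e-05 before this commit
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=8,    # eval_batch_size: 8 (was 4)
    seed=3407,                       # was 42
    lr_scheduler_type="linear",
    warmup_steps=64,                 # replaces lr_scheduler_warmup_ratio: 0.03
    num_train_epochs=6,              # was 3
)
# With 8 devices and no gradient accumulation, the effective batch sizes
# are 4 * 8 = 32 for training and 8 * 8 = 64 for eval, matching the new
# total_train_batch_size and total_eval_batch_size in the card.
```
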
adapter_config.json CHANGED
@@ -16,7 +16,7 @@
   "megatron_core": "megatron.core",
   "modules_to_save": null,
   "peft_type": "LORA",
-  "r": 16,
+  "r": 8,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ce4214d70ad30f80b97e191589c16540c0e53264420498a102d39583aa378e0f
-size 27280152
+oid sha256:740b95beed6b745cda04a6c7a42adabb141dd256b1248464ae0ca0f7ff171cb0
+size 13648432
tokenizer_config.json CHANGED
@@ -2067,6 +2067,5 @@
   ],
   "model_max_length": 131072,
   "pad_token": "<|eot_id|>",
-  "padding_side": "left",
   "tokenizer_class": "PreTrainedTokenizerFast"
 }
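
This commit drops `"padding_side": "left"` from the tokenizer config, so a loaded tokenizer falls back to the class default (`"right"`). If left padding is still wanted, e.g. for batched generation with a decoder-only model, it can be set at load time; a sketch, with the repo id assumed from the commit author and model name:

```python
# Sketch only: the repo id is an assumption based on the commit author
# and model name, not confirmed by this commit.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ketchup123/llama-3.1-8b-instruct-gsm8k-mine")
tok.padding_side = "left"  # config no longer pins this; default is "right"
```
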
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e00d16cf2c63c1657a60457a6d0954f8216d22499ce6c4749a399668fe0b24bf
+oid sha256:ce571a1fb008b2a3b5ca7a52d2c347744e9009014882a6ec07f28378b7a88eac
 size 5496