Model save

Files changed (5)

README.md CHANGED

@@ -1,14 +1,12 @@
 ---
 license: gemma
-library_name: peft
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
-base_model: google/gemma-7b
 datasets:
-- chansung/mental_health_counseling_conversations_merged
 model-index:
 - name: mental_health_counseling_merged_v0.1
   results: []
@@ -19,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->
 # mental_health_counseling_merged_v0.1
-This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the chansung/mental_health_counseling_conversations_merged dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.8574
 ## Model description
@@ -45,26 +43,29 @@ The following hyperparameters were used during training:
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 2
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 8
-- total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 1
 ### Training results
-| Training Loss | Epoch  | Step  | Validation Loss |
-|:-------------:|:------:|:-----:|:---------------:|
-| 0.8429        | 1.0000 | 10673 | 0.8574          |
 ### Framework versions
-- PEFT 0.11.0
-- Transformers 4.40.2
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.1
-- Tokenizers 0.19.1

 ---
 license: gemma
+base_model: google/gemma-7b
 tags:
 - trl
 - sft
 - generated_from_trainer
 datasets:
+- generator
 model-index:
 - name: mental_health_counseling_merged_v0.1
   results: []
 # mental_health_counseling_merged_v0.1
+This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b) on the generator dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.8529
 ## Model description
 - eval_batch_size: 2
 - seed: 42
 - distributed_type: multi-GPU
+- num_devices: 6
 - gradient_accumulation_steps: 2
+- total_train_batch_size: 24
+- total_eval_batch_size: 12
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 5
 ### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 0.8916        | 1.0   | 3558  | 0.8929          |
+| 0.8691        | 2.0   | 7116  | 0.8701          |
+| 0.8141        | 3.0   | 10674 | 0.8574          |
+| 0.838         | 4.0   | 14232 | 0.8494          |
+| 0.7613        | 5.0   | 17790 | 0.8529          |
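The new training results table can be read programmatically. The sketch below copies the five rows from the added side of the diff and picks the epoch with the lowest validation loss; the tuple layout and the function names are illustrative, not part of the repository.

```python
# Rows copied from the added "Training results" table in the README diff.
# Layout (an illustrative choice): (training_loss, epoch, step, validation_loss)
ROWS = [
    (0.8916, 1.0, 3558, 0.8929),
    (0.8691, 2.0, 7116, 0.8701),
    (0.8141, 3.0, 10674, 0.8574),
    (0.838, 4.0, 14232, 0.8494),
    (0.7613, 5.0, 17790, 0.8529),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def steps_per_epoch(rows):
    """Return the number of optimizer steps taken in each epoch."""
    steps = [row[2] for row in rows]
    return [later - earlier for earlier, later in zip([0] + steps, steps)]


if __name__ == "__main__":
    best = best_epoch(ROWS)
    print(f"lowest validation loss {best[3]} at epoch {best[1]} (step {best[2]})")
    print("steps per epoch:", steps_per_epoch(ROWS))
```

By this reading, validation loss is lowest at epoch 4 (0.8494) and rises slightly at epoch 5 (0.8529), while training loss keeps falling.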
 ### Framework versions
+- Transformers 4.41.1
 - Pytorch 2.3.0+cu121
 - Datasets 2.19.1
+- Tokenizers 0.19.1
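
The batch-size hyperparameters in the README diff are consistent with the usual relation: total batch size = per-device batch size × number of devices × gradient accumulation steps. The sketch below checks both sides of the diff. The per-device training batch size of 2 does not appear in the hunk and is an assumption inferred from 24 / (6 × 2); `total_batch_size` is a hypothetical helper.

```python
def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size across all devices and accumulation steps."""
    return per_device * num_devices * grad_accum_steps


# Assumption: the per-device train batch size is not shown in the diff;
# 2 is inferred from total_train_batch_size / (num_devices * grad_accum).
PER_DEVICE_TRAIN = 2
PER_DEVICE_EVAL = 2  # eval_batch_size, shown as unchanged in the diff
GRAD_ACCUM = 2       # gradient_accumulation_steps, unchanged in the diff

old_train = total_batch_size(PER_DEVICE_TRAIN, num_devices=2, grad_accum_steps=GRAD_ACCUM)
new_train = total_batch_size(PER_DEVICE_TRAIN, num_devices=6, grad_accum_steps=GRAD_ACCUM)
old_eval = total_batch_size(PER_DEVICE_EVAL, num_devices=2)
new_eval = total_batch_size(PER_DEVICE_EVAL, num_devices=6)

if __name__ == "__main__":
    print("train:", old_train, "->", new_train)
    print("eval: ", old_eval, "->", new_eval)
```

Under that assumption the values match the diff: 8 and 4 before the change, 24 and 12 after.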

all_results.json CHANGED

@@ -1,14 +1,9 @@
 {
-    "epoch": 0.9999531550100716,
-    "eval_loss": 0.8574451804161072,
-    "eval_runtime": 962.2866,
-    "eval_samples": 5130,
-    "eval_samples_per_second": 4.673,
-    "eval_steps_per_second": 1.169,
-    "total_flos": 4.092531867595571e+18,
-    "train_loss": 1.0194565681556524,
-    "train_runtime": 58622.8041,
     "train_samples": 97468,
-    "train_samples_per_second": 1.457,
-    "train_steps_per_second": 0.182
 }

 {
+    "epoch": 5.0,
+    "total_flos": 2.1191156679948894e+19,
+    "train_loss": 0.9175008542622121,
+    "train_runtime": 172180.6812,
     "train_samples": 97468,
+    "train_samples_per_second": 2.48,
+    "train_steps_per_second": 0.103
 }
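
The reported throughput can be cross-checked against the step count in the README table. The sketch below assumes the final step (17790) and the total train batch size (24) from the README diff, and assumes that samples per second is derived from steps × batch size. The diff does not confirm how the trainer computes these values.

```python
# Values copied from the added side of all_results.json.
RESULTS = {
    "epoch": 5.0,
    "train_runtime": 172180.6812,
    "train_samples": 97468,
    "train_samples_per_second": 2.48,
    "train_steps_per_second": 0.103,
}

# Values taken from the README diff (final table row and hyperparameters).
TOTAL_STEPS = 17790
TOTAL_TRAIN_BATCH_SIZE = 24

runtime = RESULTS["train_runtime"]

# Assumption: throughput is counted in optimizer steps and in batches consumed.
steps_per_second = TOTAL_STEPS / runtime
samples_per_second = TOTAL_STEPS * TOTAL_TRAIN_BATCH_SIZE / runtime

if __name__ == "__main__":
    print(f"steps/s   computed {steps_per_second:.4f}, "
          f"reported {RESULTS['train_steps_per_second']}")
    print(f"samples/s computed {samples_per_second:.4f}, "
          f"reported {RESULTS['train_samples_per_second']}")
```

Both computed values round to the reported ones. Computing throughput from `train_samples` × `epoch` instead gives about 2.83 samples per second, which does not match. This suggests the trainer iterated over fewer examples per epoch than `train_samples`, possibly because sequences were packed, though nothing in the diff states this.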

runs/May27_11-46-06_instance-20240524-1004/events.out.tfevents.1716825222.instance-20240524-1004.480.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c593ff77426af954c945642b50213c6c9ce36a464b5504737042a4b0c2ec4a57
-size 754818

 version https://git-lfs.github.com/spec/v1
+oid sha256:8dae9f2ebca092a8f83105724bebdf96b427e06bd06795dfc21255295902a6b1
+size 759324
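
The TensorBoard event file is stored as a Git LFS pointer, so the diff shows only the pointer's `oid` and `size` fields. A minimal sketch of parsing the two pointers and comparing sizes follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c593ff77426af954c945642b50213c6c9ce36a464b5504737042a4b0c2ec4a57
size 754818
"""

NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8dae9f2ebca092a8f83105724bebdf96b427e06bd06795dfc21255295902a6b1
size 759324
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


old = parse_lfs_pointer(OLD_POINTER)
new = parse_lfs_pointer(NEW_POINTER)

if __name__ == "__main__":
    print("size change in bytes:", new["size"] - old["size"])
```

The event log grew by 4506 bytes between the two commits.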

train_results.json CHANGED

@@ -1,9 +1,9 @@
 {
-    "epoch": 0.9999531550100716,
-    "total_flos": 4.092531867595571e+18,
-    "train_loss": 1.0194565681556524,
-    "train_runtime": 58622.8041,
     "train_samples": 97468,
-    "train_samples_per_second": 1.457,
-    "train_steps_per_second": 0.182
 }

 {
+    "epoch": 5.0,
+    "total_flos": 2.1191156679948894e+19,
+    "train_loss": 0.9175008542622121,
+    "train_runtime": 172180.6812,
     "train_samples": 97468,
+    "train_samples_per_second": 2.48,
+    "train_steps_per_second": 0.103
 }

trainer_state.json CHANGED

The diff for this file is too large to render.