Model save
Browse files
- README.md +6 -15
- all_results.json +5 -10
- train_results.json +5 -5
- trainer_state.json +0 -0
README.md
CHANGED
@@ -1,20 +1,11 @@
 ---
 library_name: transformers
-license: llama3.
-base_model: meta-llama/Llama-3.
+license: llama3.1
+base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
-- trl
-- sft
-- generated_from_trainer
-datasets:
-- barc0/induction_heavy_100k_jsonl
-- barc0/induction_heavy_suggestfunction_100k_jsonl
-- barc0/induction_100k_gpt4o-mini_generated_problems_seed100.jsonl_messages_format_0.3
-- barc0/induction_100k-gpt4-description-gpt4omini-code_generated_problems_messages_format_0.3
 model-index:
 - name: engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3
   results: []
@@ -25,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3
 
-This model is a fine-tuned version of [meta-llama/Llama-3.
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.2710
 
 ## Model description
 
@@ -63,8 +54,8 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 0.
-| 0.
+| 0.2797 | 1.0 | 2995 | 0.2817 |
+| 0.2389 | 2.0 | 5990 | 0.2710 |
 
 
 ### Framework versions
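The updated card keeps `library_name: transformers` and points the base model at `meta-llama/Meta-Llama-3.1-8B-Instruct`, so the checkpoint loads through the standard Auto classes. A minimal loading sketch; the full hub id is an assumption (this page shows only the model name, and the `barc0/` org is inferred from the datasets the old card listed), and the prompt is purely illustrative:

```python
# Hedged loading sketch. The hub id is an assumption: this commit page shows
# only the model name; the barc0/ org is inferred from the old card's datasets.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "barc0/engineer-heavy-500k-barc-llama3.1-8b-ins-fft-induction_lr1e-5_epoch3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The base model is Llama-3.1-8B-Instruct, so its chat template applies.
messages = [{"role": "user", "content": "Write a Python function that maps the input grid to the output grid."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```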
all_results.json
CHANGED
@@ -1,14 +1,9 @@
 {
     "epoch": 2.0,
-    "eval_loss":
-    "eval_runtime":
-    "eval_samples":
-    "eval_samples_per_second": 229.907,
-    "eval_steps_per_second": 1.8,
-    "total_flos": 525762547179520.0,
-    "train_loss": 0.32578817207447075,
-    "train_runtime": 11617.078,
+    "total_flos": 2071185792860160.0,
+    "train_loss": 0.2831706563921922,
+    "train_runtime": 60134.8665,
     "train_samples": 383339,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 12.749,
+    "train_steps_per_second": 0.1
 }
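The replacement figures are internally consistent. A quick sanity check in Python, using only numbers visible in this commit (the 5990-step count comes from the epoch-2.0 row of the README's loss table):

```python
# Cross-check the new throughput fields against the other values in the diff.
train_samples = 383339        # "train_samples"
epochs = 2.0                  # "epoch"
train_runtime = 60134.8665    # "train_runtime", in seconds
total_steps = 5990            # step at epoch 2.0, from the README table

print(round(train_samples * epochs / train_runtime, 3))  # 12.749 -> "train_samples_per_second"
print(round(total_steps / train_runtime, 3))             # 0.1    -> "train_steps_per_second"
```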
train_results.json
CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 2.0,
-    "total_flos": 525762547179520.0,
-    "train_loss": 0.32578817207447075,
-    "train_runtime": 11617.078,
+    "total_flos": 2071185792860160.0,
+    "train_loss": 0.2831706563921922,
+    "train_runtime": 60134.8665,
     "train_samples": 383339,
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_samples_per_second": 12.749,
+    "train_steps_per_second": 0.1
 }
trainer_state.json
CHANGED
The diff for this file is too large to render. See raw diff.
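The README's loss table is derived from the `log_history` list in trainer_state.json (standard HF `Trainer` layout). Since the diff is too large to render, a short sketch for recovering those rows from the raw file, assuming the stock schema:

```python
import json

# Standard Trainer layout assumed: trainer_state.json is not rendered on this
# page, but its "log_history" holds the entries behind the README loss table.
with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:  # evaluation entries carry eval_loss
        print(entry["epoch"], entry["step"], entry["eval_loss"])
```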