Adil1567 committed (verified)
Commit 41b3211 · 1 Parent(s): 891862e

Model save
README.md CHANGED
@@ -17,6 +17,8 @@ should probably proofread and complete it, then remove this comment. -->
 # mistral-sft-lora-fsdp2
 
 This model is a fine-tuned version of [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.8343
 
 ## Model description
 
@@ -45,13 +47,14 @@ The following hyperparameters were used during training:
 - total_eval_batch_size: 4
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- num_epochs: 1.0
+- num_epochs: 2.0
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| No log | 1.0 | 1 | 1.9796 |
+| No log | 1.0 | 1 | 1.9798 |
+| No log | 2.0 | 2 | 1.8343 |
 
 
 ### Framework versions
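For context, the hyperparameters listed in the README map onto a standard `transformers.TrainingArguments` setup. The sketch below is illustrative only: the Adam betas/epsilon, cosine scheduler, and num_epochs=2.0 come from the model card, while the learning rate, per-device batch sizes, and output/logging paths are placeholders that this diff does not show.

```python
# Sketch only: README hyperparameters expressed as TrainingArguments.
# Values marked "placeholder" are NOT from this commit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-sft-lora-fsdp2",
    num_train_epochs=2.0,            # updated from 1.0 in this commit
    lr_scheduler_type="cosine",
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon=1e-08
    per_device_eval_batch_size=1,    # placeholder; README only reports total_eval_batch_size=4
    learning_rate=2e-4,              # placeholder; not shown in this diff
    logging_dir="runs",              # placeholder; TensorBoard events live under runs/ in this repo
)
```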
adapter_config.json CHANGED
@@ -23,13 +23,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "k_proj",
+    "up_proj",
     "down_proj",
-    "o_proj",
-    "gate_proj",
     "v_proj",
-    "up_proj",
-    "q_proj"
+    "gate_proj",
+    "o_proj",
+    "q_proj",
+    "k_proj"
   ],
   "task_type": "CAUSAL_LM",
   "use_dora": false,
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4bd5ae0ff2058d17dce24cc8ef059ae969f7f755deda4db1eafc05ceac6fb551
+oid sha256:08a13c9bcf2f18a7fc083503c10c21f34393d99111b5deca6147519caae65646
 size 1656902648
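The updated adapter_model.safetensors is an LFS pointer; the adapter itself is applied on top of the base model with peft. A minimal loading sketch follows. It assumes the adapter is published as Adil1567/mistral-sft-lora-fsdp2 (inferred from the commit author and directory name, not confirmed by this diff), and loading the 70B base in bf16 requires several high-memory GPUs.

```python
# Sketch only: load the LoRA adapter saved in this commit on top of the base model.
# The adapter repo id below is an assumption, not confirmed by this diff.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.1-70B-Instruct"
adapter_id = "Adil1567/mistral-sft-lora-fsdp2"   # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # 70B weights in bf16 still need multiple large GPUs
    device_map="auto",            # requires accelerate
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```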
logs/training_log.txt CHANGED
@@ -37,3 +37,22 @@
 2025-01-08 13:56:48,398 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
 2025-01-08 13:56:53,278 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
 2025-01-08 13:56:59,275 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
+2025-01-08 13:57:23,252 - INFO - Loss improved from 1.97976 to 1.83431
+2025-01-08 13:57:23,252 - INFO - Loss improved from 1.97976 to 1.83431
+2025-01-08 13:57:23,252 - INFO - Loss improved from 1.97976 to 1.83431
+2025-01-08 13:57:23,253 - INFO - Step 2/2 (100.0%), epoch: 2.0000, step_time: 205.04s, elapsed_time: 599.83s
+2025-01-08 13:57:23,254 - INFO - Evaluation Results:
+eval_loss: 1.8343
+eval_runtime: 23.7205
+eval_samples_per_second: 0.3370
+eval_steps_per_second: 0.0840
+epoch: 2.0000
+elapsed_time: 599.83s
+step_time: 205.04s
+2025-01-08 13:57:23,255 - INFO - Loss improved from 1.97976 to 1.83431
+2025-01-08 13:58:21,011 - INFO - Saving model to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
+2025-01-08 13:58:24,262 - INFO - Model saved to mistral-sft-lora-fsdp2/checkpoint-2/pytorch_model_fsdp_0
+2025-01-08 13:58:29,156 - INFO - Saving Optimizer state to mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
+2025-01-08 13:58:35,600 - INFO - Optimizer state saved in mistral-sft-lora-fsdp2/checkpoint-2/optimizer_0
+2025-01-08 13:58:36,036 - INFO - Step 2/2 (100.0%), epoch: 2.0000, step_time: 72.78s, elapsed_time: 672.61s
+2025-01-08 13:58:36,037 - INFO - Training completed in 672.61 seconds
runs/Jan08_13-47-23_gpu-server/events.out.tfevents.1736344266.gpu-server.882619.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8dc1e98fd384467f592194e00f17315e978da77c8188ec70ec9d163ed6d145e6
-size 5873
+oid sha256:ccf149023c1ed3b470f6fa07c6796be4504f503ed24f0f958974f0e5810ca0c4
+size 6487