frankmorales2020 committed on
Commit 0ae1bc0
1 Parent(s): 0f701f1

Update README.md

Files changed (1)
  1. README.md +39 -5
README.md CHANGED
@@ -24,11 +24,6 @@ It achieves the following results on the evaluation set:
 
 ## Model description
 
-More information needed
-
-## Intended uses & limitations
-
-More information needed
 
 ## Training and evaluation data
 
@@ -36,6 +31,9 @@ More information needed
 
 ## Training procedure
 
+Fine Tuning: https://github.com/frank-morales2020/MLxDL/blob/main/FineTuning_LLM_Meta_Llama_3_8B_for_MEDAL_EVALDATA_PONEW.ipynb
+
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -51,6 +49,42 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 1500
 - num_epochs: 0.5
 
+from transformers import TrainingArguments
+
+args = TrainingArguments(
+
+    output_dir="/content/gdrive/MyDrive/model/POC-NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata",
+
+    num_train_epochs=0.5, # number of training epochs for POC
+    per_device_train_batch_size=3, #4 # batch size per device during training
+    gradient_accumulation_steps=8, #6 # values like 8, 12, or even 16, # number of steps before performing a backward/update pass
+    gradient_checkpointing=True, # use gradient checkpointing to save memory
+    optim="adamw_torch_fused", # use fused adamw optimizer
+    logging_steps=100, # log every 100 steps
+    learning_rate=2e-4, # learning rate, based on QLoRA paper # i used in the first model
+    #learning_rate=1e-5,
+    bf16=True, # use bfloat16 precision
+    tf32=True, # use tf32 precision
+    max_grad_norm=1.0, # max gradient norm based on QLoRA paper
+    warmup_ratio=0.03, # warmup ratio based on QLoRA paper = 0.03
+
+    weight_decay=0.01,
+    lr_scheduler_type="constant", # use constant learning rate scheduler
+    push_to_hub=True, # push model to hub
+    report_to="tensorboard", # report metrics to tensorboard
+    gradient_checkpointing_kwargs={"use_reentrant": True},
+
+    load_best_model_at_end=True,
+    logging_dir="/content/gdrive/MyDrive/model/POC-NEW-Meta-Llama-3-8B-MEDAL-flash-attention-2-cosine-evaldata/logs",
+
+    evaluation_strategy="steps",
+    eval_steps=100,
+    save_strategy="steps",
+    save_steps=100,
+    metric_for_best_model = "loss",
+    warmup_steps=1500,
+)
+
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
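The hyperparameters added in this commit interact in ways that are easy to misread from the argument list alone. The Python sketch below recomputes them under stated assumptions: a single GPU, a dataloader that keeps the last partial batch, and a hypothetical training set of 10,000 examples (the model card does not state the real size). It reflects two behaviours of Hugging Face `transformers` as I understand them: a positive `warmup_steps` overrides `warmup_ratio`, and `lr_scheduler_type="constant"` applies no warmup at all, whereas `"constant_with_warmup"` and `"cosine"` ramp up linearly first. The helper names (`effective_batch_size`, `total_optimizer_steps`, `resolved_warmup_steps`, `lr_at`) are hypothetical and are not part of the `transformers` API.

```python
import math

# Values copied from the TrainingArguments added in this commit.
LEARNING_RATE = 2e-4
PER_DEVICE_BATCH = 3
GRAD_ACCUM = 8
NUM_EPOCHS = 0.5
WARMUP_STEPS = 1500
WARMUP_RATIO = 0.03

SCHEDULES = ("constant", "constant_with_warmup", "cosine")


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer update (num_devices is an assumption)."""
    return per_device * grad_accum * num_devices


def total_optimizer_steps(num_examples: int, per_device: int, grad_accum: int, epochs: float) -> int:
    """Simplified single-device sketch of deriving the number of updates from epochs."""
    batches_per_epoch = math.ceil(num_examples / per_device)  # last partial batch kept
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)


def resolved_warmup_steps(total_steps: int, warmup_steps: int, warmup_ratio: float) -> int:
    """A positive warmup_steps takes precedence; otherwise use the ratio."""
    return warmup_steps if warmup_steps > 0 else math.ceil(total_steps * warmup_ratio)


def lr_at(step: int, schedule: str, base_lr: float, warmup: int, total: int) -> float:
    """Learning rate at an optimizer step for three schedule shapes."""
    if schedule not in SCHEDULES:
        raise ValueError(f"unsupported schedule: {schedule}")
    if schedule == "constant":
        return base_lr  # no warmup phase at all
    if step < warmup:
        return base_lr * step / max(1, warmup)  # linear ramp from 0
    if schedule == "constant_with_warmup":
        return base_lr
    # Cosine decay (half a cycle) from base_lr down to 0 after warmup.
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


num_examples = 10_000  # hypothetical dataset size, not taken from the model card
total = total_optimizer_steps(num_examples, PER_DEVICE_BATCH, GRAD_ACCUM, NUM_EPOCHS)
warmup = resolved_warmup_steps(total, WARMUP_STEPS, WARMUP_RATIO)
print("effective batch:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))
print("optimizer steps:", total, "warmup steps:", warmup)
for name in SCHEDULES:
    print(name, "LR at last step:", lr_at(total - 1, name, LEARNING_RATE, warmup, total))
```

With the hypothetical 10,000 examples, the run would take 208 optimizer steps of 24 examples each, well short of the 1,500 warmup steps. Under the configured `"constant"` schedule this makes no difference, but under a warmup schedule the learning rate would still be ramping up when training ends. Because the `output_dir` name mentions `cosine`, it may be worth confirming which scheduler the run actually used. Later `transformers` releases also renamed `evaluation_strategy` to `eval_strategy`, so the snippet may need that change on a current install.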