jarodrigues committed
Commit 9f6f81d · 1 Parent(s): 6c0e24b
Update README.md
README.md
CHANGED
@@ -108,7 +108,7 @@ These take the various fields in the dataset and arrange them into prompts, which
 
 # Training Details
 
-We applied supervised fine-tuning with a causal language modeling
+We applied supervised fine-tuning with a causal language modeling training objective following a zero-out technique during the fine-tuning process.
 Specifically, while the entire prompt received attention during fine-tuning, only the response tokens were subjected to back-propagation.
 
 In terms of hyper-parameters, both models were trained with a learning rate of 2 * 10^-5, a weight decay of 0.1, a two-epoch training regime without warm-up, and to ensure the same number of tokens back-propagated per step, we employed an input sequence of 512 tokens with a batch size of 16 and 16 accumulation steps.
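For context, below is a minimal sketch of how the zero-out objective and the hyper-parameters quoted in the updated section could be wired up with Hugging Face transformers. The base-model name, the `build_example` helper, and the Trainer wiring are illustrative assumptions; they are not part of this commit or repository.

```python
# Sketch of the "zero-out" fine-tuning objective described above:
# the full prompt is attended to, but only response tokens are
# back-propagated (prompt positions get label -100, which the causal
# LM cross-entropy loss ignores). Names below are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MAX_LEN = 512  # input sequence length quoted in the README

tokenizer = AutoTokenizer.from_pretrained("your-base-model")  # placeholder
model = AutoModelForCausalLM.from_pretrained("your-base-model")

def build_example(prompt: str, response: str) -> dict:
    """Tokenize prompt + response; zero out the prompt in the labels."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + response_ids)[:MAX_LEN]
    # -100 marks positions excluded from back-propagation.
    labels = ([-100] * len(prompt_ids) + response_ids)[:MAX_LEN]
    attention_mask = [1] * len(input_ids)  # the whole prompt is attended to
    return {"input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": labels}

# Hyper-parameters quoted above: lr 2e-5, weight decay 0.1, two epochs,
# no warm-up, batch size 16 with 16 gradient-accumulation steps.
args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,
    weight_decay=0.1,
    num_train_epochs=2,
    warmup_steps=0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
)

# trainer = Trainer(model=model, args=args, train_dataset=..., data_collator=...)
# trainer.train()
```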