jarodrigues committed
Commit 9f6f81d · 1 Parent(s): 6c0e24b
Update README.md
README.md
CHANGED
@@ -108,7 +108,7 @@ These take the various fields in the dataset and arrange them into prompts, which
 
 # Training Details
 
-We applied supervised fine-tuning with a causal language modeling
+We applied supervised fine-tuning with a causal language modeling training objective following a zero-out technique during the fine-tuning process.
 Specifically, while the entire prompt received attention during fine-tuning, only the response tokens were subjected to back-propagation.
 
 In terms of hyper-parameters, both models were trained with a learning rate of 2 * 10^-5, a weight decay of 0.1, a two-epoch training regime without warm-up, and to ensure the same number of tokens back-propagated per step, we employed an input sequence of 512 tokens with a batch size of 16 and 16 accumulation steps.
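For context, below is a minimal sketch of how the zero-out objective and the hyper-parameters quoted in the updated section could be wired up with Hugging Face transformers. The base-model name, the `build_example` helper, and the Trainer wiring are illustrative assumptions; they are not part of this commit or repository.

```python
# Sketch of the "zero-out" fine-tuning objective described above:
# the full prompt is attended to, but only response tokens are
# back-propagated (prompt positions get label -100, which the causal
# LM cross-entropy loss ignores). Names below are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

MAX_LEN = 512  # input sequence length quoted in the README

tokenizer = AutoTokenizer.from_pretrained("your-base-model")  # placeholder
model = AutoModelForCausalLM.from_pretrained("your-base-model")

def build_example(prompt: str, response: str) -> dict:
    """Tokenize prompt + response; zero out the prompt in the labels."""
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = (prompt_ids + response_ids)[:MAX_LEN]
    # -100 marks positions excluded from back-propagation.
    labels = ([-100] * len(prompt_ids) + response_ids)[:MAX_LEN]
    attention_mask = [1] * len(input_ids)  # the whole prompt is attended to
    return {"input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": labels}

# Hyper-parameters quoted above: lr 2e-5, weight decay 0.1, two epochs,
# no warm-up, batch size 16 with 16 gradient-accumulation steps.
args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-5,
    weight_decay=0.1,
    num_train_epochs=2,
    warmup_steps=0,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
)

# trainer = Trainer(model=model, args=args, train_dataset=..., data_collator=...)
# trainer.train()
```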