Update README.md
README.md
CHANGED
@@ -57,5 +57,5 @@ As such, the dataset is not 100% slop free, but this addition likely helps the m
 
 Note on training:
 
-The training was done using [Fine-Tuning with Very Large Dropout](https://arxiv.org/pdf/2403.00946) with a LoRA dropout of 0.5 and a constant learning rate of 4e-6. In addition, the model seemed to retain more of Nemotron's smartness by halving the alpha, which is how this merge (and the LoRA adapter configuration) is set up. (The LoRA was trained with alpha=64, and merged with alpha set to 32.)
+The training was done using [Fine-Tuning with Very Large Dropout](https://arxiv.org/pdf/2403.00946) (h/t https://huggingface.co/Envoid/Llama-3.05-NT-Storybreaker-Ministral-70B for the idea) with a LoRA dropout of 0.5 and a constant learning rate of 4e-6. In addition, the model seemed to retain more of Nemotron's smartness by halving the alpha, which is how this merge (and the LoRA adapter configuration) is set up. (The LoRA was trained with alpha=64, and merged with alpha set to 32.)
 
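
For reference, a minimal sketch of how the settings in the training note (LoRA dropout 0.5, constant 4e-6 learning rate, alpha trained at 64 and halved to 32 at merge time) might map onto the `peft`/`transformers` APIs. The rank, target modules, and model/adapter paths below are placeholder assumptions, not values stated in the README, and the alpha-halving is done here by editing the saved adapter config before merging.

```python
# Hypothetical sketch only: rank, target modules, and paths are assumptions,
# not values from the README; only dropout, learning rate, and alpha are stated.
import json
from pathlib import Path

from peft import LoraConfig, PeftModel
from transformers import AutoModelForCausalLM, TrainingArguments

BASE_MODEL = "path/to/nemotron-base"  # placeholder base model id
ADAPTER_DIR = "lora-adapter"          # placeholder adapter output directory

# Adapter configuration: dropout 0.5 and alpha 64, as described in the note.
lora_cfg = LoraConfig(
    r=64,                  # assumed rank (not stated in the README)
    lora_alpha=64,         # trained with alpha=64
    lora_dropout=0.5,      # "very large dropout" setting
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

# Constant learning rate of 4e-6, as described in the note.
train_args = TrainingArguments(
    output_dir=ADAPTER_DIR,
    learning_rate=4e-6,
    lr_scheduler_type="constant",
)

def merge_with_halved_alpha(base_model_id: str, adapter_dir: str):
    """Halve lora_alpha in the saved adapter config (64 -> 32), then merge."""
    cfg_path = Path(adapter_dir) / "adapter_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["lora_alpha"] = cfg["lora_alpha"] // 2
    cfg_path.write_text(json.dumps(cfg, indent=2))

    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    return PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
```

Halving the alpha after training scales the adapter's contribution down by a factor of two at merge time without retraining, which is one way to read the note's observation that the merged model keeps more of Nemotron's base behavior.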