Update README.md #3
Opened by Joseph717171

README.md (changed)
@@ -20,7 +20,7 @@ SmolLM is a series of state-of-the-art small language models available in three
 To build SmolLM-Instruct, we instruction tuned the models using publicly available permissive instruction datasets. We trained all three models for one epoch on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Following this, we performed DPO (Direct Preference Optimization) for one epoch: using HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model. We followed the training parameters from the Zephyr-Gemma recipe in the alignment handbook, but adjusted the SFT (Supervised Fine-Tuning) learning rate to 3e-4.
 [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
 
-This is the SmolLM-
+This is the SmolLM-1.7B-Instruct.
 
 ### Generation
 ```bash