Update README.md
README.md CHANGED
@@ -2,15 +2,18 @@
 library_name: transformers
 license: apache-2.0
 language:
-
+- de
 datasets:
-
-
-
-
-
-
-
+- devngho/culturax-mini-nonshuffled
+- maxidl/FineNews-unfiltered
+- djstrong/oscar-small
+- LemiSt/gutenberg_de
+- almanach/HALvest
+- wikimedia/wikipedia
+- D4ve-R/terra-xplain-cc-de
+base_model:
+- HuggingFaceTB/SmolLM-135M
+pipeline_tag: text-generation
 ---
 
 # Model Card for SmolLM-135M-de
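For readability, here is the complete front matter as it stands after this hunk. Since the diff starts at line 2, line 1 is presumably the opening `---` delimiter, and this sketch assumes no other metadata keys are present in the file:

```yaml
---
library_name: transformers
license: apache-2.0
language:
- de
datasets:
- devngho/culturax-mini-nonshuffled
- maxidl/FineNews-unfiltered
- djstrong/oscar-small
- LemiSt/gutenberg_de
- almanach/HALvest
- wikimedia/wikipedia
- D4ve-R/terra-xplain-cc-de
base_model:
- HuggingFaceTB/SmolLM-135M
pipeline_tag: text-generation
---
```

These keys drive the Hub's model card metadata: `language` and `datasets` populate the sidebar, `base_model` links the model back to HuggingFaceTB/SmolLM-135M as a fine-tune, and `pipeline_tag` selects the text-generation inference widget.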
@@ -75,4 +78,4 @@ print(tokenizer.decode(outputs[0]))
 ### Training Procedure
 
 This was trained with axolotl, using full fine-tuning (no LoRA or other adapters). I used a sequence length of 2048 and a learning rate of 0.003 with the adamw_bnb_8bit optimizer and a cosine scheduler.
-Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
+Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
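As a rough illustration, an axolotl config matching the hyperparameters described above might look like the sketch below. Only `sequence_len`, `learning_rate`, `optimizer`, `lr_scheduler`, and the base model are taken from the text; the dataset entry, batch sizes, and epoch count are placeholder assumptions, not the author's actual config.

```yaml
# Hypothetical axolotl config reflecting the described run; dataset choice,
# batch sizes, and epoch count are illustrative assumptions.
base_model: HuggingFaceTB/SmolLM-135M

datasets:
  - path: LemiSt/gutenberg_de   # one of the German datasets listed in the front matter
    type: completion            # plain-text continued pretraining, no chat template

sequence_len: 2048              # stated sequence length
learning_rate: 0.003            # stated peak learning rate
optimizer: adamw_bnb_8bit       # stated 8-bit AdamW optimizer
lr_scheduler: cosine            # stated cosine decay schedule

micro_batch_size: 8             # assumption
gradient_accumulation_steps: 4  # assumption
num_epochs: 1                   # assumption; ~2 epochs were trained by mistake
```

A cosine scheduler reaches its minimum learning rate only at the planned total step count, so a miscalculated token budget shifts where on the decay curve training actually ends, which is the effect the author describes.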