LemiSt committed
Commit 73f924f · verified · Parent: 5a2ceaa

Update README.md

Files changed (1):
  README.md +12 -9
README.md CHANGED
@@ -2,15 +2,18 @@
  library_name: transformers
  license: apache-2.0
  language:
- - de
+ - de
  datasets:
- - devngho/culturax-mini-nonshuffled
- - maxidl/FineNews-unfiltered
- - djstrong/oscar-small
- - LemiSt/gutenberg_de
- - almanach/HALvest
- - wikimedia/wikipedia
- - D4ve-R/terra-xplain-cc-de
+ - devngho/culturax-mini-nonshuffled
+ - maxidl/FineNews-unfiltered
+ - djstrong/oscar-small
+ - LemiSt/gutenberg_de
+ - almanach/HALvest
+ - wikimedia/wikipedia
+ - D4ve-R/terra-xplain-cc-de
+ base_model:
+ - HuggingFaceTB/SmolLM-135M
+ pipeline_tag: text-generation
  ---

  # Model Card for SmolLM-135M-de
@@ -75,4 +78,4 @@ print(tokenizer.decode(outputs[0]))
  ### Training Procedure

  This was trained with axolotl, using full fine tuning (no LoRA etc). I used a sequence length of 2048, learning rate of 0.003 with the adamw_bnb_8bit optimizer and a cosine scheduler.
- Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
+ Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
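For reference, the training procedure described in the second hunk maps onto an axolotl full fine-tune config. The sketch below is illustrative only and is not the config used for this run: sequence_len, learning_rate, optimizer, and lr_scheduler mirror the stated values, and base_model matches the frontmatter added in this commit, while the dataset entry, batch sizes, epoch count, and output_dir are placeholder assumptions whose key names should be checked against the axolotl docs.

```yaml
# Illustrative axolotl config sketch (assumed, not the actual config for this run).
# Only the hyperparameters stated in the README excerpt above (sequence length 2048,
# lr 0.003, adamw_bnb_8bit, cosine scheduler) come from the source; values marked
# "placeholder" are guesses.
base_model: HuggingFaceTB/SmolLM-135M   # matches the base_model added in this commit

datasets:
  - path: LemiSt/gutenberg_de           # one of the listed datasets, as an example
    type: completion                    # raw-text formatting for continued pretraining

sequence_len: 2048
sample_packing: true                    # placeholder; often enabled for pretraining-style runs

adapter:                                # left empty = full fine-tune (no LoRA/QLoRA)

optimizer: adamw_bnb_8bit
learning_rate: 0.003
lr_scheduler: cosine

micro_batch_size: 8                     # placeholder
gradient_accumulation_steps: 8          # placeholder
num_epochs: 1                           # intended; the actual run overshot to ~2 epochs
bf16: auto

output_dir: ./outputs/smollm-135m-de    # placeholder
```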