Update README.md
README.md CHANGED
@@ -2,15 +2,18 @@
 library_name: transformers
 license: apache-2.0
 language:
-
+- de
 datasets:
-
-
-
-
-
-
-
+- devngho/culturax-mini-nonshuffled
+- maxidl/FineNews-unfiltered
+- djstrong/oscar-small
+- LemiSt/gutenberg_de
+- almanach/HALvest
+- wikimedia/wikipedia
+- D4ve-R/terra-xplain-cc-de
+base_model:
+- HuggingFaceTB/SmolLM-135M
+pipeline_tag: text-generation
 ---
 
 # Model Card for SmolLM-135M-de
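For readability, here is the complete front matter as it stands after this hunk. Since the diff starts at line 2, line 1 is presumably the opening `---` delimiter, and this sketch assumes no other metadata keys are present in the file:

```yaml
---
library_name: transformers
license: apache-2.0
language:
- de
datasets:
- devngho/culturax-mini-nonshuffled
- maxidl/FineNews-unfiltered
- djstrong/oscar-small
- LemiSt/gutenberg_de
- almanach/HALvest
- wikimedia/wikipedia
- D4ve-R/terra-xplain-cc-de
base_model:
- HuggingFaceTB/SmolLM-135M
pipeline_tag: text-generation
---
```

These keys drive the Hub's model card metadata: `language` and `datasets` populate the sidebar, `base_model` links the model back to HuggingFaceTB/SmolLM-135M as a fine-tune, and `pipeline_tag` selects the text-generation inference widget.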
@@ -75,4 +78,4 @@ print(tokenizer.decode(outputs[0]))
 ### Training Procedure
 
 This was trained with axolotl, using full fine-tuning (no LoRA or other adapters). I used a sequence length of 2048 and a learning rate of 0.003 with the adamw_bnb_8bit optimizer and a cosine scheduler.
-Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
+Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
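As a rough illustration, an axolotl config matching the hyperparameters described above might look like the sketch below. Only `sequence_len`, `learning_rate`, `optimizer`, `lr_scheduler`, and the base model are taken from the text; the dataset entry, batch sizes, and epoch count are placeholder assumptions, not the author's actual config.

```yaml
# Hypothetical axolotl config reflecting the described run; dataset choice,
# batch sizes, and epoch count are illustrative assumptions.
base_model: HuggingFaceTB/SmolLM-135M

datasets:
  - path: LemiSt/gutenberg_de   # one of the German datasets listed in the front matter
    type: completion            # plain-text continued pretraining, no chat template

sequence_len: 2048              # stated sequence length
learning_rate: 0.003            # stated peak learning rate
optimizer: adamw_bnb_8bit       # stated 8-bit AdamW optimizer
lr_scheduler: cosine            # stated cosine decay schedule

micro_batch_size: 8             # assumption
gradient_accumulation_steps: 4  # assumption
num_epochs: 1                   # assumption; ~2 epochs were trained by mistake
```

A cosine scheduler reaches its minimum learning rate only at the planned total step count, so a miscalculated token budget shifts where on the decay curve training actually ends, which is the effect the author describes.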