Update README.md
README.md CHANGED
@@ -77,5 +77,5 @@ print(tokenizer.decode(outputs[0]))

### Training Procedure

-This was trained with axolotl, using full fine-tuning (no LoRA, etc.). I used a sequence length of 2048, a learning rate of 0.003 with the adamw_bnb_8bit optimizer, and a cosine scheduler.
+This was trained with axolotl, using full fine-tuning (no LoRA, etc.). I used a sequence length of 2048 with an effective batch size of 512, a learning rate of 0.003 with the adamw_bnb_8bit optimizer, and a cosine scheduler.

Due to an error I made in calculating the token count, I accidentally trained for nearly 2 epochs, with the learning rate not reaching its proper minimum.
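
For reference, the settings described above map onto an axolotl config roughly as follows. This is a minimal sketch, not the author's actual file: `base_model`, the dataset entry, and the split of the 512 effective batch across `micro_batch_size`, `gradient_accumulation_steps`, and GPU count are placeholders; only the stated hyperparameters (sequence length, learning rate, optimizer, scheduler) come from the text.

```yaml
# Hypothetical axolotl config sketch reconstructing the stated hyperparameters.
# base_model, the dataset, and the batch-size split are assumptions, not the
# author's values; only their product must reach the stated effective batch of 512.
base_model: <your-base-model>      # placeholder
datasets:
  - path: <your-dataset>           # placeholder
    type: completion

sequence_len: 2048                 # stated sequence length
micro_batch_size: 8                # assumed; 8 per GPU * 8 accum * 8 GPUs = 512 effective
gradient_accumulation_steps: 8     # assumed
learning_rate: 0.003               # stated
optimizer: adamw_bnb_8bit          # stated optimizer
lr_scheduler: cosine               # stated scheduler

# Full fine-tune: no adapter (LoRA/QLoRA) section is configured.
num_epochs: 1                      # intended; the run overshot to nearly 2 epochs
```

With these numbers, each optimizer step covers 512 × 2048 ≈ 1.05M tokens, which is the kind of figure the token-count estimate would have been based on when the epoch overshoot occurred.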