eryk-mazus committed
Commit • 2b84ea6
1 Parent(s): bd8ead3
Update README.md
README.md CHANGED
@@ -23,6 +23,8 @@ widget:
 
 The training took 425 GPU hours on a single 8 x RTX 4090 machine with DeepSpeed ZeRO-2.
 
+Context size: 2,048 tokens.
+
 ## Notes
 
 This base model was initially developed as a foundation for **instruction tuning, which is currently underway**. Nevertheless, I'm sharing it with the community now, because I recognize the potential value in its blend of relatively strong performance and an efficient bilingual tokenizer.
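For readers unfamiliar with the setup mentioned above: DeepSpeed ZeRO-2 training is typically enabled through a JSON config file passed to the launcher. A minimal sketch is shown below — all values are illustrative assumptions, not the settings actually used for this model:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```

Stage 2 partitions optimizer states and gradients across the 8 GPUs, which is what makes pretraining feasible on a single 8 x RTX 4090 node.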