viren-shah committed
Commit a3c2746 · Parent(s): bbb4bfd
Update README.md

README.md CHANGED
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
-tokens on sequences of size 2048. We then pretrained for another
+tokens on sequences of size 2048. We then pretrained for another 250 Billion tokens on sequences of size 8192.
 During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 