viren-shah committed
Commit a3c2746 · Parent(s): bbb4bfd
Update README.md

README.md CHANGED
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
-tokens on sequences of size 2048. We then pretrained for another
+tokens on sequences of size 2048. We then pretrained for another 250 Billion tokens on sequences of size 8192.
 During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 