viren-shah committed on
Commit a3c2746
1 Parent(s): bbb4bfd

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
-tokens on sequences of size 2048. We then pretrained for another 500 Billion tokens on sequences of size 8192.
+tokens on sequences of size 2048. We then pretrained for another 250 Billion tokens on sequences of size 8192.
 During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 
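For context, the hunk header above points at the README's usage snippet ending in `print(tokenizer.batch_decode(outputs))`. Below is a minimal sketch of what such a generation snippet could look like with the Hugging Face `transformers` library; the repo id, prompt, and generation settings are assumptions for illustration, not text taken from the README.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub repo id for the model described in the diff.
model_name = "sambanovasystems/SN-13B-8k-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode an example prompt and generate a short continuation.
inputs = tokenizer("Summarize the following article:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# The call referenced in the hunk header: decode generated token ids to text.
print(tokenizer.batch_decode(outputs))
```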