Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ For more details, please see our github repository: [HDT](https://github.com/aut
 ## Model Details
 The model, which has a context length of `8192` and is similar in size to BERT with approximately `110M` parameters,
 was trained on the standard masked language modeling task with a Transformer-based architecture using our proposed hierarchical attention.
-The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, involving the processing of a total of `
+The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, processing a total of `1.3 billion` tokens.
 
 For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).
 
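Since the card describes a masked language modeling checkpoint, a short usage sketch may be useful. This is a minimal sketch, not part of the commit: the checkpoint id below is a placeholder (the diff does not name the hub repo), and `trust_remote_code=True` is an assumption based on the custom hierarchical attention architecture.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder checkpoint id; substitute the actual hub repo of this model.
CHECKPOINT = "your-org/hdt-checkpoint"

# trust_remote_code=True is an assumption: custom architectures such as the
# hierarchical attention described above usually ship as remote code.
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(CHECKPOINT, trust_remote_code=True)
model.eval()

text = "Hierarchical attention gives the model a [MASK] length of 8192 tokens."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the masked position.
mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_idx].argmax(dim=-1)))
```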