Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ For more details, please see our github repository: [HDT](https://github.com/aut
 ## Model Details
 The model, which has a context length of `8192` and is similar in size to BERT with approximately `110M` parameters,
 was trained on the standard masked language modeling task with a Transformer-based architecture using our proposed hierarchical attention.
-The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, involving the processing of a total of `
+The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, processing a total of `1.3 billion` tokens.
 
 For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).
 
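Since the card describes a masked language modeling checkpoint, a short usage sketch may be useful. This is a minimal sketch, not part of the commit: the checkpoint id below is a placeholder (the diff does not name the hub repo), and `trust_remote_code=True` is an assumption based on the custom hierarchical attention architecture.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Placeholder checkpoint id; substitute the actual hub repo of this model.
CHECKPOINT = "your-org/hdt-checkpoint"

# trust_remote_code=True is an assumption: custom architectures such as the
# hierarchical attention described above usually ship as remote code.
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(CHECKPOINT, trust_remote_code=True)
model.eval()

text = "Hierarchical attention gives the model a [MASK] length of 8192 tokens."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Decode the highest-scoring token at the masked position.
mask_idx = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
print(tokenizer.decode(logits[0, mask_idx].argmax(dim=-1)))
```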