Transformers · English · Inference Endpoints
howey committed · Commit 8822727 · verified · 1 Parent(s): 9eedb04

Update README.md

Files changed (1): README.md (+1 / -1)
README.md CHANGED
@@ -25,7 +25,7 @@ For more details, please see our github repository: [HDT](https://github.com/aut
  ## Model Details
  The model, which has a context length of `8192` and is similar in size to BERT with approximately `110M` parameters,
  was trained on a standard masked language modeling task with a Transformer-based architecture using our proposed hierarchical attention.
- The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, processing a total of `160 million` tokens.
+ The training regimen comprised 24 hours on the ArXiv+Wikipedia+HUPD corpus, processing a total of `1.3 billion` tokens.
 
  For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).
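
Since the card describes a Transformers-compatible masked language model with a custom hierarchical attention, here is a minimal usage sketch. The repo id `howey/HDT-E` is an assumption for illustration (substitute the actual id from this model page), and loading the custom architecture will likely require `trust_remote_code=True`.

```python
# Minimal sketch: filling a masked token with the HDT encoder.
# The repo id "howey/HDT-E" is an assumption, not confirmed by this page;
# the custom hierarchical attention likely needs trust_remote_code=True.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "howey/HDT-E"  # hypothetical; replace with the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# The 8192-token context lets whole documents fit in a single forward pass.
text = f"Hierarchical attention scales Transformers to {tokenizer.mask_token} documents."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Decode the model's top prediction at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
print(tokenizer.decode(logits[0, mask_pos].argmax(dim=-1)))
```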