---
library_name: transformers
license: apache-2.0
language:
- en
datasets:
- howey/unarXive
- howey/wiki_en
- howey/hupd
---
# Model Weights Coming Soon!
## Using HDT
To use the pre-trained model for masked language modeling, run the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer
# See the Hugging Face Hub for the list of available HDT models.
model_name = 'howey/HDT-E'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
```
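Continuing from the snippet above, here is a minimal sketch of filling a masked token. It assumes the checkpoint behaves like a standard masked-LM model and that the tokenizer defines a mask token; see the HDT repository for the hierarchical input format the model was designed for.
```python
import torch

# Sketch only: assumes a standard masked-LM head and a mask token in the tokenizer.
text = f"Paris is the capital of {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring token at the masked position.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```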
For more details, please see our GitHub repository: [HDT](https://github.com/autonomousvision/hdt)
## Model Details
The model has a context length of `8192` tokens and is similar in size to BERT, with approximately `110M` parameters.
It was trained on the standard masked language modeling task with a Transformer-based architecture using our proposed hierarchical attention.
Training took 24 hours on the arXiv+Wikipedia+HUPD corpus and processed a total of `1.3 billion` tokens.
For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).
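For a quick sanity check of these figures, one can inspect the configuration and count parameters. This is a sketch: the attribute holding the context length is assumed to follow the usual `max_position_embeddings` convention and may differ in the HDT implementation.
```python
from transformers import AutoConfig, AutoModelForMaskedLM

model_name = "howey/HDT-E"

# Inspect the configuration; the attribute name for the 8192-token context
# length is an assumption here.
config = AutoConfig.from_pretrained(model_name)
print(getattr(config, "max_position_embeddings", None))

# Count parameters to verify the ~110M figure.
model = AutoModelForMaskedLM.from_pretrained(model_name)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```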
## Citation
<!-- If there is a paper or blog post introducing the model, the Bibtex information for that should go in this section. -->
Please cite our work using the BibTeX entry below:
**BibTeX:**
```bibtex
@inproceedings{He2024COLM,
  title     = {HDT: Hierarchical Document Transformer},
  author    = {Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
  booktitle = {Conference on Language Modeling},
  year      = {2024}
}
```
## Model Card Contact
Haoyu He (haoyu.he@uni-tuebingen.de)