---
library_name: transformers
license: apache-2.0
language:
- en
datasets:
- howey/unarXive
- howey/wiki_en
- howey/hupd
---

# Model Weights Coming Soon!

## Using HDT

To use the pre-trained model for masked language modeling, use the following snippet:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the HDT collection page on the Hub for the list of available models.
model_name = 'howey/HDT-E'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
```
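
Once loaded, masked-token prediction follows the standard `transformers` workflow. The snippet below is a minimal sketch, assuming the tokenizer exposes a standard mask token; the example sentence is purely illustrative.

```python
import torch

# Minimal masked-token prediction sketch (illustrative only).
text = f"Scientific documents are organized into sections, paragraphs and {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the masked position and decode the highest-scoring prediction.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))
```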

For more details, please see our GitHub repository: [HDT](https://github.com/autonomousvision/hdt).

## Model Details

The model has a context length of `8192` tokens and is similar in size to BERT, with approximately `110M` parameters. It was trained on the standard masked language modeling task using a Transformer-based architecture with our proposed hierarchical attention. Training ran for 24 hours on the ArXiv+Wikipedia+HUPD corpus, processing a total of `1.3 billion` tokens.
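
As a quick sanity check of these numbers after loading the model, one can print the parameter count and the configured maximum sequence length. This is a minimal sketch; the `max_position_embeddings` attribute name is an assumption based on common `transformers` config conventions and may differ for HDT.

```python
# Rough sanity check (assumes `model` from the loading snippet above).
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.0f}M")

# `max_position_embeddings` is the conventional config field for context length;
# the exact attribute name for HDT may differ.
print(f"Context length: {getattr(model.config, 'max_position_embeddings', 'n/a')}")
```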

For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).

## Citation

Please cite our work using the BibTeX entry below:

**BibTeX:**

```bibtex
@inproceedings{He2024COLM,
  title     = {HDT: Hierarchical Document Transformer},
  author    = {Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
  year      = {2024},
  booktitle = {Conference on Language Modeling}
}
```

## Model Card Contact

Haoyu (haoyu.he@uni-tuebingen.de) |