---
license: cc-by-sa-4.0
pipeline_tag: fill-mask
language: en
arxiv: 2210.05529
tags:
- long-documents
datasets:
- c4
model-index:
- name: kiddothe2b/adhoc-hierarchical-transformer-base-4096
  results: []
---

# Hierarchical Attention Transformer (HAT) / kiddothe2b/adhoc-hierarchical-transformer-base-4096

## Model description

This is a Hierarchical Attention Transformer (HAT) model as presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/2210.05529). 

The model has been warm-started by reusing the weights of RoBERTa (Liu et al., 2019), but it has not undergone any continued pre-training. It supports sequences of up to 4,096 tokens.

HAT uses hierarchical attention, which is a combination of segment-wise and cross-segment attention operations. You can think of segments as paragraphs or sentences.
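
To make the notion of segments concrete, here is a minimal sketch of how a flat token sequence could be laid out as fixed-size segments. The helper names (`split_into_segments`, `same_segment`), the segment length of 128, the limit of 32 segments (128 × 32 = 4,096), and the padding id of 1 are illustrative assumptions, not part of this model's code; the tokenizer and model handle segmentation themselves.

```python
from typing import List


def split_into_segments(
    token_ids: List[int],
    segment_length: int = 128,  # assumed segment size, for illustration only
    max_segments: int = 32,     # assumed: 128 * 32 = 4,096 tokens in total
    pad_id: int = 1,            # assumed padding id, for illustration only
) -> List[List[int]]:
    """Lay a flat token sequence out as fixed-size segments."""
    # Truncate anything beyond the maximum supported length.
    token_ids = token_ids[: segment_length * max_segments]
    segments = []
    for start in range(0, len(token_ids), segment_length):
        segment = token_ids[start : start + segment_length]
        # Pad the final segment so that every segment has the same length.
        segment = segment + [pad_id] * (segment_length - len(segment))
        segments.append(segment)
    return segments


def same_segment(i: int, j: int, segment_length: int = 128) -> bool:
    """Return True if token positions i and j fall in the same segment.

    Segment-wise attention only relates tokens for which this holds;
    information crosses segment boundaries through cross-segment attention.
    """
    return i // segment_length == j // segment_length


# A hypothetical 300-token document fills two segments and part of a third.
segments = split_into_segments(list(range(300)))
print(len(segments), [len(s) for s in segments])
```

Under these assumptions, a 300-token document occupies three segments, and positions 127 and 128 sit in different segments, so they can only exchange information through the cross-segment attention step.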

Note: If you wish to use a fully pre-trained HAT model, you have to use [kiddothe2b/adhoc-hat-base-4096](https://huggingface.co/kiddothe2b/adhoc-hat-base-4096).

## Intended uses & limitations

The model is intended to be fine-tuned on a downstream task.
See the [model hub](https://huggingface.co/models?filter=hierarchical-transformer) to look for other versions of HAT, or fine-tuned versions on a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole document to make decisions, such as document classification, sequential sentence classification, or question answering.

## How to use

You can fine-tune it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice downstream tasks:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# trust_remote_code=True lets transformers load the custom HAT code shipped with the model repository
tokenizer = AutoTokenizer.from_pretrained("kiddothe2b/adhoc-hierarchical-transformer-base-4096", trust_remote_code=True)
doc_classifier = AutoModelForSequenceClassification.from_pretrained("kiddothe2b/adhoc-hierarchical-transformer-base-4096", trust_remote_code=True)
```

Note: If you wish to use a fully pre-trained HAT model, you have to use [kiddothe2b/hierarchical-transformer-base-4096](https://huggingface.co/kiddothe2b/hierarchical-transformer-base-4096).


## Limitations and bias

The training data used for this model contains a lot of unfiltered content from the internet, which is far from
neutral. Therefore, the model can have biased predictions.


## Training procedure

### Training and evaluation data

The model has been warm-started from the [roberta-base](https://huggingface.co/roberta-base) checkpoint.

### Framework versions

- Transformers 4.19.0.dev0
- Pytorch 1.11.0+cu102
- Datasets 2.0.0
- Tokenizers 0.11.6


## Citing

If you use HAT in your research, please cite:

[An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification](https://arxiv.org/abs/2210.05529). Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, and Desmond Elliott. 2022. arXiv:2210.05529 (Preprint).

```
@misc{chalkidis-etal-2022-hat,
  url = {https://arxiv.org/abs/2210.05529},
  author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond},
  title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification},
  publisher = {arXiv},
  year = {2022},
}
```