kiddothe2b committed on
Commit
bb5c32d
1 Parent(s): 3b7dcf9

Initial commit

Browse files
README.md CHANGED
@@ -1,3 +1,80 @@
---
license: cc-by-nc-sa-4.0
pipeline_tag: fill-mask
language: en
tags:
- long_documents
datasets:
- c4
model-index:
- name: kiddothe2b/adhoc-hat-base-4096
  results: []
---

# Hierarchical Attention Transformer (HAT) / adhoc-hat-base-4096

## Model description

This is a Hierarchical Attention Transformer (HAT) model as presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/xxx).

The model has been warm-started re-using the weights of RoBERTa (Liu et al., 2019), but it has not been further pre-trained. It supports sequences of length up to 4,096.

HAT uses hierarchical attention, a combination of segment-wise and cross-segment attention operations. You can think of segments as paragraphs or sentences.
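
As a purely illustrative sketch of this segment layout (not the model's own code), the snippet below splits a tokenized document into fixed-length segments; the 128-token segment length and 32-segment cap mirror this repository's config.json, and the helper name is hypothetical:

```python
# Illustrative only: chunk a token-id sequence into HAT-style segments.
# Segment length (128) and max segments (32) follow this repo's config.json;
# split_into_segments is a hypothetical helper, not part of the model's API.
def split_into_segments(token_ids, segment_length=128, max_segments=32):
    segments = []
    for start in range(0, len(token_ids), segment_length):
        if len(segments) == max_segments:
            break  # tokens beyond 32 * 128 = 4,096 are truncated
        segments.append(token_ids[start:start + segment_length])
    return segments

# A 300-token document yields segments of 128, 128, and 44 tokens.
print([len(seg) for seg in split_into_segments(list(range(300)))])
```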

Note: If you wish to use a fully pre-trained HAT model, you have to use [kiddothe2b/hat-base-4096](https://huggingface.co/kiddothe2b/hat-base-4096).

## Intended uses & limitations

You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task. See the [model hub](https://huggingface.co/models?filter=hat) to look for fine-tuned versions for a task that interests you.

Note that this model is primarily aimed at being fine-tuned on tasks that use the whole document to make decisions, such as document classification, sequential sentence classification, or question answering.

## How to use

You can fine-tune it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice downstream tasks:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The HAT implementation lives in the repository's remote code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("kiddothe2b/adhoc-hat-base-4096", trust_remote_code=True)
doc_classifier = AutoModelForSequenceClassification.from_pretrained("kiddothe2b/adhoc-hat-base-4096", trust_remote_code=True)
```
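
Since the checkpoint carries the `fill-mask` pipeline tag, the raw masked-language-modeling head can also be queried directly. A minimal sketch (the example sentence is arbitrary, and the pipeline likewise needs `trust_remote_code=True`):

```python
from transformers import pipeline

# Load the masked-language-modeling head; trust_remote_code=True pulls in the custom HAT code.
mlm_model = pipeline("fill-mask", model="kiddothe2b/adhoc-hat-base-4096", trust_remote_code=True)

# Returns the highest-scoring candidates for the masked token.
mlm_model("Hierarchical attention handles long <mask> efficiently.")
```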

Note: If you wish to use a fully pre-trained HAT model, you have to use [kiddothe2b/hat-base-4096](https://huggingface.co/kiddothe2b/hat-base-4096).

## Limitations and bias

The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral. Therefore, the model can have biased predictions.

## Training procedure

### Training and evaluation data

The model has been warm-started from the [roberta-base](https://huggingface.co/roberta-base) checkpoint.

### Framework versions

- Transformers 4.19.0.dev0
- Pytorch 1.11.0+cu102
- Datasets 2.0.0
- Tokenizers 0.11.6

## Citing

If you use HAT in your research, please cite [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification](https://arxiv.org/abs/xxx):

```
@misc{chalkidis-etal-2022-hat,
  url = {https://arxiv.org/abs/xxx},
  author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond},
  title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification},
  publisher = {arXiv},
  year = {2022},
}
```
config.json ADDED
@@ -0,0 +1,94 @@
{
  "architectures": [
    "HiTransformerForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "encoder_layout": {
    "0": { "document_encoder": false, "sentence_encoder": true },
    "1": { "document_encoder": false, "sentence_encoder": true },
    "10": { "document_encoder": false, "sentence_encoder": true },
    "11": { "document_encoder": true, "sentence_encoder": true },
    "12": { "document_encoder": true, "sentence_encoder": false },
    "13": { "document_encoder": true, "sentence_encoder": false },
    "14": { "document_encoder": true, "sentence_encoder": false },
    "2": { "document_encoder": false, "sentence_encoder": true },
    "3": { "document_encoder": false, "sentence_encoder": true },
    "4": { "document_encoder": false, "sentence_encoder": true },
    "5": { "document_encoder": false, "sentence_encoder": true },
    "6": { "document_encoder": false, "sentence_encoder": true },
    "7": { "document_encoder": false, "sentence_encoder": true },
    "8": { "document_encoder": false, "sentence_encoder": true },
    "9": { "document_encoder": false, "sentence_encoder": true }
  },
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 130,
  "max_sentence_length": 128,
  "max_sentence_size": 128,
  "max_sentences": 32,
  "model_max_length": 4096,
  "model_type": "hi-transformer",
  "num_attention_heads": 12,
  "num_hidden_layers": 15,
  "output_past": true,
  "pad_token_id": 1,
  "parameters": 136350720,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.18.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
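
The encoder layout above (which of the 15 layers apply segment-wise "sentence" attention vs. cross-segment "document" attention) can also be inspected programmatically. A minimal sketch, assuming the repository's remote code is available:

```python
from transformers import AutoConfig

# trust_remote_code=True is needed because "hi-transformer" is a custom architecture.
config = AutoConfig.from_pretrained("kiddothe2b/adhoc-hat-base-4096", trust_remote_code=True)

# Print each layer's role: sentence_encoder = segment-wise attention, document_encoder = cross-segment attention.
for layer, roles in sorted(config.encoder_layout.items(), key=lambda item: int(item[0])):
    print(layer, roles)
```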
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:91631c0072627aee02435f31d0f6f73d2bd259cf6af1f58f1c658c617e4a9759
size 612321259
special_tokens_map.json ADDED
@@ -0,0 +1 @@
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "sep_token": "</s>", "pad_token": "<pad>", "cls_token": "<s>", "mask_token": {"content": "<mask>", "single_word": false, "lstrip": true, "rstrip": false, "normalized": false}}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
{"errors": "replace", "bos_token": "<s>", "eos_token": "</s>", "sep_token": "</s>", "cls_token": "<s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": "<mask>", "add_prefix_space": false, "trim_offsets": true, "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "roberta-base", "tokenizer_class": "RobertaTokenizer"}
vocab.json ADDED
The diff for this file is too large to render. See raw diff