---
language:
- en
tags:
- summarization
datasets:
- scientific_papers
metrics:
- rouge
model-index:
- name: ccdv/lsg-bart-base-4096-arxiv
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

**Transformers >= 4.36.1**\
**This model relies on a custom modeling file, so you need to pass `trust_remote_code=True` when loading it.**\
**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**

The LSG [paper](https://arxiv.org/abs/2210.15497) is available on arXiv. \
The conversion script is available on [GitHub](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

tokenizer = AutoTokenizer.from_pretrained("ccdv/lsg-bart-base-4096-arxiv", trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained("ccdv/lsg-bart-base-4096-arxiv", trust_remote_code=True)

text = "Replace by what you want."  # the document to summarize
# device=0 selects the first GPU; use device=-1 to run on CPU
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device=0)
generated_text = pipe(
    text,
    truncation=True,
    max_length=64,
    no_repeat_ngram_size=7,
    num_beams=2,
    early_stopping=True,
)
```

# ccdv/lsg-bart-base-4096-arxiv

This model is a fine-tuned version of [ccdv/lsg-bart-base-4096](https://huggingface.co/ccdv/lsg-bart-base-4096) on the [scientific_papers arxiv](https://huggingface.co/datasets/scientific_papers) dataset. \
It achieves the following results on the test set:

| Length | Sparse Type  | Block Size | Sparsity | Connexions | R1    | R2    | RL    | RLsum |
|:------ |:------------ |:---------- |:-------- | :--------- |:----- |:----- |:----- |:----- |
| 4096   | Local        | 256        | 0        | 768        | 46.65 | 18.91 | 26.90 | 42.18 |
| 4096   | Local        | 128        | 0        | 384        | 46.18 | 18.57 | 26.71 | 41.69 |
| 4096   | Pooling      | 128        | 4        | 644        | 46.27 | 18.68 | 26.87 | 41.82 |
| 4096   | Stride       | 128        | 4        | 644        | 46.34 | 18.64 | 26.69 | 41.87 |
| 4096   | Block Stride | 128        | 4        | 644        | 46.23 | 18.62 | 26.62 | 41.80 |
| 4096   | Norm         | 128        | 4        | 644        | 45.96 | 18.46 | 26.52 | 41.51 |
| 4096   | LSH          | 128        | 4        | 644        | 46.19 | 18.72 | 26.89 | 41.76 |

With smaller block sizes (lower resource usage):

| Length | Sparse Type  | Block Size | Sparsity | Connexions | R1    | R2    | RL    | RLsum |
|:------ |:------------ |:---------- |:-------- | :--------- |:----- |:----- |:----- |:----- |
| 4096   | Local        | 64         | 0        | 192        | 44.71 | 17.53 | 26.03 | 40.23 |
| 4096   | Local        | 32         | 0        | 96         | 39.67 | 14.34 | 23.81 | 35.00 |
| 4096   | Pooling      | 32         | 4        | 160        | 42.75 | 16.34 | 25.20 | 38.23 |
| 4096   | Stride       | 32         | 4        | 160        | 44.23 | 17.21 | 25.71 | 39.72 |
| 4096   | Block Stride | 32         | 4        | 160        | 44.15 | 17.10 | 25.68 | 39.60 |
| 4096   | Norm         | 32         | 4        | 160        | 42.02 | 15.65 | 24.56 | 37.45 |
| 4096   | LSH          | 32         | 4        | 160        | 42.58 | 16.21 | 25.10 | 38.04 |
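
For the Local rows in both tables, the reported Connexions value equals three times the block size. The short check below confirms this against the numbers above. The rule is inferred from the table values only, not taken from the LSG implementation, and the helper name `local_connections` is hypothetical. The sparse rows (644 and 160) do not follow this rule and are left out.

```python
# Assumption: for "Local" rows, connections = 3 * block_size.
# This is read off the reported table values, not the LSG source code.
def local_connections(block_size: int) -> int:
    return 3 * block_size


# (block_size, reported Connexions) for the Local rows of both tables.
reported_local_rows = [(256, 768), (128, 384), (64, 192), (32, 96)]

# Collect any row where the inferred rule disagrees with the table.
mismatches = [
    (block_size, reported)
    for block_size, reported in reported_local_rows
    if local_connections(block_size) != reported
]

print("rows that disagree with the 3 * block_size rule:", mismatches)
```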

## Model description
The model relies on Local-Sparse-Global attention to handle long sequences:
![attn](attn.png)

The model has about 145 million parameters (6 encoder layers and 6 decoder layers). \
The model is warm-started from BART-base, converted to handle long sequences (encoder only), and fine-tuned.

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 6.0
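
As a rough illustration of how these values relate, the sketch below recomputes the total batch size and evaluates a linear schedule with warmup. It assumes a single device (the card does not state the device count), and `total_steps` is a hypothetical value chosen only for the example. The schedule function is a plain-Python approximation, not the Trainer's own code.

```python
train_batch_size = 8
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single device, not stated in the card

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, peak_lr: float = 8e-05,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical, for illustration only
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, total_steps))
```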

### Generation hyperparameters

The following hyperparameters were used during generation:
- dataset_name: scientific_papers
- dataset_config_name: arxiv
- eval_batch_size: 8
- eval_samples: 6440
- early_stopping: True
- ignore_pad_token_for_loss: True
- length_penalty: 2.0
- max_length: 320
- min_length: 32
- num_beams: 5
- no_repeat_ngram_size: None
- seed: 123
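
A minimal sketch of how the decoding settings above might be collected into keyword arguments for generation. The dictionary names are hypothetical, and the commented-out `generate` call assumes a `model` and tokenized `inputs` like those in the usage snippet at the top of this card. Entries set to `None` are dropped so that only explicit values are forwarded.

```python
import math

# Decoding settings copied from the list above.
generation_settings = {
    "early_stopping": True,
    "length_penalty": 2.0,
    "max_length": 320,
    "min_length": 32,
    "num_beams": 5,
    "no_repeat_ngram_size": None,
}

# Keep only the settings that have an explicit value.
generate_kwargs = {k: v for k, v in generation_settings.items() if v is not None}

# Number of evaluation batches implied by eval_samples and eval_batch_size.
eval_samples = 6440
eval_batch_size = 8
num_eval_batches = math.ceil(eval_samples / eval_batch_size)

# Assumed usage, not run here:
# summary_ids = model.generate(**inputs, **generate_kwargs)
print(generate_kwargs, num_eval_batches)
```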

### Framework versions

- Transformers 4.18.0
- Pytorch 1.10.1+cu102
- Datasets 2.1.0
- Tokenizers 0.11.6