Hitesh1501 committed
Commit 314bbe2 • 1 Parent(s): 09f2aad
Update README.md
README.md CHANGED
@@ -6,30 +6,51 @@ license: apache-2.0
 datasets:
 - bookcorpus
 - wikipedia
+- trivia_qa
 ---

 # BERT base model (uncased)

-
-
-[this
-
+longformer-base-4096 is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of length up to 4,096.
+It was introduced in
+[this paper](https://arxiv.org/abs/2004.05150) and first released in
+[this repository](https://github.com/allenai/longformer). Longformer uses a combination of sliding-window (local) attention and global attention.
+Global attention is user-configured based on the task to allow the model to learn task-specific representations.

-Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
-the Hugging Face team.

 ## Model description

-
-
-
-
-
-
-
-
-
-
+Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length.
+The paper also introduces the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long-document generative sequence-to-sequence tasks,
+and demonstrates its effectiveness on the arXiv summarization dataset.
+
+- "Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer’s attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset."
 - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
 they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
 predict if the two sentences were following each other or not.
@@ -248,4 +269,4 @@ Glue test results:

 <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
 <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
-</a>
+</a>
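
The added description says Longformer pairs sliding-window (local) attention with user-configured global attention and handles inputs of up to 4,096 tokens. A minimal sketch of how that is typically exercised, assuming the Hugging Face transformers Longformer classes and the allenai/longformer-base-4096 checkpoint on the Hub (neither is spelled out in the diff itself):

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Checkpoint name is an assumption; the card only says "longformer-base-4096".
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# A long input; the model accepts sequences of up to 4,096 tokens.
text = " ".join(["Long documents need long context."] * 300)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Sliding-window (local) attention is the default for every token.
# Global attention is configured per task: set 1 for tokens that should
# attend to, and be attended by, the entire sequence.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # e.g. the <s> token for classification-style tasks

outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
)
print(outputs.last_hidden_state.shape)  # torch.Size([1, seq_len, 768])
```

Which tokens receive global attention is a per-task choice, e.g. the classification token for sequence classification or the question tokens for QA.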
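
The quoted abstract also mentions the Longformer-Encoder-Decoder (LED) for long-document sequence-to-sequence tasks such as arXiv summarization. A hedged sketch, assuming the transformers LED classes and the allenai/led-base-16384 checkpoint (an LED checkpoint fine-tuned on arXiv summarization would be substituted to match the paper's evaluation):

```python
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Checkpoint name is an assumption, not part of this card.
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

article = "Replace this placeholder with a long document, e.g. the body of an arXiv paper."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=16384)

# Beam-search generation of a summary from the long input.
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```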