add model description
README.md
ADDED
@@ -0,0 +1,11 @@
---
language:
- en
base_model:
- allenai/longformer-large-4096
---
This model is a fine-tuned version of `longformer-large-4096`, obtained by additional pre-training on the S2ORC corpus [(Lo et al., 2020)](https://arxiv.org/pdf/1911.02782), a large corpus of 81.1M English-language academic papers from many disciplines. It uses the weights of [the Longformer large science checkpoint](https://github.com/dwadden/multivers/blob/main/script/get_checkpoint.py), which was the starting point for training the MultiVerS model [(Wadden et al., 2022)](https://arxiv.org/pdf/2112.01640) on the task of scientific claim verification.

Note that the vocabulary size of this model (50275) differs from that of the original `longformer-large-4096` (50265) because 10 new tokens were added:

`<|par|>, </|title|>, </|sec|>, <|sec-title|>, <|sent|>, <|title|>, <|abs|>, <|sec|>, </|sec-title|>, </|abs|>`.
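
The vocabulary note can be sanity-checked with a short script. The following is a minimal sketch, not an official usage recipe: `model_path` is a hypothetical placeholder for wherever this model is stored (a local directory or a Hub repository id), and the helper names `missing_extra_tokens` and `load_and_check` are made up for illustration. It assumes the `transformers` library is installed; whether the tokenizer files shipped with the model already contain the extra tokens is not stated here, so the check only reports what it finds.

```python
# The ten extra tokens listed in this model card.
EXTRA_TOKENS = [
    "<|par|>", "</|title|>", "</|sec|>", "<|sec-title|>", "<|sent|>",
    "<|title|>", "<|abs|>", "<|sec|>", "</|sec-title|>", "</|abs|>",
]

BASE_VOCAB_SIZE = 50265   # allenai/longformer-large-4096, as stated above
MODEL_VOCAB_SIZE = 50275  # this model, as stated above


def missing_extra_tokens(tokenizer):
    """Return the extra tokens that are absent from the tokenizer's vocabulary."""
    vocab = tokenizer.get_vocab()  # maps token string -> id, added tokens included
    return [tok for tok in EXTRA_TOKENS if tok not in vocab]


def load_and_check(model_path):
    """Load tokenizer and model from `model_path` (hypothetical placeholder).

    Returns the number of rows in the input embedding matrix and the list of
    extra tokens the tokenizer does not know.
    """
    # Imported lazily so the constants above can be used without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModel.from_pretrained(model_path)
    embedding_rows = model.get_input_embeddings().num_embeddings
    return embedding_rows, missing_extra_tokens(tokenizer)


# The arithmetic from the note above: 50265 base entries plus 10 new tokens.
assert BASE_VOCAB_SIZE + len(EXTRA_TOKENS) == MODEL_VOCAB_SIZE
```

Calling `load_and_check("path/to/this/model")` would be expected to return `50275` as the first value if the embedding matrix matches the vocabulary size given above. A non-empty second value would suggest that the tokenizer being loaded lacks some of the extra tokens.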