tanikina committed on
Commit
f708205
1 Parent(s): bf1ba7e

add model description

  1. README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ language:
+ - en
+ base_model:
+ - allenai/longformer-large-4096
+ ---
+ This is a fine-tuned version of the `longformer-large-4096` model that was additionally pre-trained on the S2ORC corpus [(Lo et al., 2020)](https://arxiv.org/pdf/1911.02782), a large corpus of 81.1M English-language academic papers from different disciplines. The model uses the weights of [the Longformer large science checkpoint](https://github.com/dwadden/multivers/blob/main/script/get_checkpoint.py), which served as the starting point for training the MultiVerS model [(Wadden et al., 2022)](https://arxiv.org/pdf/2112.01640) on the task of scientific claim verification.
+
+ Note that the vocabulary size of this model (50275) differs from that of the original `longformer-large-4096` (50265) because 10 new tokens were added:
+
+ `<|par|>, </|title|>, </|sec|>, <|sec-title|>, <|sent|>, <|title|>, <|abs|>, <|sec|>, </|sec-title|>, </|abs|>`.
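
The vocabulary arithmetic in the description can be checked with a minimal sketch. It uses only the two sizes and the ten tokens quoted in the README. The names `BASE_VOCAB_SIZE`, `EXTENDED_VOCAB_SIZE`, `ADDED_TOKENS`, and `check_vocab_extension` are hypothetical and introduced here for illustration. The split into opening and closing markers is an assumption based on the token spelling alone, not something the README states.

```python
# Vocabulary sizes as quoted in the README.
BASE_VOCAB_SIZE = 50265      # original longformer-large-4096
EXTENDED_VOCAB_SIZE = 50275  # this checkpoint

# The ten added tokens, copied verbatim from the README.
ADDED_TOKENS = [
    "<|par|>", "</|title|>", "</|sec|>", "<|sec-title|>", "<|sent|>",
    "<|title|>", "<|abs|>", "<|sec|>", "</|sec-title|>", "</|abs|>",
]


def check_vocab_extension(base_size, added_tokens, extended_size):
    """Return True if the tokens are distinct and account for the size difference."""
    distinct = set(added_tokens)
    no_duplicates = len(distinct) == len(added_tokens)
    return no_duplicates and base_size + len(distinct) == extended_size


# Assumption: tokens starting with "</|" are closing markers, the rest are opening ones.
closing = sorted(t for t in ADDED_TOKENS if t.startswith("</|"))
opening = sorted(t for t in ADDED_TOKENS if not t.startswith("</|"))

if __name__ == "__main__":
    print(check_vocab_extension(BASE_VOCAB_SIZE, ADDED_TOKENS, EXTENDED_VOCAB_SIZE))
    print("opening markers:", opening)
    print("closing markers:", closing)
```

With the Hugging Face `transformers` library, `len(tokenizer)` reports the vocabulary size including added tokens, so the same comparison could be run against a loaded tokenizer, assuming the tokenizer used with this checkpoint registers these markers.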