---
language: ISO 639-1 code for your language, or `multilingual`
thumbnail: url to a thumbnail used in social sharing
tags:
- array
- of
- tags
datasets:
- array of dataset identifiers
metrics:
- array of metric identifiers
widget:
- text: Plagiarism is the representation of another author's writing, thoughts, ideas,
    or expressions as one's own work.
---
# Longformer-base for Plagiarism Detection

This is the Longformer-base checkpoint trained on the [Machine-Paraphrased Plagiarism Dataset](https://doi.org/10.5281/zenodo.3608000).

Additional information about this model:

* [The longformer-base-4096 model page](https://huggingface.co/allenai/longformer-base-4096)
* [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf)
* [Official implementation by AllenAI](https://github.com/allenai/longformer)

The model can be loaded to perform plagiarism detection like so:
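```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "jpelhaw/longformer-base-plagiarism-detection"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

text = "Plagiarism is the representation of another author's writing, thoughts, ideas, or expressions as one's own work."

# Encode as PyTorch tensors; truncate anything beyond the model's maximum input length
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map its index to a label name
prediction = model.config.id2label[logits.argmax(dim=-1).item()]
print(prediction)
# "plagiarised"
```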
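The loading example reports only the top label. If you also want a confidence score, the classifier's logits can be passed through a softmax. Below is a minimal, dependency-free sketch of that post-processing step. The helper `classify_logits` and the `ID2LABEL` mapping are hypothetical; check `model.config.id2label` for the label names this checkpoint actually uses.

```python
import math


def classify_logits(logits, id2label):
    """Turn raw classifier logits into a (label, probability) pair.

    Uses a numerically stable softmax: subtracting the largest logit
    before exponentiating avoids overflow without changing the result.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (first one wins on ties)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Hypothetical label mapping; the real one lives in model.config.id2label
ID2LABEL = {0: "original", 1: "plagiarised"}

label, prob = classify_logits([-1.2, 2.3], ID2LABEL)
print(label, round(prob, 3))
```

With a real model, pass the first row of the output logits as a Python list together with `model.config.id2label`. In PyTorch, `torch.softmax(logits, dim=-1)` computes the same probabilities directly on the tensor.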