---
widget:
- text: "Plagiarism is the representation of another author's writing, thoughts, ideas, or expressions as one's own work."
---

# Longformer-base for Plagiarism Detection

This is the checkpoint for longformer-base-4096 after fine-tuning on the Machine-Paraphrased Plagiarism Dataset: [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3608000.svg)](https://doi.org/10.5281/zenodo.3608000)

Additional information about this model: |
|
|
|
* [The longformer-base-4096 model page](https://huggingface.co/allenai/longformer-base-4096) |
|
* [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) |
|
* [Official implementation by AllenAI](https://github.com/allenai/longformer) |
|
|
|
The model can be loaded to perform Plagiarism like so: |
|
|
|
```py |
|
from transformers import AutoModelForSequenceClassification, AutoTokenizer |
|
|
|
AutoModelForSequenceClassification("jpelhaw/longformer-base-plagiarism-detection") |
|
AutoTokenizer.from_pretrained("jpelhaw/longformer-base-plagiarism-detection") |
|
|
|
input = 'Plagiarism is the representation of another author's writing, thoughts, ideas, or expressions as one's own work.' |
|
|
|
|
|
example = tokenizer.tokenize(input, add_special_tokens=True) |
|
|
|
answer = model(**example) |
|
|
|
# "plagiarised" |
|
``` |