---
license: mit
datasets:
- SetFit/enron_spam
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- email
- multilingual
---

# XLM-RoBERTa for multilingual spam detection

I trained this model to detect spam in German. There is no labeled German spam mail dataset, and I could not find a multilingual model already trained on the Enron spam dataset.

## Intended use

Identifying spam mail in any XLM-RoBERTa-supported language.

Note that the model has not been thoroughly tested for its intended use; it has only been validated on the Enron spam dataset.

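A minimal usage sketch with the `transformers` text-classification pipeline might look like the following. The repository ID `your-username/your-model-id` is a hypothetical placeholder, and the spam label name `LABEL_1` is an assumption; check `id2label` in the model's `config.json` for the actual names.

```python
from typing import Callable, List

# Hypothetical placeholder: replace with this model's actual Hub repository ID.
MODEL_ID = "your-username/your-model-id"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so flag_spam() can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def flag_spam(
    texts: List[str],
    classifier: Callable,
    spam_label: str = "LABEL_1",  # assumption: label index 1 means spam
    threshold: float = 0.5,
) -> List[bool]:
    """Return True for each text whose top prediction is spam with enough confidence."""
    # truncation=True cuts long emails down to the model's maximum input length.
    results = classifier(texts, truncation=True)
    return [r["label"] == spam_label and r["score"] >= threshold for r in results]


if __name__ == "__main__":
    clf = load_classifier()
    mails = ["Sie haben 1.000.000 Euro gewonnen!", "Das Meeting ist um 10 Uhr."]
    print(flag_spam(mails, clf))
```

Raising `threshold` trades recall for fewer false positives, which may be worth considering since the model has only been validated on English Enron mail.
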
## Evaluation

Results on the test set of the Enron spam dataset:

- loss: 0.0315
- accuracy: 0.996
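
For reference, accuracy here is the fraction of test mails whose predicted label matches the gold label. The sketch below shows one way such a check could be run; it is not the script that produced the numbers above. It assumes the `SetFit/enron_spam` test split exposes `text` and `label` columns, and `label_to_id` is a hypothetical mapping from the model's label names to the dataset's integer labels.

```python
from typing import Callable, Dict, List


def accuracy(predicted: List[int], gold: List[int]) -> float:
    """Fraction of predictions that equal the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def evaluate(classifier: Callable, label_to_id: Dict[str, int], split: str = "test") -> float:
    """Score a text-classification callable on a split of SetFit/enron_spam."""
    # Imported lazily so accuracy() works without the datasets package installed.
    from datasets import load_dataset

    data = load_dataset("SetFit/enron_spam", split=split)
    # Assumption: the split has "text" and "label" columns.
    outputs = classifier(list(data["text"]), truncation=True)
    predicted = [label_to_id[o["label"]] for o in outputs]
    return accuracy(predicted, list(data["label"]))
```

A call such as `evaluate(clf, {"LABEL_0": 0, "LABEL_1": 1})` would then return a single accuracy value, where `clf` is a `transformers` text-classification pipeline and the label mapping is again an assumption.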