metrics:
- roc_auc
pipeline_tag: text-classification
---

# Pretrained Language Model for Fake News Detection

> This repository contains a pretrained language model for fake news detection. The model was developed with PyTorch and the Hugging Face Transformers library, and was fine-tuned on a dataset of news articles to classify each article as either "Fake" or "Satire".

# Usage

To use the pretrained model for fake news detection, follow these steps:

> 1. Install the required dependencies: PyTorch, Transformers, and scikit-learn.
>
> 2. Load the pretrained model and tokenizer with the `from_pretrained()` method from the Transformers library.
>
> 3. Tokenize your input text with `tokenizer.encode_plus()`.
>
> 4. Pass the tokenized inputs to the model (which calls its `forward()` method) to get a prediction.

Here's an example code snippet that demonstrates how to use the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the pretrained model and tokenizer
model_name = "Karim-Gamal/Roberta_finetuned_fake_news_english.pt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the input text
input_text = "This is a fake news article"
inputs = tokenizer.encode_plus(input_text, padding=True, truncation=True, max_length=128, return_tensors="pt")

# Get the model's prediction as class probabilities
outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
predictions = torch.softmax(outputs.logits, dim=1).detach().numpy()
```
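
The `predictions` array holds one probability per class. To turn it into a human-readable label, you can look up the index of the highest-scoring class in the model's `id2label` mapping. The continuation below is a minimal sketch that reuses `model` and `predictions` from the snippet above and assumes the model config populates `id2label` (Transformers falls back to generic `LABEL_0`/`LABEL_1` names when it is not set):

```python
import numpy as np

# Index of the highest-probability class for the first (only) input
predicted_class = int(np.argmax(predictions, axis=1)[0])

# Look up the class name; fall back to a generic name if unset
label = model.config.id2label.get(predicted_class, f"LABEL_{predicted_class}")
print(f"Predicted: {label} (p={predictions[0][predicted_class]:.3f})")
```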

# Performance

> The model was evaluated on a test set of news articles and achieved a ROC AUC of `95%`, indicating that it can effectively distinguish between fake and real news articles.

> | Model name | ROC AUC |
> |:---:|:---:|
> | jy46604790/Fake-News-Bert-Detect | 88% |
> | ghanashyamvtatti/roberta-fake-news | 95% |
> | gpt2 / reward_model | 82% |
> | gpt2 / imdb-sentiment-classifier | 79% |
> | microsoft/Multilingual-MiniLM-L12-H384 | 86% |
> | hamzab/roberta-fake-news-classification | 84% |
> | mainuliitkgp/ROBERTa_fake_news_classification | 86% |
> | ghanashyamvtatti/roberta-fake-news after cleaning | 82% |

> Based on these results, the `ghanashyamvtatti/roberta-fake-news` model performed best, with a ROC AUC of `95%`. It was designed specifically for fake news detection, which explains its strong performance on this task.

> Additionally, the `microsoft/Multilingual-MiniLM-L12-H384` model achieved a respectable ROC AUC of `86%` while being lightweight. It was therefore used in [our paper, "Federated Learning Based Multilingual Emoji Prediction"](https://github.com/kareemgamalmahmoud/FEDERATED-LEARNING-BASED-MULTILINGUAL-EMOJI-PREDICTION-IN-CLEAN-AND-ATTACK-SCENARIOS), despite its slightly lower performance than the best model.

> On the other hand, the GPT-2 based models (`gpt2 / reward_model` and `gpt2 / imdb-sentiment-classifier`) performed worse than the others, likely because they were trained for different tasks and not specifically designed for fake news detection.

> It is worth noting that even though the `ghanashyamvtatti/roberta-fake-news after cleaning` model scored lower (`82%`) than the original `ghanashyamvtatti/roberta-fake-news` model (`95%`), it might still be useful in certain scenarios, especially when the input data has been cleaned in the same way.

Finally, it is important to test the performance of the selected model after loading it from Hugging Face to make sure it functions properly in the target environment.
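
A quick sanity check could look like the sketch below, reusing `tokenizer` and `model` from the snippet above. The two example texts and the assumption that class index 1 corresponds to the positive ("fake") class are placeholders; substitute a real held-out test set and the label convention of the model you load:

```python
from sklearn.metrics import roc_auc_score
import torch

# Hypothetical labeled examples (1 = fake, 0 = not fake) -- replace
# with a real test set for a meaningful score.
texts = ["Aliens endorse presidential candidate", "Local council approves new budget"]
labels = [1, 0]

# Tokenize the batch and score it without tracking gradients
enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
scores = torch.softmax(logits, dim=1)[:, 1].numpy()  # probability of class index 1

print("ROC AUC:", roc_auc_score(labels, scores))
```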

# Model Card

> For more information about the model's architecture, see [the original model](https://huggingface.co/ghanashyamvtatti/roberta-fake-news).