metrics:
- roc_auc
pipeline_tag: text-classification
---

# Pretrained Language Model for Fake News Detection

> This repository contains a pretrained language model for fake news detection. The model was built with PyTorch and the Hugging Face Transformers library and fine-tuned on a dataset of news articles to classify each article as either "Fake" or "Satire".

# Usage

To use the pretrained model for fake news detection, follow these steps:

> 1. Install the required dependencies: PyTorch, Transformers, and scikit-learn.
>
> 2. Load the pretrained model with the `from_pretrained()` method from the Transformers library.
>
> 3. Tokenize your input text with the `tokenizer.encode_plus()` method.
>
> 4. Pass the tokenized input to the model's `forward()` method to get a prediction (a label-mapping sketch follows the example below).

Here's an example code snippet that demonstrates how to use the model:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the pretrained model and tokenizer from the Hugging Face Hub
model_name = "Karim-Gamal/Roberta_finetuned_fake_news_english.pt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize the input text
input_text = "This is a fake news article"
inputs = tokenizer.encode_plus(input_text, padding=True, truncation=True, max_length=128, return_tensors="pt")

# Get the model's prediction as class probabilities
with torch.no_grad():
    outputs = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])
predictions = torch.softmax(outputs.logits, dim=1).numpy()
```
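
The snippet above ends with class probabilities. As a minimal follow-up sketch, here is one way to map them to a predicted label; it assumes the model's config carries a meaningful `id2label` mapping (if it does not, you will see generic names like `LABEL_0`):

```python
# Map the highest-probability class index to a label name.
# NOTE: this relies on model.config.id2label being set sensibly;
# verify the mapping against the model's config.json before relying on it.
predicted_index = int(predictions.argmax(axis=1)[0])
label = model.config.id2label.get(predicted_index, str(predicted_index))
print(f"Predicted class: {label} (p = {predictions[0, predicted_index]:.3f})")
```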

# Performance

> The model was evaluated on a held-out test set of news articles and achieved a ROC AUC of `95%`, indicating that it can effectively distinguish between the two classes.

> |Model name|ROC AUC|
> |:---:|:---:|
> |jy46604790/Fake-News-Bert-Detect|88%|
> |ghanashyamvtatti/roberta-fake-news|95%|
> |gpt2 / reward_model|82%|
> |gpt2 / imdb-sentiment-classifier|79%|
> |microsoft/Multilingual-MiniLM-L12-H384|86%|
> |hamzab/roberta-fake-news-classification|84%|
> |mainuliitkgp/ROBERTa_fake_news_classification|86%|
> |ghanashyamvtatti/roberta-fake-news after cleaning|82%|

> Based on these results, the `ghanashyamvtatti/roberta-fake-news` model performed best, with a ROC AUC of `95%`. This model was designed specifically for fake news detection, which explains its strong performance on this task.

> Additionally, the `microsoft/Multilingual-MiniLM-L12-H384` model performed respectably, with a ROC AUC of `86%`, while remaining lightweight. It was therefore used in [our paper, "Federated Learning Based Multilingual Emoji Prediction"](https://github.com/kareemgamalmahmoud/FEDERATED-LEARNING-BASED-MULTILINGUAL-EMOJI-PREDICTION-IN-CLEAN-AND-ATTACK-SCENARIOS), despite scoring slightly lower than the best model here.

> On the other hand, the `GPT2` models (`gpt2/reward_model` and `gpt2/imdb-sentiment-classifier`) scored lower than the others, likely because they were pre-trained for different tasks and were not designed for fake news detection.

> It is worth noting that although the `ghanashyamvtatti/roberta-fake-news after cleaning` variant scored lower (`82%`) than the original `ghanashyamvtatti/roberta-fake-news` model (`95%`), it might still be useful in scenarios where the input data undergoes the same cleaning.

Finally, it is important to test the selected model after loading it from Hugging Face, to make sure it is functioning properly in the desired environment.
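
For instance, a minimal sanity check (reusing the `tokenizer` and `model` loaded above) might score a handful of labeled examples and compute ROC AUC with scikit-learn. The texts and labels below are made up for illustration, and the label convention (1 = positive class) is an assumption you should verify against `model.config.id2label`:

```python
from sklearn.metrics import roc_auc_score
import torch

# Hypothetical labeled examples; replace with a slice of your real test set.
texts = ["An obviously fabricated story about aliens.", "A satirical take on today's news."]
labels = [1, 0]  # assumed convention: 1 = positive class

scores = []
for text in texts:
    inputs = tokenizer.encode_plus(text, padding=True, truncation=True, max_length=128, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs["input_ids"], attention_mask=inputs["attention_mask"]).logits
    # Probability the model assigns to class index 1 (assumed positive class)
    scores.append(torch.softmax(logits, dim=1)[0, 1].item())

print("Sanity-check ROC AUC:", roc_auc_score(labels, scores))
```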

# Model Card

> For more information about the model's architecture, see [the original model](https://huggingface.co/ghanashyamvtatti/roberta-fake-news).