AndyChiang committed
Commit 9d0d17d
1 Parent(s): 219a79d

Create README.md

Files changed (1)
  1. README.md +104 -0
README.md ADDED
@@ -0,0 +1,104 @@
+ ---
+ license: mit
+ language: en
+ tags:
+ - Pre-CoFactv3
+ - Text-Classification
+ datasets:
+ - FACTIFY5WQA
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: transformers
+ base_model: microsoft/deberta-v3-large
+ widget:
+ - text: "BREAKING: Another nearly 1.9 million Americans filed for unemployment insurance last week, the Department of Labor said. https://t.co/dVwyI6avmx [SEP] By Anneken Tappe, CNN BusinessUpdated 11:50 AM ET, Thu June 4, 2020 New York (CNN Business)Millions of Americans again filed for unemployment benefits last week, as the coronavirus recession drags on."
+ example_title: "Support"
+ - text: "Micah Richards spent an entire season at Aston Vila without playing a single game. [SEP] Despite speculation that Richards would leave Aston Villa before the transfer deadline for the 2018~19 season , he remained at the club , although he is not being considered for first team selection."
+ example_title: "Neutral"
+ - text: "Mahatma Gandhi having breakfast with British official inside the jail. [SEP] A photo is being shared on Facebook with a claim that Gandhi was having breakfast with British officials inside the jail while people are fighting for Independence. Let\u2019s try to check the authenticity of the image in the post. Claim: Mahatma Gandhi having breakfast with British official inside the jail. Fact: The photo was not taken inside the jail. It was taken during a breakfast meeting between Gandhi and Mountbatten at Viceroy\u2019s House in April 1947. Hence the claim made in the post is FALSE. When the image in the post is run Google Reverse Image Search, a link to Getty Images website containing the same image can be found in the search results. In that website, the image has a description which reads, \u201cBreakfast meeting between Mahatma Gandhi and Viceroy of India, Lord Mountbatten 1947\u201d. Also, in the book \u2018India Remembered\u2019 written by Pamela Mountbatten (the daughter of Lord Mountbatten), the same image can be found in the \u2018A Huge Task\u2019 chapter. She writes that the photo was taken on 1st April 1947 at the Viceroy\u2019s House. The Viceroy invited Gandhi for breakfast to discuss the transfer of power, declared by England\u2019s PM Clement R. Atlee in February 1947. So, the photo was not taken inside the jail. To sum it up, the photo was taken in April 1947 at the Viceroy\u2019s house, not inside the jail. Did you watch our Facebook live on Fake News (Misinformation)."
+ example_title: "Refute"
+ ---
+
+ # Pre-CoFactv3-Text-Classification
+
+ ## Model description
+
+ This is a text classification model from the **AAAI 2024 workshop paper “Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning”**.
+
+ It takes a claim and its evidence as input and outputs a predicted label: Support, Neutral, or Refute. In the examples in this card, the claim and evidence are joined into a single string with `[SEP]`.
+
+ It is fine-tuned from [**microsoft/deberta-v3-large**](https://huggingface.co/microsoft/deberta-v3-large) on the **FACTIFY5WQA** dataset.
+
+ For more details, see our **paper** or the [**GitHub repository**](https://github.com/AndyChiangSH/Pre-CoFactv3).
+
+ ## How to use
+
+ 1. Download the model and tokenizer with Hugging Face Transformers.
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
+
+ model = AutoModelForSequenceClassification.from_pretrained("AndyChiang/Pre-CoFactv3-Text-Classification")
+ tokenizer = AutoTokenizer.from_pretrained("AndyChiang/Pre-CoFactv3-Text-Classification")
+ ```
+
+ 2. Create a pipeline.
+ ```python
+ classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
+ ```
+
+ 3. Use the pipeline to predict the label.
+ ```python
+ label = classifier("Micah Richards spent an entire season at Aston Vila without playing a single game. [SEP] Despite speculation that Richards would leave Aston Villa before the transfer deadline for the 2018~19 season , he remained at the club , although he is not being considered for first team selection.")
+ print(label)
+ ```
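The widget examples in this card join the claim and the evidence with `[SEP]` before classification. The following is a minimal sketch of that input format and of reading a pipeline result, assuming the pipeline returns its usual list of `{"label": ..., "score": ...}` dictionaries. `build_input` and `top_prediction` are hypothetical helper names that are not part of this repository, and the label string in the sample result is a placeholder, since the actual label names depend on the model's configuration.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float]]


def build_input(claim: str, evidence: str) -> str:
    """Join claim and evidence the way the examples in this card do."""
    return f"{claim.strip()} [SEP] {evidence.strip()}"


def top_prediction(results: List[Prediction]) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring entry of a pipeline result."""
    best = max(results, key=lambda r: r["score"])
    return str(best["label"]), float(best["score"])


text = build_input("A short claim.", "A short piece of evidence.")
print(text)

# With the classifier from step 2, one would call: results = classifier(text)
# Placeholder result, used here only to show the shape of the output.
results = [{"label": "Neutral", "score": 0.9}]
print(top_prediction(results))
```

Keeping the joining logic in one helper makes it harder to send the model a claim without its evidence, or with a separator that differs from the one used in the examples.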
+
+ ## Dataset
+
+ We use the FACTIFY5WQA dataset provided by the AAAI-24 Workshop Factify 3.0.
+
+ This dataset is designed for fact verification: the task is to determine the veracity of a claim based on the given evidence. Each example has the following fields:
+
+ - **claim:** the statement to be verified.
+ - **evidence:** the facts used to verify the claim.
+ - **question:** the questions generated from the claim using the 5W framework (who, what, when, where, and why).
+ - **claim_answer:** the answers derived from the claim.
+ - **evidence_answer:** the answers derived from the evidence.
+ - **label:** the veracity of the claim based on the given evidence, which is one of three categories: Support, Neutral, or Refute.
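The fields listed above could be modeled as a small record type. This is only a sketch: the class name `FactifyExample` is hypothetical, and the field types (lists of strings for the questions and answers) are assumptions, since this card does not specify how the dataset files are laid out. The sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# The three label categories named in this card.
LABELS = ("Support", "Neutral", "Refute")


@dataclass
class FactifyExample:
    claim: str
    evidence: str
    question: List[str]         # assumed: one question per 5W aspect
    claim_answer: List[str]     # assumed: answers aligned with `question`
    evidence_answer: List[str]  # assumed: answers aligned with `question`
    label: str

    def __post_init__(self) -> None:
        # Reject labels outside the three documented categories.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


example = FactifyExample(
    claim="A short claim.",
    evidence="A short piece of evidence.",
    question=["Who made the claim?"],
    claim_answer=["Someone"],
    evidence_answer=["Someone else"],
    label="Refute",
)
print(example.label)
```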
+
+ | Label | Training | Validation | Testing | Total |
+ | --- | --- | --- | --- | --- |
+ | Support | 3500 | 750 | 750 | 5000 |
+ | Neutral | 3500 | 750 | 750 | 5000 |
+ | Refute | 3500 | 750 | 750 | 5000 |
+ | Total | 10500 | 2250 | 2250 | 15000 |
+
+ ## Fine-tuning
+
+ Fine-tuning uses the Hugging Face Trainer API for the [Text Classification](https://huggingface.co/docs/transformers/tasks/sequence_classification) task.
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+
+ - Pre-trained language model: [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large)
+ - Optimizer: Adam
+ - Learning rate: 0.00001
+ - Maximum input length: 650 tokens
+ - Batch size: 4
+ - Epochs: 12
+ - Device: NVIDIA RTX A5000
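As a rough illustration of what these settings imply, the sketch below collects them under keyword names used by `transformers.TrainingArguments` and derives the number of optimizer steps. Mapping the list onto these keywords is an assumption, not the authors' actual configuration, and the step count further assumes a single device with no gradient accumulation. The training set size comes from the dataset table in this card.

```python
import math

TRAIN_EXAMPLES = 10_500  # training split size from the dataset table
MAX_INPUT_TOKENS = 650   # would apply at tokenization time, not in the trainer

# Keyword names follow transformers.TrainingArguments; using them for these
# values is an assumption made for illustration.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 12,
}

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / training_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
print(steps_per_epoch, total_steps)  # 2625 31500
```

Under those assumptions, a dictionary like this could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, with the 650-token limit passed to the tokenizer as `max_length` together with `truncation=True`.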
+
+ ## Testing
+
+ Accuracy is the evaluation metric for the text classification task.
+
+ | Accuracy |
+ | ----- |
+ | 0.8502 |
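For reference, accuracy here means the fraction of examples whose predicted label matches the gold label. The sketch below computes it on a toy list of labels. The `accuracy` helper and the sample data are illustrative only and are not the evaluation script used for the reported score.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data: three of the four predictions match.
preds = ["Support", "Neutral", "Refute", "Refute"]
golds = ["Support", "Neutral", "Refute", "Neutral"]
print(accuracy(preds, golds))  # 0.75
```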
+
+ ## Other models
+
+ [AndyChiang/Pre-CoFactv3-Question-Answering](https://huggingface.co/AndyChiang/Pre-CoFactv3-Question-Answering)
+
+ ## Citation