Improve model card
This PR improves the model card by adding a more detailed model description and information about the SemViQA system from the GitHub README.
README.md
CHANGED
@@ -2,19 +2,27 @@
 language:
 - vi
 library_name: transformers
 tags:
 - SemViQA
 - three-class-classification
 - fact-checking
-pipeline_tag: text-classification
-license: mit
 ---

 # SemViQA-TC: Vietnamese Three-class Classification for Claim Verification

 ## Model Description

-

 ### **Model Information**
 - **Developed by:** [SemViQA Research Team](https://huggingface.co/SemViQA)
@@ -25,6 +33,15 @@ license: mit

 SemViQA-TC serves as the **first step in the two-step classification process** of the SemViQA system. It initially categorizes claims into three classes: **SUPPORTED, REFUTED, or NEI**. For claims classified as **SUPPORTED** or **REFUTED**, a secondary **binary classification model (SemViQA-BC)** further refines the prediction. This hierarchical classification strategy enhances the accuracy of fact verification.

 ## Usage Example

 Direct Model Usage
@@ -2,19 +2,27 @@
 language:
 - vi
 library_name: transformers
+license: mit
+pipeline_tag: text-classification
 tags:
 - SemViQA
 - three-class-classification
 - fact-checking
 ---

 # SemViQA-TC: Vietnamese Three-class Classification for Claim Verification

 ## Model Description

+The rise of misinformation, exacerbated by Large Language Models (LLMs) like GPT and Gemini, demands robust fact-checking solutions, especially for low-resource languages like Vietnamese. Existing methods struggle with semantic ambiguity, homonyms, and complex linguistic structures, often trading accuracy for efficiency. We introduce SemViQA, a novel Vietnamese fact-checking framework integrating Semantic-based Evidence Retrieval (SER) and Two-step Verdict Classification (TVC). Our approach balances precision and speed, achieving state-of-the-art results with 78.97% strict accuracy on ISE-DSC01 and 80.82% on ViWikiFC, securing 1st place in the UIT Data Science Challenge. Additionally, SemViQA Faster improves inference speed 7x while maintaining competitive accuracy. SemViQA sets a new benchmark for Vietnamese fact verification, advancing the fight against misinformation.
+
+**SemViQA-TC** is one of the key components of the **SemViQA** system, designed for **three-class classification** in Vietnamese fact-checking. This model classifies a given claim into one of three categories: **SUPPORTED**, **REFUTED**, or **NOT ENOUGH INFORMATION (NEI)** based on retrieved evidence. To address these challenges, SemViQA integrates:
+
+- **Semantic-based Evidence Retrieval (SER)**: Combines **TF-IDF** with a **Question Answering Token Classifier (QATC)** to enhance retrieval precision while reducing inference time.
+- **Two-step Verdict Classification (TVC)**: Uses hierarchical classification optimized with **Cross-Entropy and Focal Loss**, improving claim verification across three categories:
+  - **Supported** ✅
+  - **Refuted** ❌
+  - **Not Enough Information (NEI)** 🤷‍♂️
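The model card names Cross-Entropy and Focal Loss but does not spell out how they are combined. As a rough illustration only, the sketch below compares the two on a single three-class prediction, using the standard focal-loss form FL = -(1 - p_t)^gamma * log(p_t). The function names, the `gamma` default, the class order, and the example probabilities are all assumptions made for this sketch, not values taken from SemViQA.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy for one example: -log of the probability given to the true class."""
    return -math.log(probs[target])

def focal_loss(probs, target, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    gamma=2.0 is an assumed illustrative default; with gamma=0 the
    scaling factor is 1 and the result equals plain cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Hypothetical class order and probabilities, for illustration only.
LABELS = ["SUPPORTED", "REFUTED", "NEI"]
easy = [0.90, 0.05, 0.05]  # confident prediction for the true class (index 0)
hard = [0.30, 0.50, 0.20]  # uncertain prediction for the true class (index 0)

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: true={LABELS[0]} cross_entropy={ce:.4f} focal={fl:.4f}")
```

In this form the focal term shrinks the loss of confidently correct examples far more than that of uncertain ones, which is the usual motivation for pairing it with cross-entropy on imbalanced or hard classes.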

 ### **Model Information**
 - **Developed by:** [SemViQA Research Team](https://huggingface.co/SemViQA)
@@ -25,6 +33,15 @@ license: mit

 SemViQA-TC serves as the **first step in the two-step classification process** of the SemViQA system. It initially categorizes claims into three classes: **SUPPORTED, REFUTED, or NEI**. For claims classified as **SUPPORTED** or **REFUTED**, a secondary **binary classification model (SemViQA-BC)** further refines the prediction. This hierarchical classification strategy enhances the accuracy of fact verification.
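The two-step routing described in the paragraph above can be sketched roughly as follows. This is not the SemViQA implementation: `classify_three_class` and `classify_binary` are hypothetical stand-ins for SemViQA-TC and SemViQA-BC, assumed here to take a claim and its evidence and return a label string, and the stub classifiers exist only to make the sketch runnable.

```python
from typing import Callable

Classifier = Callable[[str, str], str]

def two_step_verdict(
    claim: str,
    evidence: str,
    classify_three_class: Classifier,
    classify_binary: Classifier,
) -> str:
    """Run the three-class step first, then the binary step unless the result is NEI."""
    first = classify_three_class(claim, evidence)
    if first == "NEI":
        # Not enough information: the binary model is not consulted.
        return "NEI"
    # SUPPORTED or REFUTED: let the binary model refine the prediction.
    return classify_binary(claim, evidence)

# Hypothetical stub classifiers, for illustration only.
def stub_three_class(claim: str, evidence: str) -> str:
    return "NEI" if not evidence else "REFUTED"

def stub_binary(claim: str, evidence: str) -> str:
    return "SUPPORTED" if claim in evidence else "REFUTED"

claim = "Hanoi is the capital of Vietnam"
print(two_step_verdict(claim, "", stub_three_class, stub_binary))
print(two_step_verdict(claim, claim + ".", stub_three_class, stub_binary))
```

Passing the classifiers in as callables keeps the routing logic independent of how each model is loaded or served.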

+### **Achievements**
+- **1st place** in the **UIT Data Science Challenge**
+- **State-of-the-art** performance on:
+  - **ISE-DSC01** → **78.97% strict accuracy**
+  - **ViWikiFC** → **80.82% strict accuracy**
+- **SemViQA Faster**: **7x speed improvement** over the standard model
+
+These results establish **SemViQA** as a **benchmark for Vietnamese fact verification**, advancing efforts to combat misinformation and ensure **information integrity**.
+
 ## Usage Example

 Direct Model Usage