AnnyNguyen committed on
Commit 4afd984 · verified · 1 parent: b4578d9

Delete README.md with huggingface_hub

Files changed (1)
  1. README.md +0 -132
README.md DELETED
@@ -1,132 +0,0 @@
- ---
- license: apache-2.0
- base_model: vinai/bartpho-syllable
- tags:
- - vietnamese
- - spam-detection
- - text-classification
- - e-commerce
- datasets:
- - ViSpamReviews
- metrics:
- - accuracy
- - macro-f1
- - macro-precision
- - macro-recall
- model-index:
- - name: bartpho-spam-binary
-   results:
-   - task:
-       type: text-classification
-       name: Spam Review Detection
-     dataset:
-       name: ViSpamReviews
-       type: ViSpamReviews
-     metrics:
-     - type: accuracy
-       value: 0.8751
-     - type: macro-f1
-       value: 0.8358
- ---
- # bartpho-spam-binary: Spam Review Detection for Vietnamese Text
-
- This model is a fine-tuned version of [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable), trained on the **ViSpamReviews** dataset to detect spam among Vietnamese e-commerce reviews.
-
- ## Model Details
-
- * **Base Model**: `vinai/bartpho-syllable`
- * **Description**: BARTpho (syllable-level), a Vietnamese BART model
- * **Dataset**: ViSpamReviews (Vietnamese Spam Review Dataset)
- * **Fine-tuning Framework**: HuggingFace Transformers
- * **Task**: Spam Review Detection (binary)
- * **Number of Classes**: 2
-
- ### Hyperparameters
-
- * Max sequence length: `256`
- * Learning rate: `5e-5`
- * Batch size: `32`
- * Epochs: `100`
- * Early stopping patience: `5`
-
- ## Dataset
-
- The model was trained on the **ViSpamReviews** dataset, which contains 19,860 Vietnamese e-commerce review samples, split as follows:
-
- * **Train set**: 14,299 samples (72%)
- * **Validation set**: 1,590 samples (8%)
- * **Test set**: 3,971 samples (20%)
-
- ### Labels
-
- * **Non-spam** (0): Genuine product reviews
- * **Spam** (1): Fake or promotional reviews
-
- ## Results
-
- The model was evaluated on the test set with the following results:
-
- * **Accuracy**: `0.8751`
- * **Macro-F1**: `0.8358`
-
- ## Usage
-
- The example below classifies a single Vietnamese review as spam or non-spam:
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- # Load model and tokenizer
- model_name = "visolex/bartpho-spam-binary"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
-
- # Example review text ("This product is very good, the shop delivered quickly!")
- text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"
-
- # Tokenize, truncating to the 256-token max sequence length used in training
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
-
- # Predict without tracking gradients
- with torch.no_grad():
-     outputs = model(**inputs)
- predicted_class = outputs.logits.argmax(dim=-1).item()
- probabilities = torch.softmax(outputs.logits, dim=-1)
-
- # Map the class index to a readable label
- label_map = {0: "Non-spam", 1: "Spam"}
- predicted_label = label_map[predicted_class]
- confidence = probabilities[0][predicted_class].item()
-
- print(f"Text: {text}")
- print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")
- ```
-
- ## Citation
-
- If you use this model, please cite:
-
- ```bibtex
- @misc{visolex_bartpho_spam_binary,
-   title={bartpho-spam-binary: Spam Review Detection for Vietnamese Text},
-   author={{ViSoLex Team}},
-   year={2025},
-   howpublished={\url{https://huggingface.co/visolex/bartpho-spam-binary}}
- }
- ```
-
- ## License
-
- This model is released under the Apache-2.0 license.
-
- ## Acknowledgments
-
- * Base model: [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable)
- * Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
- * ViSoLex Toolkit for Vietnamese NLP
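The deleted README's Dataset section gives a total sample count, three split sizes, and three percentages. The short check below confirms that these figures agree with each other. The numbers are copied from the card, and nothing here loads the actual dataset.

```python
# Split sizes as listed in the deleted README (ViSpamReviews)
splits = {"train": 14_299, "validation": 1_590, "test": 3_971}

total = sum(splits.values())
# Share of the full dataset per split, in percent, rounded to one decimal place
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)   # 19860
print(shares)  # {'train': 72.0, 'validation': 8.0, 'test': 20.0}
```

The counts sum to exactly the stated 19,860, and the rounded shares match the 72% / 8% / 20% split given in the card.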
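The deleted README's hyperparameters combine a 100-epoch budget with an early-stopping patience of 5. Under the usual meaning of patience, training ends once the monitored validation score has failed to improve for 5 consecutive epochs, so 100 acts as an upper bound rather than a fixed count. The card does not say which metric was monitored. The sketch below therefore assumes a higher-is-better validation score, such as macro-F1. `train_with_early_stopping` and its `evaluate` callback are hypothetical names standing in for the real training loop; they are not taken from the model repository.

```python
from typing import Callable, List, Tuple


def train_with_early_stopping(evaluate: Callable[[int], float],
                              max_epochs: int = 100,
                              patience: int = 5) -> Tuple[int, float, List[float]]:
    """Run up to `max_epochs`; stop after `patience` epochs without improvement.

    `evaluate(epoch)` is a hypothetical stand-in for "train one epoch, then
    score the validation set" and must return a higher-is-better metric.
    Returns (best_epoch, best_score, history of validation scores).
    """
    best_epoch, best_score = -1, float("-inf")
    history: List[float] = []
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        score = evaluate(epoch)
        history.append(score)
        if score > best_score:
            # Strict improvement: remember it (a real loop would save a checkpoint here)
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
        else:
            # Ties and regressions both count against the patience budget
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_score, history


# Hypothetical validation scores: the best value is reached at epoch 2, then
# five epochs pass without improvement, so the 0.90 at epoch 8 is never seen.
scores = [0.70, 0.78, 0.82, 0.81, 0.80, 0.82, 0.79, 0.81, 0.90]
print(train_with_early_stopping(lambda e: scores[e], max_epochs=len(scores)))
```

A real run would reload the checkpoint from `best_epoch` before testing. That is why patience is usually paired with keeping the best checkpoint rather than the last one.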
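The deleted README lists accuracy, macro-F1, macro-precision, and macro-recall as its metrics, but does not say how they were computed. The following is a minimal standard-library sketch. It assumes the common definition of macro averaging, where each class's score is computed separately and the results are averaged without weighting, as scikit-learn's `average="macro"` does. It also uses the card's label ids (0 = Non-spam, 1 = Spam). `classification_metrics` is a hypothetical helper, not part of the model repository.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           labels: Sequence[int] = (0, 1)) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging takes the unweighted mean over classes, so the spam
    class counts as much as the non-spam class regardless of its frequency.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # Treat 0/0 as 0.0 (the same convention as scikit-learn's zero_division=0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": correct / len(pairs),
        "macro-precision": sum(precisions) / n,
        "macro-recall": sum(recalls) / n,
        "macro-f1": sum(f1s) / n,
    }


# Toy example: 0 = Non-spam, 1 = Spam
metrics = classification_metrics([0, 0, 0, 1, 1], [0, 0, 1, 1, 0])
print(metrics["accuracy"], round(metrics["macro-f1"], 4))  # 0.6 0.5833
```

Because each class contributes equally to the average, macro-F1 can sit below accuracy when a classifier is weaker on the less frequent class.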