@@ -1,132 +0,0 @@
----
-license: apache-2.0
-base_model: vinai/bartpho-syllable
-tags:
-- vietnamese
-- spam-detection
-- text-classification
-- e-commerce
-datasets:
-- ViSpamReviews
-metrics:
-- accuracy
-- macro-f1
-- macro-precision
-- macro-recall
-model-index:
-- name: bartpho-spam-binary
-  results:
-  - task:
-      type: text-classification
-      name: Spam Review Detection
-    dataset:
-      name: ViSpamReviews
-      type: ViSpamReviews
-    metrics:
-      - type: accuracy
-        value: 0.8751
-      - type: macro-f1
-        value: 0.8358
----
-# bartpho-spam-binary: Spam Review Detection for Vietnamese Text
-This model is a fine-tuned version of [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable) on the **ViSpamReviews** dataset. It detects spam in Vietnamese e-commerce product reviews.
-## Model Details
-* **Base Model**: `vinai/bartpho-syllable`
-* **Description**: BARTpho, a pre-trained sequence-to-sequence (BART) model for Vietnamese (syllable-level variant)
-* **Dataset**: ViSpamReviews (Vietnamese Spam Review Dataset)
-* **Fine-tuning Framework**: Hugging Face Transformers
-* **Task**: Spam Review Detection (binary)
-* **Number of Classes**: 2
-### Hyperparameters
-* Max sequence length: `256`
-* Learning rate: `5e-5`
-* Batch size: `32`
-* Max epochs: `100` (upper bound; early stopping can end training sooner)
-* Early stopping patience: `5`
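The card combines a budget of 100 epochs with an early-stopping patience of 5. This suggests that training ended once the validation score had failed to improve for 5 consecutive evaluations. The card does not say which metric was monitored or how often evaluation ran. The sketch below therefore assumes one evaluation per epoch on validation macro-F1, and its scores are invented for illustration. It is a minimal model of patience-based early stopping, not the training script behind this checkpoint.

```python
def early_stopping_epoch(val_scores, patience=5):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    val_scores: per-epoch validation scores, higher is better
    (assumed to be macro-F1 here; the card does not name the metric).
    Epochs are 1-indexed. Training stops after `patience` consecutive
    epochs without improvement over the best score seen so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            # New best: remember it and reset the patience counter
            best_score = score
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Epoch budget exhausted without triggering early stopping
    return len(val_scores), best_epoch


# Hypothetical validation macro-F1 curve (illustrative numbers only)
scores = [0.70, 0.76, 0.80, 0.82, 0.81, 0.82, 0.80, 0.81, 0.79]
stop, best = early_stopping_epoch(scores, patience=5)
print(f"stopped after epoch {stop}, best checkpoint from epoch {best}")
```

In Transformers, this behaviour is usually configured with `EarlyStoppingCallback(early_stopping_patience=5)`, together with `load_best_model_at_end=True` and a `metric_for_best_model` in `TrainingArguments`. The card does not state whether that callback was used.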
-## Dataset
-The model was fine-tuned on the **ViSpamReviews** dataset, which contains 19,860 Vietnamese e-commerce reviews split as follows:
-* **Train set**: 14,299 samples (72%)
-* **Validation set**: 1,590 samples (8%)
-* **Test set**: 3,971 samples (20%)
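The three splits sum exactly to the stated total, and the percentages are rounded to whole numbers. A quick check using the card's own figures:

```python
# Split sizes as reported in the card
splits = {"train": 14_299, "validation": 1_590, "test": 3_971}
total = sum(splits.values())

# Print each split's size and its share of the full dataset
for name, size in splits.items():
    print(f"{name:<10} {size:>6}  {size / total:.2%}")
print(f"{'total':<10} {total:>6}")
```

The exact shares come out at about 72.00%, 8.01%, and 19.99%. These figures are consistent with the rounded 72/8/20 split.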
-### Labels
-* **Non-spam** (0): Genuine product reviews
-* **Spam** (1): Fake or promotional reviews
-## Results
-The model was evaluated on the held-out test set (3,971 samples):
-* **Accuracy**: `0.8751`
-* **Macro-F1**: `0.8358`
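Macro-F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. Accuracy, by contrast, is dominated by whichever class is more frequent. This is one common reason for macro-F1 sitting below accuracy, as it does here, although the card does not report the class balance. The sketch below computes both metrics from scratch on toy labels, not on the model's actual predictions. For binary labels, scikit-learn's `f1_score(y_true, y_pred, average="macro")` should return the same macro-F1 value.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (0 = non-spam, 1 = spam)."""
    f1_scores = []
    for c in labels:
        # Count true positives, false positives, false negatives for class c
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Toy, imbalanced example: 8 non-spam and 2 spam reviews
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case, accuracy is 0.8. The spam class reaches an F1 of only 0.5, which pulls macro-F1 down to 0.6875.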
-## Usage
-The example below loads the model from the Hugging Face Hub and classifies a single Vietnamese review:
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-# Load model and tokenizer
-model_name = "visolex/bartpho-spam-binary"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSequenceClassification.from_pretrained(model_name)
-# Example review text ("This product is very good, the shop delivers quickly!")
-text = "Sản phẩm này rất tốt, shop giao hàng nhanh!"
-# Tokenize
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
-# Predict
-with torch.no_grad():
-    outputs = model(**inputs)
-    predicted_class = outputs.logits.argmax(dim=-1).item()
-    probabilities = torch.softmax(outputs.logits, dim=-1)
-# Map to label
-label_map = {0: "Non-spam", 1: "Spam"}
-predicted_label = label_map[predicted_class]
-confidence = probabilities[0][predicted_class].item()
-print(f"Text: {text}")
-print(f"Predicted: {predicted_label} (confidence: {confidence:.2%})")
-```
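The snippet above scores one review. Several reviews can be tokenized together, for example with `tokenizer(texts, padding=True, truncation=True, max_length=256, return_tensors="pt")`. In that case, `outputs.logits` has one row per review, and the same softmax and argmax are applied row by row. The dependency-free sketch below mirrors that post-processing on plain Python lists. The helper name `logits_to_labels` and the example logits are hypothetical and are not part of the model's API.

```python
import math

LABEL_MAP = {0: "Non-spam", 1: "Spam"}  # same mapping as in the usage example


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_labels(logits_batch, label_map=LABEL_MAP):
    """Map a batch of logit rows to (label, confidence) pairs via argmax."""
    results = []
    for row in logits_batch:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((label_map[best], probs[best]))
    return results


# Hypothetical logits for two reviews, e.g. from outputs.logits.tolist()
batch = [[2.1, -1.3], [-0.4, 1.9]]
for label, conf in logits_to_labels(batch):
    print(f"{label} ({conf:.2%})")
```

Subtracting the row maximum before exponentiating leaves the probabilities unchanged but avoids overflow for large logits. `torch.softmax` already takes care of this internally.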
-## Citation
-If you use this model, please cite:
-```bibtex
-@misc{visolex_bartpho_spam_binary,
-  title={bartpho-spam-binary: Spam Review Detection for Vietnamese Text},
-  author={ViSoLex Team},
-  year={2025},
-  howpublished={\url{https://huggingface.co/visolex/bartpho-spam-binary}}
-}
-```
-## License
-This model is released under the Apache-2.0 license.
-## Acknowledgments
-* Base model: [vinai/bartpho-syllable](https://huggingface.co/vinai/bartpho-syllable)
-* Dataset: ViSpamReviews (Vietnamese Spam Review Dataset)
-* ViSoLex Toolkit for Vietnamese NLP