
Kaleemullah/bert-base-uncased-ad-nonad-classifier

Model Description

This model is a fine-tuned version of bert-base-uncased, specifically tailored for distinguishing between advertising (ad) and non-advertising (non-ad) text content. It is designed to understand the nuances and language patterns that differentiate promotional content from other types of text.

Intended Use

  • Primary Use Case: Text classification, specifically identifying whether a given piece of text is an advertisement or not.
  • Out-of-Scope Use Cases: This model is not intended for understanding context beyond the binary classification of ads vs. non-ads. It should not be used for broader natural language understanding tasks such as sentiment analysis or question answering.

Training Data

The model was trained on a balanced dataset consisting of 40,000 examples, with 20,000 ads and 20,000 non-ads. Each text entry was preprocessed and tokenized using the BERT tokenizer.

Training Procedure

  • Preprocessing: Text entries were tokenized using BertTokenizer with a maximum length of 512 tokens.
  • Fine-Tuning: The model was fine-tuned on the preprocessed data for 3 epochs using the Hugging Face transformers Trainer API (a sketch of this setup is shown after this list).
  • Evaluation Metrics: The model's performance was evaluated based on accuracy, precision, recall, and F1-score.
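
For illustration, here is a minimal fine-tuning sketch that follows the steps above. The dataset files, batch size, and output directory are placeholders, not the exact values used to train this model.

from datasets import load_dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical CSV files with "text" and "label" columns (0 = non-ad, 1 = ad).
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    # Preprocessing as described above: BERT tokenizer, maximum length of 512 tokens.
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="./ad-nonad-bert",    # placeholder output directory
    num_train_epochs=3,              # 3 epochs, as described above
    per_device_train_batch_size=16,  # illustrative batch size
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
)
trainer.train()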

Performance

The model achieved the following metrics on the test dataset:

  • Accuracy: 99.71%
  • Precision: 99.76%
  • Recall: 99.67%
  • F1-score: 99.72%

Note: this model currently overfits on one non-ad category and will be updated soon.

How to Use

from transformers import BertTokenizer, BertForSequenceClassification
import torch

model_name = "Kaleemullah/bert-base-uncased-ad-nonad-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

def predict(text):
    # Tokenize with the same settings used during training: truncation at 512 tokens.
    inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Index 1 corresponds to "Ad", index 0 to "Non-Ad".
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Ad" if prediction == 1 else "Non-Ad"

# Example
print(predict("Your example text here"))
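
The model can also be loaded through the transformers pipeline helper. A minimal sketch follows; note that the returned label names depend on the id2label mapping stored in the model config and may appear as LABEL_0 / LABEL_1 rather than human-readable names.

from transformers import pipeline

# Text-classification pipeline for the same checkpoint.
classifier = pipeline("text-classification", model="Kaleemullah/bert-base-uncased-ad-nonad-classifier")
print(classifier("Buy one, get one free for a limited time!"))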