---
language: 
- en
tags:
- bert
- text-classification
- advertisements
license: apache-2.0
datasets:
- custom
---

## Kaleemullah/bert-base-uncased-ad-nonad-classifier

### Model Description
This model is a fine-tuned version of `bert-base-uncased` that classifies text as advertising (ad) or non-advertising (non-ad). It is intended to pick up the language patterns that distinguish promotional content from other types of text.

### Intended Use
- **Primary Use Case:** Text classification, specifically identifying whether a given piece of text is an advertisement or not.
- **Out-of-Scope Use Cases:** The model only makes the binary ad vs. non-ad decision. It should not be used for other natural language understanding tasks such as sentiment analysis or question answering.

### Training Data
The model was trained on a balanced dataset consisting of 40,000 examples, with 20,000 ads and 20,000 non-ads. Each text entry was preprocessed and tokenized using the BERT tokenizer.

### Training Procedure
- **Preprocessing:** Text entries were tokenized using `BertTokenizer` with a maximum length of 512 tokens.
- **Fine-Tuning:** The model was fine-tuned on the preprocessed data for 3 epochs using the Hugging Face `transformers` Trainer API.
- **Evaluation Metrics:** The model's performance was evaluated based on accuracy, precision, recall, and F1-score.
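
The procedure above could be sketched roughly as follows. This is a minimal illustration and not the actual training script: `FineTuneConfig`, `fine_tune`, the output directory, and the batch size are hypothetical, and only the base model, the 512-token limit, the 3 epochs, and the use of `Trainer` come from this card. The label convention (1 for ad, 0 for non-ad) is assumed to match the usage example in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical settings; only the first three values are stated in this card."""
    base_model: str = "bert-base-uncased"
    max_length: int = 512
    num_epochs: int = 3
    num_labels: int = 2                 # ad vs. non-ad
    batch_size: int = 16                # assumption, not stated in this card
    output_dir: str = "ad-nonad-out"    # assumption, not stated in this card


def fine_tune(train_texts, train_labels, cfg=FineTuneConfig()):
    """Fine-tune a BERT classifier on (text, label) pairs, with 1 = ad and 0 = non-ad."""
    # Heavy dependencies are imported here so the config can be inspected without them.
    import torch
    from transformers import (
        BertForSequenceClassification,
        BertTokenizer,
        Trainer,
        TrainingArguments,
    )

    tokenizer = BertTokenizer.from_pretrained(cfg.base_model)
    model = BertForSequenceClassification.from_pretrained(
        cfg.base_model, num_labels=cfg.num_labels
    )

    # Tokenize every text, truncating anything longer than max_length tokens.
    encodings = tokenizer(
        list(train_texts), truncation=True, padding=True, max_length=cfg.max_length
    )
    labels = list(train_labels)

    class AdDataset(torch.utils.data.Dataset):
        """Wraps the tokenized inputs and labels in the format Trainer expects."""

        def __len__(self):
            return len(labels)

        def __getitem__(self, idx):
            item = {key: torch.tensor(values[idx]) for key, values in encodings.items()}
            item["labels"] = torch.tensor(labels[idx])
            return item

    args = TrainingArguments(
        output_dir=cfg.output_dir,
        num_train_epochs=cfg.num_epochs,
        per_device_train_batch_size=cfg.batch_size,
    )
    trainer = Trainer(model=model, args=args, train_dataset=AdDataset())
    trainer.train()
    return trainer
```

Calling `fine_tune(texts, labels)` with two equal-length lists would run the three epochs and return the `Trainer`, from which the fine-tuned model can be saved.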

### Performance
The model achieved the following metrics on the test dataset:
- Accuracy: 99.71%
- Precision: 99.76%
- Recall: 99.67%
- F1-score: 99.72%
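
For reference, the four metrics above follow the standard binary-classification definitions. The sketch below shows how they could be computed from true and predicted labels. The helper `binary_metrics` is hypothetical and is not the evaluation code used for this model. It assumes label 1 means ad (the positive class) and label 0 means non-ad.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # ad correctly flagged as ad
        elif pred == positive:
            fp += 1  # non-ad wrongly flagged as ad
        elif truth == positive:
            fn += 1  # ad missed
        else:
            tn += 1  # non-ad correctly left alone

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Precision is the share of predicted ads that really are ads, and recall is the share of real ads that the model catches.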


Note: This model is due to be updated soon. It currently overfits on one non-ad category.


### How to Use
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

model_name = "Kaleemullah/bert-base-uncased-ad-nonad-classifier"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

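def predict(text):
    """Return "Ad" or "Non-Ad" for a single input string."""
    # Tokenize, truncating to the 512-token limit used during training
    inputs = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors="pt")
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick the highest-scoring class; label 1 is treated as "Ad"
    prediction = torch.argmax(outputs.logits, dim=-1).numpy()[0]
    return "Ad" if prediction == 1 else "Non-Ad"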

# Example: print the predicted label for one piece of text
print(predict("Your example text here"))