Model Card for AI Content Classification

Model Description

This model classifies text into one of three categories:

- Human-Written
- AI-Generated
- Paraphrased

It uses the vai0511/ai-content-classifier model, an ELECTRA-based sequence classifier fine-tuned on 46,181 text samples (see Training Details).

Uses

Direct Use

- Detecting AI-generated content
- Identifying paraphrased text
- Assisting in content moderation

Out-of-Scope Use

- Not suitable for legal or forensic content verification.
- Should not be used as the sole basis for plagiarism detection.

Limitations & Biases

- Potential bias: The model was trained on a limited dataset and may not generalize well across all writing styles and languages.
- False positives and negatives: AI-generated or paraphrased text may be misclassified, and human-written text may be wrongly flagged.
- Adversarial attacks: Text with subtle modifications may bypass detection.

Recommendation: Use this model as an assistive tool rather than a definitive classifier. Always verify results manually.
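
One way to act on this recommendation is to look at the class probabilities rather than only the top label, and to send low-confidence predictions to a human reviewer. The sketch below assumes you already have the three raw logits for a text (for example from `outputs.logits[0].tolist()` in the usage code). The `triage` helper and the 0.80 threshold are illustrative assumptions, not part of the model and not tuned values.

```python
import math

# Label order as listed in this card's usage example.
LABELS = ["Human-Written", "AI-Generated", "Paraphrased"]

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triage(logits, threshold=0.80):
    """Return (label, confidence, needs_review).

    `threshold` is a hypothetical cutoff; choose it from your own validation data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < threshold

# Made-up logits: one clear prediction and one borderline prediction.
print(triage([0.2, 3.1, 0.4]))
print(triage([1.0, 1.2, 0.9]))
```

A prediction whose `needs_review` flag is true should be checked by hand before anything is done with it. A high softmax score is not a guarantee of correctness, so spot-checking confident predictions is still worthwhile.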

How to Use

Install dependencies:

```bash
pip install transformers torch
```

Load the model and classify text:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "vai0511/ai-content-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Class indices as listed in this card.
labels = {0: "Human-Written", 1: "AI-Generated", 2: "Paraphrased"}

def classify_text(text):
    # Truncate to the 512-token limit used in training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()
    return labels[predicted_class]

print(classify_text("This is an example text."))
```

Training Details

- Base Model: ELECTRA
- Dataset: 46,181 text samples
- Batch Size: 8 to 16
- Epochs: 3
- Learning Rate: 2e-5 to 3e-5
- Optimizer: AdamW
- Max Token Length: 512

Preprocessing:

- Removed duplicates, special characters, and excessive whitespace.
- Tokenization performed using Hugging Face's AutoTokenizer.

License & Attribution

This model is built upon vai0511/ai-content-classifier, which is licensed under Apache 2.0.

- Original Model: vai0511/ai-content-classifier
- License Details: Apache 2.0 License
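
The preprocessing steps listed under Training Details (removing duplicates, special characters, and excessive whitespace) could look roughly like the sketch below. The exact rules used for training are not documented in this card, so the `clean_text` and `preprocess` helpers, and the choice of which characters count as "special", are assumptions for illustration only.

```python
import re

def clean_text(text):
    """Strip special characters and collapse whitespace (assumed rules)."""
    # Assumption: keep letters, digits, whitespace, and basic punctuation.
    text = re.sub(r"[^\w\s.,;:!?'\"()-]", "", text)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()

def preprocess(samples):
    """Clean each sample, then drop empty strings and exact duplicates (first occurrence wins)."""
    seen = set()
    result = []
    for sample in samples:
        cleaned = clean_text(sample)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result

# The first two samples differ only in spacing, so they collapse to one entry.
raw = ["Hello,   world!", "Hello, world!", "Prices ## rose   sharply."]
print(preprocess(raw))
```

Deduplicating after cleaning, rather than before, also catches samples that differ only in spacing or stray symbols.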

Disclaimer

This model is intended for research and educational purposes. It may not always produce accurate results, and users should manually verify its classifications before making critical decisions.