NLP Course Project: Individual Assignment
Model: Fine-tuned roberta-base | Task: 6-class Multi-label Classification
Dataset: google/jigsaw_toxicity_pred (~150K samples)
| Resource | URL |
|---|---|
| Interactive UI (HuggingFace Spaces) | https://huggingface.co/spaces/Pommu/threat-detection-jigsaw |
| Fine-tuned Model | https://huggingface.co/Pommu/threat-detection-jigsaw |
| Local API | http://localhost:8000/predict (POST) |
Modern organizations face increasing risks from implicit threats and workplace harassment. Unlike simple slur-detectors, this project focuses on intent-based detection. It uses the Jigsaw Toxic Comment dataset to identify professional and personal risk across 6 binary categories:
| Label | Description | Risk Level |
|---|---|---|
| Threatening | Violent intent, intimidation, physical danger | HIGH |
| Hate Speech | Identity-based attacks, protected groups | HIGH |
| Highly Severe | Extreme toxicity, highly disruptive | HIGH |
| Toxic | Rude, disrespectful, unprofessional | MEDIUM |
| Insult | Personal attacks, non-violent harassment | MEDIUM |
| Profanity | Obscene language, compliance violation | LOW |
[User Input: Text / Chat Transcript]
        │
        ▼
[Gradio UI] ── HTTP POST /predict ──▶ [FastAPI Backend]
                                            │
                               ┌────────────┴────────────┐
                               │                         │
                           [Layer 1]                 [Layer 2]
                      Fine-tuned RoBERTa          Lexical Booster
                       (6 Jigsaw labels)      (keyword pattern match)
                               │                         │
                               └────────────┬────────────┘
                                            │
                             {label, confidence, risk_level}
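The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the keyword patterns, the `BOOST` value, the rule of adding the boost to the Threatening score, and the function names are all assumptions, and the model scores are made-up stand-ins for RoBERTa outputs. Only the label-to-risk mapping comes from the table above.

```python
import re

# Display label -> risk level, taken from the label table in this README.
RISK_LEVELS = {
    "Threatening": "HIGH",
    "Hate Speech": "HIGH",
    "Highly Severe": "HIGH",
    "Toxic": "MEDIUM",
    "Insult": "MEDIUM",
    "Profanity": "LOW",
}

# Hypothetical patterns and boost size; the real booster's rules are not shown here.
THREAT_PATTERNS = [
    re.compile(r"\bi will find you\b", re.IGNORECASE),
    re.compile(r"\byou will regret\b", re.IGNORECASE),
]
BOOST = 0.30


def lexical_boost(text: str, scores: dict) -> dict:
    """Layer 2: raise the Threatening score when any keyword pattern matches."""
    boosted = dict(scores)
    if any(p.search(text) for p in THREAT_PATTERNS):
        boosted["Threatening"] = min(1.0, boosted.get("Threatening", 0.0) + BOOST)
    return boosted


def predict(text: str, model_scores: dict) -> dict:
    """Combine Layer 1 scores with Layer 2 and report the top label."""
    scores = lexical_boost(text, model_scores)
    label = max(scores, key=scores.get)
    return {
        "label": label,
        "confidence": round(scores[label], 4),
        "risk_level": RISK_LEVELS[label],
    }


# Stand-in for Layer 1 (RoBERTa) per-label probabilities; the numbers are invented.
fake_scores = {"Toxic": 0.40, "Threatening": 0.35, "Insult": 0.10}
result = predict("I will find you and you will regret this.", fake_scores)
print(result)
```

In this sketch the booster can only raise a score, never lower one, which matches the recall-first stance of the project: a keyword hit can promote a borderline threat but cannot hide one.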
The rare threat class (0.3% of data) is up-weighted during training to maximize recall.

Design Choice: We prioritize recall over precision for threats. A false positive (flagging safe text) is far less costly than a false negative (missing a real threat).
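To make the recall-first trade-off concrete, here is a small sketch of a positively weighted binary cross-entropy in plain Python. It is an illustration under assumptions: the project's WeightedTrainer is not shown in this README, so the weighting scheme (negatives divided by positives), the helper names, and the sample counts (450 of 150,000, chosen to match the quoted 0.3%) are hypothetical. The same idea is exposed in PyTorch as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
import math


def pos_weight(num_positive: int, num_total: int) -> float:
    """Ratio of negatives to positives, a common weight for a rare class."""
    return (num_total - num_positive) / num_positive


def weighted_bce(prob: float, target: int, weight: float) -> float:
    """Binary cross-entropy with the positive-class term scaled by `weight`."""
    eps = 1e-12  # guard against log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(weight * target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


# Illustrative counts only: 0.3% positives in roughly 150K samples.
w = pos_weight(num_positive=450, num_total=150_000)

# Same-sized mistakes in opposite directions.
missed_threat = weighted_bce(prob=0.1, target=1, weight=w)  # false negative
false_alarm = weighted_bce(prob=0.9, target=0, weight=w)    # false positive

print(f"pos_weight={w:.1f} missed_threat={missed_threat:.1f} false_alarm={false_alarm:.2f}")
```

With this weighting, a missed threat is penalized hundreds of times more heavily than an equally confident false alarm, which pushes the model toward flagging borderline cases.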
# Test implicit threat detection locally
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I will find you and you will regret every decision you have made."}'
JSON Response:
{
  "prediction": "Threatening",
  "confidence": 0.85,
  "is_threat": true,
  "risk_level": "HIGH",
  "all_scores": [
    {"label": "Toxic", "confidence": 0.0009},
    {"label": "Threatening", "confidence": 0.85},
    ...
  ]
}
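The same request can be made from Python with only the standard library. This is a sketch that assumes the backend is running locally and returns the fields shown in the response above; the helper names `build_request`, `classify`, and `summarize` are hypothetical and not part of the project.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # Local API endpoint from the resource table


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same JSON POST request as the curl command."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str) -> dict:
    """Send the text to a running backend and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=10) as resp:
        return json.load(resp)


def summarize(response: dict) -> str:
    """Format a response dict as a one-line summary."""
    flag = "THREAT" if response["is_threat"] else "ok"
    return (
        f'{response["prediction"]} '
        f'({response["confidence"]:.2f}, {response["risk_level"]}) [{flag}]'
    )
```

With the FastAPI backend started, `print(summarize(classify("some text")))` prints the top label, its confidence, and the risk level on one line.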
NLP Course Project/
├── notebook/
│   └── train_3.ipynb              ← Colab training notebook (WeightedTrainer)
├── app/
│   ├── main.py                    ← FastAPI backend + Lexical Booster
│   ├── model_loader.py            ← Singleton model loader (thread-safe)
│   ├── schemas.py                 ← Pydantic request/response models
│   └── demo.py                    ← Gradio UI (standalone OR API-backed)
├── spaces/
│   ├── app.py                     ← HuggingFace Spaces entry point
│   └── requirements.txt
├── requirements.txt               ← API/demo dependencies
├── requirements_training.txt      ← Training dependencies (Colab)
├── setup_local.bat                ← Windows one-click environment setup
└── README.md
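The tree describes model_loader.py as a thread-safe singleton loader. One common way to build such a loader is double-checked locking, sketched below. The contents of model_loader.py are not shown in this README, so the names `get_model` and `_load_model`, the placeholder model object, and the `load_count` counter are assumptions made for illustration.

```python
import threading

_model = None
_lock = threading.Lock()
load_count = 0  # exists only to show that the loader runs once


def _load_model():
    """Placeholder for the expensive load (for example, fetching weights from the Hub)."""
    global load_count
    load_count += 1
    return {"name": "Pommu/threat-detection-jigsaw"}


def get_model():
    """Return the shared model, loading it on first use."""
    global _model
    if _model is None:          # fast path: no locking once the model is loaded
        with _lock:
            if _model is None:  # re-check: another thread may have loaded it meanwhile
                _model = _load_model()
    return _model


# Simulate several concurrent requests asking for the model at once.
threads = [threading.Thread(target=get_model) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(load_count)
```

The point of the pattern is that concurrent API requests share one copy of the weights instead of each triggering its own load.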
# Option A: one-click setup
setup_local.bat
# Option B: manual
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
Set your .env file:
HF_TOKEN=your_token_here
MODEL_NAME=Pommu/threat-detection-jigsaw
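For reference, the two variables above could be read at startup roughly as follows. This is a standard-library sketch; how the project actually loads its .env file is not shown here (it may well use a package such as python-dotenv), and the function names `load_env_file` and `get_settings` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Variables that are already set in the environment are left untouched.
    """
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip())


def get_settings() -> dict:
    """Return the settings the app needs, with a default model name."""
    load_env_file()
    return {
        "model_name": os.environ.get("MODEL_NAME", "Pommu/threat-detection-jigsaw"),
        "hf_token": os.environ.get("HF_TOKEN"),  # None when unset
    }
```

Keeping the token in .env (and out of version control) avoids hard-coding credentials in main.py or demo.py.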
# Terminal 1: start the FastAPI backend
uvicorn app.main:app --reload
# Terminal 2: launch the Gradio demo
python app/demo.py
Open http://localhost:7860 to see the UI.
- Open notebook/train_3.ipynb in Google Colab
- Add HF_Token to Colab Secrets (key icon)
- Set MODEL_NAME = Pommu/threat-detection-jigsaw
- Upload spaces/ contents to the Space repo

| Component | Technology |
|---|---|
| Model | roberta-base (HuggingFace Transformers) |
| Training | HuggingFace Trainer API + WeightedTrainer (custom) |
| Dataset | google/jigsaw_toxicity_pred (150K samples, 6 labels) |
| Backend API | FastAPI + Uvicorn |
| UI | Gradio |
| Deployment | HuggingFace Spaces |
| Evaluation | scikit-learn (Precision/Recall/F1/Confusion Matrix) |