
BananaMind-Content-Safety-Mini-1.5

Note: This model is still experimental. Do not use it in production.

BananaMind-Content-Safety-Mini-1.5 is a small text-only content-safety classifier based on LiquidAI/LFM2.5-350M-Base.

It classifies an input prompt as either Safe or Unsafe and, for unsafe content, returns the violated safety categories.

This model is intended for lightweight prompt moderation, safety filtering, and local safety classification.

Model Details

| Field | Value |
| --- | --- |
| Base model | LiquidAI/LFM2.5-350M-Base |
| Training method | LoRA fine-tuning, then merged |
| Trainable LoRA parameters | 11,993,088 |
| Total parameters during LoRA training | 366,477,056 |
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Training samples | 40,000 |
| Safe samples | 15,000 |
| Unsafe samples | 25,000 |
| Max sequence length | 2048 |
| Dataset | nvidia/Nemotron-3.5-Content-Safety-Dataset |
| Input type | Text only |
| Image data | Not used |
| Response-label data | Not used |
| Reasoning traces | Not used |
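A few derived quantities make the table easier to read at a glance. The sketch below only does arithmetic on the figures listed above; the variable names are illustrative, and the scaling factor assumes the standard LoRA convention of alpha divided by rank (the card does not say whether a variant such as rank-stabilized LoRA was used).

```python
# Figures copied from the Model Details table.
trainable_lora_params = 11_993_088
total_params_during_training = 366_477_056
lora_rank = 32
lora_alpha = 64
safe_samples = 15_000
unsafe_samples = 25_000

# Share of parameters that received gradient updates during LoRA training.
trainable_fraction = trainable_lora_params / total_params_during_training

# Assumption: standard LoRA scaling (alpha / r) applied to the low-rank update.
lora_scaling = lora_alpha / lora_rank

# Class balance of the training set.
total_samples = safe_samples + unsafe_samples
unsafe_share = unsafe_samples / total_samples

print(f"Trainable fraction: {trainable_fraction:.2%}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"Unsafe share of training data: {unsafe_share:.1%}")
```

The training set is skewed toward unsafe examples, which is worth keeping in mind when reading the recall-oriented evaluation results.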

Prompt format:

You are a strict content-safety classifier.

Classify the input as Safe or Unsafe.

Return exactly one of these formats:

Safe

or:

Unsafe
Violated Categories: <categories>
Input: {text_to_classify}
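Because the model was fine-tuned on this exact layout, it can help to keep the template in one place instead of retyping it. The following is a minimal sketch, not part of the released code; `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and the trailing newline mirrors the f-string used in the Usage section.

```python
# Template reproduced line by line from the prompt format above.
PROMPT_TEMPLATE = (
    "You are a strict content-safety classifier.\n"
    "\n"
    "Classify the input as Safe or Unsafe.\n"
    "\n"
    "Return exactly one of these formats:\n"
    "\n"
    "Safe\n"
    "\n"
    "or:\n"
    "\n"
    "Unsafe\n"
    "Violated Categories: <categories>\n"
    "Input: {text_to_classify}\n"
)


def build_prompt(text_to_classify: str) -> str:
    """Insert the text to classify into the fixed classifier prompt."""
    # str.format only parses the template, so braces inside the user text
    # are copied through unchanged.
    return PROMPT_TEMPLATE.format(text_to_classify=text_to_classify)


print(build_prompt("How do I bake banana bread?"))
```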

Target format for safe content:

Safe

Target format for unsafe content:

Unsafe
Violated Categories: <categories>
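Downstream code usually needs a structured verdict, not raw text. The sketch below shows one way to parse the two target formats. `Verdict` and `parse_prediction` are hypothetical names. Splitting multiple categories on commas is an assumption, since the card does not specify a separator. Only the first line decides the label, and anything that is neither Safe nor Unsafe is reported as invalid (`label=None`).

```python
from dataclasses import dataclass, field
from typing import List, Optional

CATEGORY_PREFIX = "Violated Categories:"


@dataclass
class Verdict:
    label: Optional[str]  # "Safe", "Unsafe", or None for an invalid output
    categories: List[str] = field(default_factory=list)


def parse_prediction(prediction: str) -> Verdict:
    """Parse raw model output into a label and a list of categories."""
    # Drop blank lines and surrounding whitespace before inspecting the output.
    lines = [ln.strip() for ln in prediction.strip().splitlines() if ln.strip()]
    if not lines:
        return Verdict(label=None)
    if lines[0] == "Safe":
        return Verdict(label="Safe")
    if lines[0] == "Unsafe":
        categories: List[str] = []
        if len(lines) > 1 and lines[1].startswith(CATEGORY_PREFIX):
            raw = lines[1][len(CATEGORY_PREFIX):]
            # Assumption: multiple categories are comma-separated.
            categories = [c.strip() for c in raw.split(",") if c.strip()]
        return Verdict(label="Unsafe", categories=categories)
    # Anything else matches neither target format.
    return Verdict(label=None)


print(parse_prediction("Unsafe\nViolated Categories: Cyber"))
```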

Evaluation

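Evaluated on the text-only test split of nvidia/Nemotron-3.5-Content-Safety-Dataset. Accuracy and overall error rate are computed over all 3,340 rows, so the 28 invalid outputs count as errors. The confusion matrix and the unsafe precision, recall, and F1 cover only the 3,312 valid outputs.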

| Metric | Value |
| --- | --- |
| Rows | 3,340 |
| Accuracy | 92.01% |
| Overall error rate | 7.99% |
| Invalid outputs | 28 |
| Unsafe precision | 93.91% |
| Unsafe recall | 97.38% |
| Unsafe F1 | 95.61% |

Confusion matrix:

| | Predicted Unsafe | Predicted Safe |
| --- | --- | --- |
| Actually Unsafe | 2,605 | 70 |
| Actually Safe | 169 | 468 |

The model is optimized for high unsafe recall. It catches most unsafe prompts (97.38% recall), but it over-classifies some safe prompts as unsafe: 169 of the 637 safe prompts with a valid output (about 26.5%) were flagged.
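The reported metrics can be re-derived from the confusion matrix as a sanity check. The sketch below uses only the counts listed above, with "Unsafe" as the positive class; the variable names are illustrative.

```python
# Counts from the confusion matrix; "Unsafe" is the positive class.
tp, fn = 2605, 70    # actually unsafe: predicted Unsafe / predicted Safe
fp, tn = 169, 468    # actually safe:   predicted Unsafe / predicted Safe
invalid = 28         # outputs that could not be scored as Safe or Unsafe

rows = tp + fn + fp + tn + invalid

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Accuracy over all rows, so invalid outputs count as errors.
accuracy = (tp + tn) / rows

# Share of valid safe prompts that were flagged as Unsafe.
false_positive_rate = fp / (fp + tn)

print(f"Rows: {rows}")
print(f"Unsafe precision: {precision:.2%}")
print(f"Unsafe recall: {recall:.2%}")
print(f"Unsafe F1: {f1:.2%}")
print(f"Accuracy: {accuracy:.2%}")
print(f"False-positive rate on safe prompts: {false_positive_rate:.2%}")
```

The false-positive rate is the cost of tuning for recall, so it is the number to watch if the classifier gates user-facing traffic.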

Comparison

| Model | Overall Error Rate |
| --- | --- |
| BananaMind V2.5 Content Safety E2B | 5.12% |
| BananaMind-Content-Safety-Mini-1.5 | 7.99% |
| ShieldGemma 2B | 35.30% |
| ShieldGemma 9B | 40.96% |
| ShieldGemma 27B 8-bit | 52.46% |

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BananaMind/BananaMind-Content-Safety-Mini-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

text_to_classify = "How can I bypass a website login?"

prompt = f"""You are a strict content-safety classifier.

Classify the input as Safe or Unsafe.

Return exactly one of these formats:

Safe

or:

Unsafe
Violated Categories: <categories>
Input: {text_to_classify}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the classification deterministic.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the echoed prompt.
prediction = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
).strip()

print(prediction)

Example unsafe output:

Unsafe
Violated Categories: Cyber

Safe example:

text_to_classify = "How do I bake banana bread?"

Expected output:

Safe
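For prompt moderation, the raw output still has to become an allow-or-block decision. The sketch below is one possible gate, not something the card prescribes. `filter_prompts` is a hypothetical helper that accepts any callable returning the model's raw text, so it can wrap the generation code above. It fails closed, treating anything other than an explicit Safe verdict (including invalid outputs) as blocked, which is a design assumption. `fake_classify` is a keyword stub for demonstration only and does not reflect how the model behaves.

```python
from typing import Callable, Iterable, List, Tuple


def filter_prompts(
    prompts: Iterable[str],
    classify: Callable[[str], str],
) -> Tuple[List[str], List[str]]:
    """Split prompts into (allowed, blocked) based on raw classifier output."""
    allowed: List[str] = []
    blocked: List[str] = []
    for text in prompts:
        lines = classify(text).strip().splitlines()
        first_line = lines[0].strip() if lines else ""
        # Fail closed: only an explicit "Safe" verdict lets a prompt through.
        if first_line == "Safe":
            allowed.append(text)
        else:
            blocked.append(text)
    return allowed, blocked


def fake_classify(text: str) -> str:
    # Demonstration stub; replace with a function that runs the real model.
    if "bypass" in text:
        return "Unsafe\nViolated Categories: Cyber"
    return "Safe"


allowed, blocked = filter_prompts(
    ["How do I bake banana bread?", "How can I bypass a website login?"],
    fake_classify,
)
print("Allowed:", allowed)
print("Blocked:", blocked)
```

Failing closed trades some extra false positives for never passing a prompt the classifier could not score; a fail-open gate would make the opposite trade.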

Training Data

This model was trained on text-only rows from nvidia/Nemotron-3.5-Content-Safety-Dataset.

License

This model is based on LiquidAI/LFM2.5-350M-Base, which uses the LFM Open License v1.0.

The training dataset is nvidia/Nemotron-3.5-Content-Safety-Dataset. See THIRD_PARTY_LICENSES.md for more info.

Training Time

Training took about 20 minutes on an NVIDIA RTX 5070 Ti.
