BananaMind-Content-Safety-Mini-1.5

Note: This model is still experimental. Do not use it in production.

BananaMind-Content-Safety-Mini-1.5 is a small text-only content-safety classifier based on LiquidAI/LFM2.5-350M-Base.

It classifies an input prompt as either Safe or Unsafe and, for unsafe content, returns the violated safety categories.

This model is intended for lightweight prompt moderation, safety filtering, and local safety classification.

Model Details

Field	Value
Base model	`LiquidAI/LFM2.5-350M-Base`
Training method	LoRA fine-tuning, then merged
Trainable LoRA parameters	11,993,088
Total parameters during LoRA training	366,477,056
LoRA rank	32
LoRA alpha	64
Training samples	40,000
Safe samples	15,000
Unsafe samples	25,000
Max sequence length	2048
Dataset	`nvidia/Nemotron-3.5-Content-Safety-Dataset`
Input type	Text only
Image data	Not used
Response-label data	Not used
Reasoning traces	Not used
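
The figures in the table imply a few derived quantities. The sketch below computes them in Python. The constant names are illustrative, and the scaling factor assumes the standard LoRA convention of alpha divided by rank, which this card does not state explicitly.

```python
# Values copied from the Model Details table.
TRAINABLE_LORA_PARAMS = 11_993_088
TOTAL_PARAMS_DURING_TRAINING = 366_477_056
LORA_RANK = 32
LORA_ALPHA = 64
SAFE_SAMPLES = 15_000
UNSAFE_SAMPLES = 25_000

# Share of parameters that were updated during LoRA training.
trainable_fraction = TRAINABLE_LORA_PARAMS / TOTAL_PARAMS_DURING_TRAINING

# Assumption: standard LoRA scaling (alpha / rank); rank-stabilized variants differ.
lora_scaling = LORA_ALPHA / LORA_RANK

# Class balance of the training set.
total_samples = SAFE_SAMPLES + UNSAFE_SAMPLES
unsafe_share = UNSAFE_SAMPLES / total_samples

print(f"Trainable fraction: {trainable_fraction:.2%}")
print(f"LoRA scaling (alpha / rank): {lora_scaling}")
print(f"Unsafe share of training data: {unsafe_share:.1%}")
```

Under these assumptions, roughly 3.3% of the parameters were trainable, and the training set is 62.5% unsafe, which is worth keeping in mind when reading the recall-oriented evaluation results.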

Prompt format:

You are a strict content-safety classifier.

Classify the input as Safe or Unsafe.

Return exactly one of these formats:

Safe

or:

Unsafe
Violated Categories: <categories>
Input: {text_to_classify}

Target format for safe content:

Safe

Target format for unsafe content:

Unsafe
Violated Categories: <categories>
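
Downstream code usually needs the label and categories as structured values rather than raw text. The following is a minimal parsing sketch for the two target formats. `SafetyResult` and `parse_prediction` are hypothetical names introduced here, and the comma-separated category list is an assumption, since this card does not specify the separator.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SafetyResult:
    # "Safe", "Unsafe", or None when the output matches neither target format.
    label: Optional[str]
    categories: List[str] = field(default_factory=list)


def parse_prediction(prediction: str) -> SafetyResult:
    """Parse raw model output into a label and a list of categories."""
    lines = [line.strip() for line in prediction.strip().splitlines() if line.strip()]
    if not lines:
        return SafetyResult(label=None)

    # Only the first non-empty line decides the label; trailing text is ignored.
    if lines[0] == "Safe":
        return SafetyResult(label="Safe")

    if lines[0] == "Unsafe":
        prefix = "Violated Categories:"
        categories: List[str] = []
        if len(lines) > 1 and lines[1].startswith(prefix):
            raw = lines[1][len(prefix):]
            # Assumption: multiple categories are comma-separated.
            categories = [c.strip() for c in raw.split(",") if c.strip()]
        return SafetyResult(label="Unsafe", categories=categories)

    # Anything else is treated as an invalid output.
    return SafetyResult(label=None)


print(parse_prediction("Unsafe\nViolated Categories: Cyber"))
```

A `None` label marks an invalid output. Whether to treat that case as unsafe (fail closed) or to retry is a policy decision left to the caller.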

Evaluation

The model was evaluated on the text-only test split of nvidia/Nemotron-3.5-Content-Safety-Dataset.

Metric	Value
Rows	3,340
Accuracy	92.01%
Overall error rate	7.99%
Invalid outputs	28
Unsafe precision	93.91%
Unsafe recall	97.38%
Unsafe F1	95.61%

Confusion matrix:

	Predicted Unsafe	Predicted Safe
Actually Unsafe	2,605	70
Actually Safe	169	468

The model is optimized for high unsafe recall. It catches 97.38% of unsafe prompts, but it over-classifies safe prompts: 169 of the 637 safe prompts in the confusion matrix (about 26.5%) are flagged as unsafe.
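
The reported metrics can be recomputed from the confusion matrix. The sketch below does so in plain Python. The variable names are illustrative, and treating invalid outputs as errors in the accuracy denominator is an assumption. It is the reading that reproduces the reported 92.01%.

```python
# Counts copied from the confusion matrix and the metrics table.
TP = 2605  # actually unsafe, predicted unsafe
FN = 70    # actually unsafe, predicted safe
FP = 169   # actually safe, predicted unsafe
TN = 468   # actually safe, predicted safe
INVALID = 28  # outputs matching neither target format

total_rows = TP + FN + FP + TN + INVALID

# Assumption: invalid outputs count as errors for overall accuracy.
accuracy = (TP + TN) / total_rows

# Precision, recall, and F1 for the Unsafe class, over valid outputs only.
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

# Share of safe prompts that were flagged as unsafe.
false_positive_rate = FP / (FP + TN)

print(f"Rows: {total_rows}")
print(f"Accuracy: {accuracy:.2%}")
print(f"Unsafe precision: {precision:.2%}")
print(f"Unsafe recall: {recall:.2%}")
print(f"Unsafe F1: {f1:.2%}")
print(f"False positive rate on safe prompts: {false_positive_rate:.2%}")
```

The false positive rate is not listed in the metrics table, but it follows directly from the second row of the confusion matrix and quantifies the over-classification of safe prompts.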

Comparison

Model	Overall Error Rate
BananaMind V2.5 Content Safety E2B	5.12%
BananaMind-Content-Safety-Mini-1.5	7.99%
ShieldGemma 2B	35.30%
ShieldGemma 9B	40.96%
ShieldGemma 27B 8-bit	52.46%

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BananaMind/BananaMind-Content-Safety-Mini-1.5"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

text_to_classify = "How can I bypass a website login?"

prompt = f"""You are a strict content-safety classifier.

Classify the input as Safe or Unsafe.

Return exactly one of these formats:

Safe

or:

Unsafe
Violated Categories: <categories>
Input: {text_to_classify}
"""

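# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the classification deterministic.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the echoed prompt.
prediction = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
).strip()

print(prediction)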

Example unsafe output:

Unsafe
Violated Categories: Cyber

Safe example:

text_to_classify = "How do I bake banana bread?"

Expected output:

Safe

Training Data

This model was trained on text-only rows from nvidia/Nemotron-3.5-Content-Safety-Dataset.

License

This model is based on LiquidAI/LFM2.5-350M-Base, which uses the LFM Open License v1.0.

The training dataset is nvidia/Nemotron-3.5-Content-Safety-Dataset. See THIRD_PARTY_LICENSES.md for more info.

Training Time

We trained this model on an NVIDIA RTX 5070 Ti in about 20 minutes.
