Heaven1-base-1b: Guardian - Predatory Behavior Detection Model


Model Description

Heaven1-base-1b (codename: "Guardian") is a fine-tuned version of Meta's Llama-3.2-1B-Instruct model, optimized to detect, and thereby help prevent, harmful predatory patterns in conversations. It was created with Parameter-Efficient Fine-Tuning (PEFT) using QLoRA, which makes training feasible on consumer-grade hardware.

Model Details

  • Developed by: SafeCircleIA
  • Base model: Meta-Llama-3.2-1B-Instruct
  • Model type: Causal Language Model with LoRA adapters
  • Language: English
  • Training method: QLoRA fine-tuning (4-bit quantization)
  • License: MIT (subject to Llama 3.2 usage restrictions)

Uses

Direct Use

This model is designed for direct use in:

  • Detecting potentially harmful interactions in text messages
  • Classifying messages as predatory or safe with brief explanations
  • Assisting human moderators in identifying concerning patterns
  • Supporting research on digital safety

Out-of-Scope Use

This model should not be used for:

  • Making autonomous decisions about user safety without human review
  • Creating or refining predatory language patterns
  • Serving as the sole determinant in any safety-critical application
  • Deployment in any setting without appropriate privacy safeguards and consent

Bias, Risks, and Limitations

  • The model detects patterns based on its training data and may miss novel predatory tactics
  • Performance may vary across different cultural contexts and communication styles
  • False positives and false negatives are possible
  • Relies heavily on conversational patterns identified during training
  • Limited to English language text

Recommendations

  • Always combine with human review for best results
  • Consider cultural and contextual factors when interpreting results
  • Regularly evaluate the model's performance in your specific use case
  • Use low temperature settings (0.1-0.3) for more consistent classification results

How to Get Started with the Model

To run inference with this model:

```bash
python run_inference.py --use_4bit --model_path ./heaven1-base-1b --base_model meta-llama/Llama-3.2-1B-Instruct
```

Optional Parameters

  • --max_length (default: 512): Maximum sequence length
  • --temperature (default: 0.1): Controls randomness (lower = more deterministic classification)
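
If you prefer to load the adapter programmatically rather than through the bundled script, the sketch below shows a minimal equivalent. The adapter path `./heaven1-base-1b`, the example message, and the chat formatting are illustrative assumptions; only the base model id and the 4-bit / low-temperature settings come from this card.

```python
# Minimal sketch: load the LoRA adapter on the 4-bit base model and classify one message.
# Assumes the adapter was downloaded to ./heaven1-base-1b; adjust paths to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "./heaven1-base-1b"  # local adapter directory (assumption)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches --use_4bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Hypothetical prompt; the exact instruction format expected by the adapter may differ.
messages = [{"role": "user", "content": "Classify this message as predatory or safe and briefly explain: "
             "'You seem mature for your age, let's keep this chat our secret.'"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    temperature=0.1,        # low temperature for consistent classification
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```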

Training Details

Training Data

The model was fine-tuned on a custom dataset of 10,000 examples, roughly half of which contain predatory behavior patterns. This balanced split helps the model learn to identify concerning patterns while retaining normal conversational ability.

Training Hyperparameters

This model was trained with the following hyperparameters:

  • Learning rate: 2e-5
  • Epochs: 3
  • Batch size: 1
  • Gradient accumulation steps: 16
  • LoRA rank (r): 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • 4-bit quantization: Yes (NF4 format)
  • Max sequence length: 2048
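
For reference, these hyperparameters map onto a standard PEFT setup roughly as sketched below. The `target_modules` list and the output directory are assumptions, not stated in this card; the quantization settings are listed separately under "Training procedure" below.

```python
# Sketch of the LoRA and trainer configuration implied by the hyperparameters above.
# target_modules is an assumption; the card does not state which layers were adapted.
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)

training_args = TrainingArguments(
    output_dir="./heaven1-base-1b",     # assumed output path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,                          # matches the float16 compute dtype below
)
```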

Evaluation

Testing Data & Metrics

The model was evaluated on a held-out test set (10% of the dataset) with the following metrics:

  • Accuracy: Measures overall classification correctness
  • Precision: Measures how many identified predatory messages were actually predatory
  • Recall: Measures how many actual predatory messages were identified
  • F1 Score: Harmonic mean of precision and recall
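
As a concrete reference for how these four metrics are computed on binary labels (1 = predatory, 0 = safe), here is a toy sketch; the label values are invented and are not the model's actual test outputs.

```python
# Toy illustration of the evaluation metrics; labels are hypothetical.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # model predictions (hypothetical)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```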

Results

Evaluation metrics on test dataset:

| Metric    | Score |
|-----------|-------|
| Accuracy  | 93.8% |
| Precision | 92.4% |
| Recall    | 95.1% |
| F1        | 93.7% |

Environmental Impact

  • Hardware Type: Consumer GPU (NVIDIA RTX 2060, 6GB VRAM)
  • Hours used: Approximately 3 hours for training
  • Energy consumption: Minimal due to efficient QLoRA fine-tuning

Performance and Limitations

  • Hardware requirements: Can run on consumer GPUs with at least 6GB VRAM when used with 4-bit quantization
  • Sequence length: Optimized for sequences up to 2048 tokens (see the truncation sketch after this list)
  • Limitations:
    • As with any AI model, it may occasionally miss subtle predatory patterns
    • False positives are possible in ambiguous situations
    • Performance depends on input context quality
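
Conversations longer than the 2048-token window should be truncated (or split) before inference. A minimal sketch, assuming the base model's tokenizer and a placeholder input string:

```python
# Sketch: truncate long conversations to the 2048-token window before inference.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

long_conversation_text = "..."   # placeholder for the conversation to screen

encoded = tokenizer(
    long_conversation_text,
    truncation=True,
    max_length=2048,             # matches the model's training sequence length
    return_tensors="pt",
)
```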

Ethical Considerations

This model is designed to help identify and prevent potentially harmful predatory patterns in conversations. However, it should not be used as the sole determinant for making important decisions. Human oversight is essential when deploying this model in real-world applications.

  • Respect privacy and obtain appropriate consent when analyzing communications
  • Be transparent about the use of AI detection systems
  • Consider the impact of false positives on legitimate communications

Contact

For questions or concerns about this model, please contact SafeCircleIA or open an issue in the project repository.

Citation

@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base-1b: Guardian - Predatory Behavior Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleai/heaven1-base}}
}

Training procedure

The following bitsandbytes quantization config was used during training:

  • quant_method: QuantizationMethod.BITS_AND_BYTES
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
  • bnb_4bit_quant_storage: uint8
  • load_in_4bit: True
  • load_in_8bit: False
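
In code, that configuration corresponds to a transformers `BitsAndBytesConfig` along these lines (a reconstruction from the fields above, not the original training script; the `llm_int8_*` fields are left at their defaults):

```python
# BitsAndBytesConfig reconstructed from the fields listed above.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_storage="uint8",
)
```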

Framework versions

  • PEFT 0.6.0