Heaven1-base-1b: Guardian - Predatory Behavior Detection Model


Model Description

Heaven1-base-1b (codename: "Guardian") is a fine-tuned version of Meta's Llama-3.2-1B-Instruct model, optimized to detect, and thereby help prevent, harmful predatory patterns in conversations. It was created with Parameter-Efficient Fine-Tuning (PEFT) using QLoRA, which makes training feasible on consumer-grade hardware.

Model Details

  • Developed by: SafeCircleIA
  • Base model: Meta-Llama-3.2-1B-Instruct
  • Model type: Causal Language Model with LoRA adapters
  • Language: English
  • Training method: QLoRA fine-tuning (4-bit quantization)
  • License: MIT (subject to Llama 3.2 usage restrictions)

Uses

Direct Use

This model is designed for direct use in:

  • Detecting potentially harmful interactions in text messages
  • Classifying messages as predatory or safe with brief explanations
  • Assisting human moderators in identifying concerning patterns
  • Supporting research on digital safety

Out-of-Scope Use

This model should not be used for:

  • Making autonomous decisions about user safety without human review
  • Creating or refining predatory language patterns
  • Serving as the sole determinant in any safety-critical application
  • Deployment in any setting without appropriate privacy safeguards and consent

Bias, Risks, and Limitations

  • The model detects patterns based on its training data and may miss novel predatory tactics
  • Performance may vary across different cultural contexts and communication styles
  • False positives and false negatives are possible
  • Relies heavily on conversational patterns identified during training
  • Limited to English language text

Recommendations

  • Always combine with human review for best results
  • Consider cultural and contextual factors when interpreting results
  • Regularly evaluate the model's performance in your specific use case
  • Use low temperature settings (0.1-0.3) for more consistent classification results

How to Get Started with the Model

To run inference with this model:

```bash
python run_inference.py --use_4bit --model_path ./heaven1-base-1b --base_model meta-llama/Llama-3.2-1B-Instruct
```

Optional Parameters

  • --max_length (default: 512): Maximum sequence length
  • --temperature (default: 0.1): Controls randomness (lower = more deterministic classification)
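
If you prefer to load the adapter programmatically rather than through the bundled script, the sketch below shows a minimal equivalent. The adapter path `./heaven1-base-1b`, the example message, and the chat formatting are illustrative assumptions; only the base model id and the 4-bit / low-temperature settings come from this card.

```python
# Minimal sketch: load the LoRA adapter on the 4-bit base model and classify one message.
# Assumes the adapter was downloaded to ./heaven1-base-1b; adjust paths to your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-1B-Instruct"
ADAPTER = "./heaven1-base-1b"  # local adapter directory (assumption)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches --use_4bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Hypothetical prompt; the exact instruction format expected by the adapter may differ.
messages = [{"role": "user", "content": "Classify this message as predatory or safe and briefly explain: "
             "'You seem mature for your age, let's keep this chat our secret.'"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=128,
    temperature=0.1,        # low temperature for consistent classification
    do_sample=True,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```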

Training Details

Training Data

The model was fine-tuned on a custom dataset of 10,000 examples, roughly half of which contain predatory behavior patterns. This balanced split helps the model learn to identify concerning patterns while retaining normal conversational ability.

Training Hyperparameters

This model was trained with the following hyperparameters:

  • Learning rate: 2e-5
  • Epochs: 3
  • Batch size: 1
  • Gradient accumulation steps: 16
  • LoRA rank (r): 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • 4-bit quantization: Yes (NF4 format)
  • Max sequence length: 2048
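
For reference, these hyperparameters map onto a standard PEFT setup roughly as sketched below. The `target_modules` list and the output directory are assumptions, not stated in this card; the quantization settings are listed separately under "Training procedure" below.

```python
# Sketch of the LoRA and trainer configuration implied by the hyperparameters above.
# target_modules is an assumption; the card does not state which layers were adapted.
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)

training_args = TrainingArguments(
    output_dir="./heaven1-base-1b",     # assumed output path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    fp16=True,                          # matches the float16 compute dtype below
)
```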

Evaluation

Testing Data & Metrics

The model was evaluated on a held-out test set (10% of the dataset) with the following metrics:

  • Accuracy: Measures overall classification correctness
  • Precision: Measures how many identified predatory messages were actually predatory
  • Recall: Measures how many actual predatory messages were identified
  • F1 Score: Harmonic mean of precision and recall
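
As a concrete reference for how these four metrics are computed on binary labels (1 = predatory, 0 = safe), here is a toy sketch; the label values are invented and are not the model's actual test outputs.

```python
# Toy illustration of the evaluation metrics; labels are hypothetical.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (hypothetical)
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # model predictions (hypothetical)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
```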

Results

Evaluation metrics on test dataset:

| Metric    | Score |
|-----------|-------|
| Accuracy  | 93.8% |
| Precision | 92.4% |
| Recall    | 95.1% |
| F1        | 93.7% |

Environmental Impact

  • Hardware Type: Consumer GPU (NVIDIA RTX 2060, 6GB VRAM)
  • Hours used: Approximately 3 hours for training
  • Energy consumption: Minimal due to efficient QLoRA fine-tuning

Performance and Limitations

  • Hardware requirements: Can run on consumer GPUs with at least 6GB VRAM when used with 4-bit quantization
  • Sequence length: Optimized for sequences up to 2048 tokens (see the truncation sketch after this list)
  • Limitations:
    • As with any AI model, it may occasionally miss subtle predatory patterns
    • False positives are possible in ambiguous situations
    • Performance depends on input context quality
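
Conversations longer than the 2048-token window should be truncated (or split) before inference. A minimal sketch, assuming the base model's tokenizer and a placeholder input string:

```python
# Sketch: truncate long conversations to the 2048-token window before inference.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

long_conversation_text = "..."   # placeholder for the conversation to screen

encoded = tokenizer(
    long_conversation_text,
    truncation=True,
    max_length=2048,             # matches the model's training sequence length
    return_tensors="pt",
)
```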

Ethical Considerations

This model is designed to help identify and prevent potentially harmful predatory patterns in conversations. However, it should not be used as the sole determinant for making important decisions. Human oversight is essential when deploying this model in real-world applications.

  • Respect privacy and obtain appropriate consent when analyzing communications
  • Be transparent about the use of AI detection systems
  • Consider the impact of false positives on legitimate communications

Contact

For questions or concerns about this model, please contact SafeCircleIA or open an issue in the project repository.

Citation

@misc{heaven1-base-2025,
  author = {SafeCircleIA},
  title = {Heaven1-base-1b: Guardian - Predatory Behavior Detection Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/safecircleai/heaven1-base}}
}

Training procedure

The following bitsandbytes quantization config was used during training:

  • quant_method: QuantizationMethod.BITS_AND_BYTES
  • _load_in_8bit: False
  • _load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
  • bnb_4bit_quant_storage: uint8
  • load_in_4bit: True
  • load_in_8bit: False
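
In code, that configuration corresponds to a transformers `BitsAndBytesConfig` along these lines (a reconstruction from the fields above, not the original training script; the `llm_int8_*` fields are left at their defaults):

```python
# BitsAndBytesConfig reconstructed from the fields listed above.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_storage="uint8",
)
```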

Framework versions

  • PEFT 0.6.0