Cobalt-Safety-Guard

1. Introduction

Cobalt-Safety-Guard is a moderation-aware model post-trained with a heavy emphasis on safety and refusal calibration.

2. Evaluation Results

Comprehensive Benchmark Results

| Category | Benchmark | GuardNet | Cobalt-v1 | SafeLLM | Cobalt-Safety-Guard |
|---|---|---|---|---|---|
| Core Reasoning Tasks | Math Reasoning | 0.566 | 0.603 | 0.599 | 0.618 |
| | Logical Reasoning | 0.828 | 0.822 | 0.824 | 0.849 |
| | Common Sense | 0.756 | 0.717 | 0.754 | 0.776 |
| Language Understanding | Reading Comprehension | 0.718 | 0.706 | 0.729 | 0.750 |
| | Question Answering | 0.590 | 0.602 | 0.628 | 0.641 |
| | Text Classification | 0.796 | 0.803 | 0.805 | 0.848 |
| | Sentiment Analysis | 0.794 | 0.775 | 0.778 | 0.814 |
| Generation Tasks | Code Generation | 0.705 | 0.696 | 0.710 | 0.717 |
| | Creative Writing | 0.651 | 0.653 | 0.638 | 0.684 |
| | Dialogue Generation | 0.641 | 0.684 | 0.651 | 0.692 |
| | Summarization | 0.758 | 0.751 | 0.789 | 0.799 |
| Specialized Capabilities | Translation | 0.809 | 0.779 | 0.762 | 0.822 |
| | Knowledge Retrieval | 0.660 | 0.664 | 0.667 | 0.711 |
| | Instruction Following | 0.785 | 0.772 | 0.739 | 0.791 |
| | Safety Evaluation | 0.714 | 0.743 | 0.749 | 0.772 |

Overall Performance Summary

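Cobalt-Safety-Guard scores highest of the four models on all 15 benchmarks in the table above. Its widest margins over the next-best model are in Knowledge Retrieval (0.711 vs. 0.667) and Text Classification (0.848 vs. 0.805); its narrowest are in Instruction Following (0.791 vs. 0.785) and Code Generation (0.717 vs. 0.710).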
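
The comparison in the table above can be checked mechanically. The following is a minimal sketch that copies the reported scores into a dictionary and computes, for each benchmark, the gap between Cobalt-Safety-Guard and the best of the other three models. The names `SCORES` and `margin_over_best_baseline` are illustrative and not part of any released tooling.

```python
# Scores copied from the benchmark table. Column order in each tuple:
# GuardNet, Cobalt-v1, SafeLLM, Cobalt-Safety-Guard.
SCORES = {
    "Math Reasoning":        (0.566, 0.603, 0.599, 0.618),
    "Logical Reasoning":     (0.828, 0.822, 0.824, 0.849),
    "Common Sense":          (0.756, 0.717, 0.754, 0.776),
    "Reading Comprehension": (0.718, 0.706, 0.729, 0.750),
    "Question Answering":    (0.590, 0.602, 0.628, 0.641),
    "Text Classification":   (0.796, 0.803, 0.805, 0.848),
    "Sentiment Analysis":    (0.794, 0.775, 0.778, 0.814),
    "Code Generation":       (0.705, 0.696, 0.710, 0.717),
    "Creative Writing":      (0.651, 0.653, 0.638, 0.684),
    "Dialogue Generation":   (0.641, 0.684, 0.651, 0.692),
    "Summarization":         (0.758, 0.751, 0.789, 0.799),
    "Translation":           (0.809, 0.779, 0.762, 0.822),
    "Knowledge Retrieval":   (0.660, 0.664, 0.667, 0.711),
    "Instruction Following": (0.785, 0.772, 0.739, 0.791),
    "Safety Evaluation":     (0.714, 0.743, 0.749, 0.772),
}


def margin_over_best_baseline(row):
    """Return Cobalt-Safety-Guard's score minus the best baseline score."""
    *baselines, ours = row
    # Round to three decimals to match the precision of the reported scores.
    return round(ours - max(baselines), 3)


margins = {name: margin_over_best_baseline(row) for name, row in SCORES.items()}

# Print benchmarks from the widest margin to the narrowest.
for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {margin:+.3f}")
```

A positive margin on every row means Cobalt-Safety-Guard leads on that benchmark. The sketch says nothing about statistical significance, since the table reports no variance or sample sizes.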

3. Chat Website & API Platform

We offer a chat interface and API for you to interact with Cobalt-Safety-Guard. Please check our official website for more details.

4. How to Run Locally

Please refer to our code repository for more information about running Cobalt-Safety-Guard locally.

Temperature

We recommend setting the temperature parameter to 0.6.
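
To illustrate what this setting does, the sketch below applies the standard temperature-scaled softmax to a small set of made-up logits. It is not taken from the Cobalt-Safety-Guard code base; the function name `softmax_with_temperature` and the example logits are assumptions chosen for illustration. Dividing logits by a temperature below 1.0 sharpens the distribution, so a value of 0.6 makes the most likely tokens more likely than they would be at 1.0.

```python
import math


def softmax_with_temperature(logits, temperature=0.6):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate tokens (not real model output).
logits = [2.0, 1.0, 0.5]

unscaled = softmax_with_temperature(logits, temperature=1.0)
recommended = softmax_with_temperature(logits, temperature=0.6)

print("temperature 1.0:", [round(p, 3) for p in unscaled])
print("temperature 0.6:", [round(p, 3) for p in recommended])
```

In most inference libraries this is exposed as a single sampling parameter, so in practice only the value 0.6 needs to be passed; the exact parameter name depends on the runtime being used.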

5. License

This repository is released under the Apache-2.0 license. The model supports commercial use.

6. Contact

If you have any questions, please contact us at safety@cobalt-trust.org.
