Cobalt-Safety-Guard

1. Introduction

Cobalt-Safety-Guard is a moderation-aware model post-trained with a heavy emphasis on safety and refusal calibration.

2. Evaluation Results

Comprehensive Benchmark Results

Category	Benchmark	GuardNet	Cobalt-v1	SafeLLM	Cobalt-Safety-Guard
Core Reasoning Tasks	Math Reasoning	0.566	0.603	0.599	0.618
	Logical Reasoning	0.828	0.822	0.824	0.849
	Common Sense	0.756	0.717	0.754	0.776
Language Understanding	Reading Comprehension	0.718	0.706	0.729	0.750
	Question Answering	0.590	0.602	0.628	0.641
	Text Classification	0.796	0.803	0.805	0.848
	Sentiment Analysis	0.794	0.775	0.778	0.814
Generation Tasks	Code Generation	0.705	0.696	0.710	0.717
	Creative Writing	0.651	0.653	0.638	0.684
	Dialogue Generation	0.641	0.684	0.651	0.692
	Summarization	0.758	0.751	0.789	0.799
Specialized Capabilities	Translation	0.809	0.779	0.762	0.822
	Knowledge Retrieval	0.660	0.664	0.667	0.711
	Instruction Following	0.785	0.772	0.739	0.791
	Safety Evaluation	0.714	0.743	0.749	0.772

Overall Performance Summary

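Cobalt-Safety-Guard scores highest among the four models on all 15 benchmarks in the table above. Its largest margins over the next-best model are on Knowledge Retrieval (0.711 vs. 0.667) and Text Classification (0.848 vs. 0.805); its smallest are on Instruction Following (0.791 vs. 0.785) and Code Generation (0.717 vs. 0.710).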
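
For readers who want to check the comparison themselves, the following is a minimal sketch that recomputes, for each benchmark, the gap between Cobalt-Safety-Guard and the best of the other three models. The scores are copied from the table above; the dictionary layout and the function name `margins_over_best_baseline` are illustrative choices, not part of any released tooling.

```python
# Scores copied from the benchmark table, in column order:
# (GuardNet, Cobalt-v1, SafeLLM, Cobalt-Safety-Guard)
SCORES = {
    "Math Reasoning": (0.566, 0.603, 0.599, 0.618),
    "Logical Reasoning": (0.828, 0.822, 0.824, 0.849),
    "Common Sense": (0.756, 0.717, 0.754, 0.776),
    "Reading Comprehension": (0.718, 0.706, 0.729, 0.750),
    "Question Answering": (0.590, 0.602, 0.628, 0.641),
    "Text Classification": (0.796, 0.803, 0.805, 0.848),
    "Sentiment Analysis": (0.794, 0.775, 0.778, 0.814),
    "Code Generation": (0.705, 0.696, 0.710, 0.717),
    "Creative Writing": (0.651, 0.653, 0.638, 0.684),
    "Dialogue Generation": (0.641, 0.684, 0.651, 0.692),
    "Summarization": (0.758, 0.751, 0.789, 0.799),
    "Translation": (0.809, 0.779, 0.762, 0.822),
    "Knowledge Retrieval": (0.660, 0.664, 0.667, 0.711),
    "Instruction Following": (0.785, 0.772, 0.739, 0.791),
    "Safety Evaluation": (0.714, 0.743, 0.749, 0.772),
}


def margins_over_best_baseline(scores):
    """Map each benchmark to (last column) minus (best of the other columns)."""
    return {
        name: round(row[-1] - max(row[:-1]), 3)
        for name, row in scores.items()
    }


margins = margins_over_best_baseline(SCORES)

# Print benchmarks from largest to smallest margin.
for name, margin in sorted(margins.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {margin:+.3f}")
```

A positive margin on every row means Cobalt-Safety-Guard has the highest score in that row; the sort order shows where the lead is widest and where it is narrowest.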

3. Chat Website & API Platform

We offer a chat interface and API for you to interact with Cobalt-Safety-Guard. Please check our official website for more details.

4. How to Run Locally

Please refer to our code repository for more information about running Cobalt-Safety-Guard locally.

Temperature

We recommend setting the temperature parameter to 0.6.
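
To illustrate what this setting does, here is a minimal sketch of temperature-scaled sampling in plain Python. It assumes the usual convention in which logits are divided by the temperature before the softmax; this document does not describe how the model's inference stack applies the parameter, so treat the function names (`softmax_with_temperature`, `sample_token`) and the example logits as hypothetical.

```python
import math
import random

RECOMMENDED_TEMPERATURE = 0.6  # value recommended in this document


def softmax_with_temperature(logits, temperature=RECOMMENDED_TEMPERATURE):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=RECOMMENDED_TEMPERATURE, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits for a three-token vocabulary (assumption, for illustration).
example_logits = [2.0, 1.0, 0.5]

probs_default = softmax_with_temperature(example_logits, temperature=1.0)
probs_recommended = softmax_with_temperature(example_logits)

print("T=1.0:", [round(p, 3) for p in probs_default])
print("T=0.6:", [round(p, 3) for p in probs_recommended])
print("sampled index:", sample_token(example_logits))
```

Under this convention, a temperature below 1.0 concentrates probability on the highest-scoring tokens, so 0.6 gives less varied output than 1.0 while still allowing some randomness.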

5. License

This repository is released under the Apache-2.0 license. Commercial use of the model is permitted.

6. Contact

If you have any questions, please contact us at safety@cobalt-trust.org.
