Flames-scorer

This is the designated scorer for the Flames benchmark, a highly adversarial Chinese benchmark for evaluating the value alignment of LLMs. For more details, please refer to our paper and GitHub repo.

Model Details

Developed by: Shanghai AI Lab and Fudan NLP Group.
Model type: We use InternLM-chat-7b as the backbone and build a separate classifier for each dimension on top of it. The scorer is then trained with a multi-task approach.
Language(s): Chinese
Paper: FLAMES: Benchmarking Value Alignment of LLMs in Chinese
Contact: For questions and comments about the model, please email tengyan@pjlab.org.cn.
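The "shared backbone plus one classifier per dimension" design under Model type can be illustrated with a minimal pure-Python sketch. It is not the Flames-scorer implementation: `toy_backbone`, `MultiHeadScorer`, `multitask_loss`, the dimension names `dim_a`/`dim_b`, and their label counts are all hypothetical, and a tiny deterministic feature function stands in for InternLM-chat-7b. It shows only the forward pass and the multi-task objective, where every example shares one representation but is scored by the head for its own dimension.

```python
import math
import random

HIDDEN = 8  # hypothetical feature size; the real backbone is InternLM-chat-7b


def toy_backbone(text: str) -> list[float]:
    """Hypothetical stand-in for the shared encoder: deterministic pseudo-features per text."""
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]


class MultiHeadScorer:
    """One shared backbone plus one linear classification head per dimension."""

    def __init__(self, num_labels: dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        # Each head is a (num_classes x HIDDEN) weight matrix and a bias vector.
        self.heads = {
            dim: ([[rng.gauss(0.0, 0.1) for _ in range(HIDDEN)] for _ in range(n)], [0.0] * n)
            for dim, n in num_labels.items()
        }

    def logits(self, dimension: str, text: str) -> list[float]:
        h = toy_backbone(text)                 # shared representation
        weights, bias = self.heads[dimension]  # dimension-specific head
        return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]

    def predict(self, dimension: str, text: str) -> int:
        scores = self.logits(dimension, text)
        return max(range(len(scores)), key=scores.__getitem__)


def cross_entropy(logits: list[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multitask_loss(model: MultiHeadScorer, batch: list[dict]) -> float:
    """Mean loss over a mixed batch; each example is scored only by its own dimension's head."""
    losses = [cross_entropy(model.logits(ex["dimension"], ex["text"]), ex["label"]) for ex in batch]
    return sum(losses) / len(losses)


# Usage with two hypothetical dimensions and made-up label counts.
model = MultiHeadScorer({"dim_a": 3, "dim_b": 2})
batch = [
    {"dimension": "dim_a", "text": "prompt + response 1", "label": 2},
    {"dimension": "dim_b", "text": "prompt + response 2", "label": 0},
]
print(multitask_loss(model, batch))
print(model.predict("dim_b", "prompt + response 2"))
```

Because the backbone is shared, in real training it would receive gradients from examples of every dimension, while each head is updated only by examples of its own dimension. Gradient updates are omitted from this sketch.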

Usage

Set up the environment with:

$ pip install -r requirements.txt

Then run infer.py to score your model's responses:

$ python infer.py --data_path YOUR_DATA_FILE.jsonl
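The input file can be prepared with a small helper like the one below, a minimal sketch assuming only what this README states: one JSON object per line with the fields "dimension", "prompt", and "response". The helper name `write_jsonl` and the record values are hypothetical; in particular, `YOUR_DIMENSION` is a placeholder, not a confirmed Flames dimension label.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = {"dimension", "prompt", "response"}


def write_jsonl(path, records):
    """Hypothetical helper: write one JSON object per line after checking required fields."""
    # Validate everything first so a bad record never leaves a half-written file.
    for i, rec in enumerate(records):
        missing = REQUIRED_FIELDS - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    with Path(path).open("w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps Chinese text human-readable in the file.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


# Placeholder record; replace the values with real Flames prompts and your model's responses.
records = [
    {"dimension": "YOUR_DIMENSION", "prompt": "示例提示", "response": "示例回复"},
]
write_jsonl("YOUR_DATA_FILE.jsonl", records)
```

Checking all records before opening the output file means a malformed entry raises an error without overwriting an existing YOUR_DATA_FILE.jsonl.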

The Flames-scorer can be loaded with:

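# tokenization_internlm and modeling_internlm are local modules; they must be importable (e.g., in the current directory).
from tokenization_internlm import InternLMTokenizer
from modeling_internlm import InternLMForSequenceClassification

# Load the scorer's tokenizer and classification model from the Hugging Face Hub.
tokenizer = InternLMTokenizer.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)
model = InternLMForSequenceClassification.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)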

Please note:

Ensure each entry in YOUR_DATA_FILE.jsonl includes the fields: "dimension", "prompt", and "response".
The predicted score is written to the "predicted" field of each entry, and the output file is saved in the same directory as YOUR_DATA_FILE.jsonl.
The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in the Flames-prompts) has not been evaluated, so its predictions on such data may be unreliable.
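Once infer.py has finished, the scored file can be summarized per dimension. This is a minimal sketch assuming the output is JSONL with the original fields plus a numeric "predicted" value; the output file's exact name is not specified here, so the path is passed in, and `summarize_predictions` is a hypothetical helper.

```python
import json
from collections import defaultdict
from pathlib import Path


def summarize_predictions(path):
    """Hypothetical helper: return {dimension: (count, mean predicted score)} for a scored JSONL file.

    Assumes every non-blank line is a JSON object with "dimension" and a numeric "predicted" field.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            totals[rec["dimension"]] += rec["predicted"]
            counts[rec["dimension"]] += 1
    return {dim: (counts[dim], totals[dim] / counts[dim]) for dim in counts}


# Example call (the path is a placeholder for the file infer.py writes):
# print(summarize_predictions("path/to/scored_output.jsonl"))
```

Given the out-of-distribution caveat above, such averages are only meaningful for responses to Flames prompts.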