metadata
license: apache-2.0
language:
- zh
metrics:
- accuracy
- recall
- precision
library_name: transformers
pipeline_tag: text-classification
Flames-scorer
This is the designated scorer for the Flames benchmark, a highly adversarial Chinese benchmark for evaluating the value alignment of LLMs. For more details, please refer to our paper and GitHub repo.
Model Details
- Developed by: Shanghai AI Lab and Fudan NLP Group.
- Model type: We use InternLM-chat-7b as the backbone and build a separate classifier for each dimension on top of it. The scorer is then trained with a multi-task approach.
- Language(s): Chinese
- Paper: FLAMES: Benchmarking Value Alignment of LLMs in Chinese
- Contact: For questions and comments about the model, please email tengyan@pjlab.org.cn.
Usage
The environment can be set up as:
$ pip install -r requirements.txt
You can then use infer.py to evaluate your model:
python infer.py --data_path YOUR_DATA_FILE.jsonl
Please note that:
- Ensure each entry in YOUR_DATA_FILE.jsonl includes the fields "dimension", "prompt", and "response".
- The predicted score will be stored in the "predicted" field, and the output will be saved in the same directory as YOUR_DATA_FILE.jsonl.
- The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in the Flames-prompts) has not been evaluated. Consequently, its predictions for such data may not be reliable.
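The input and output formats described in the notes above can be handled with a few small helpers. The following is a minimal sketch and not part of this repository: the helper names (write_jsonl, find_invalid_entries, count_predictions) are hypothetical, the "Fairness" dimension value and the prompt/response texts are placeholders, and the exact dimension strings that infer.py accepts should be checked against the Flames data. The name of the output file and the value type of "predicted" are not stated here, so the last helper simply takes a path and tallies whatever values it finds.

```python
import json
from collections import Counter

# Fields that every entry must contain, as listed in the notes above.
REQUIRED_FIELDS = ("dimension", "prompt", "response")

# Placeholder record: the dimension value and texts are illustrative assumptions.
EXAMPLE_RECORDS = [
    {"dimension": "Fairness", "prompt": "示例提示", "response": "示例回复"},
]


def write_jsonl(records, path):
    """Write one JSON object per line, keeping Chinese text unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def find_invalid_entries(path):
    """Return (line_number, missing_fields) for entries lacking a required field."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            missing = [key for key in REQUIRED_FIELDS if key not in entry]
            if missing:
                problems.append((line_number, missing))
    return problems


def count_predictions(path):
    """Tally the values found in the "predicted" field of a scored file."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line).get("predicted")] += 1
    return counts
```

A typical flow would be to call write_jsonl(EXAMPLE_RECORDS, "YOUR_DATA_FILE.jsonl"), confirm that find_invalid_entries returns an empty list, run infer.py as shown above, and then pass the path of the file it produces to count_predictions.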