CaasiHUANG
committed on
Commit
•
afe1a8c
1
Parent(s):
313f291
Create README.md
README.md
ADDED
@@ -0,0 +1,37 @@
---
license: apache-2.0
language:
- zh
metrics:
- accuracy
- recall
- precision
library_name: transformers
pipeline_tag: text-classification
---
# Flames-scorer

This is the designated scorer for the Flames benchmark, a highly adversarial Chinese-language benchmark for evaluating the value alignment of LLMs.
For more details, please refer to our [paper](https://arxiv.org/abs/2311.06899) and [GitHub repo](https://github.com/AIFlames/Flames/tree/main).

## Model Details
* Developed by: Shanghai AI Lab and Fudan NLP Group.
* Model type: The scorer uses InternLM-chat-7b as its backbone, with a separate classifier for each dimension built on top of it. It is trained with a multi-task approach.
* Language(s): Chinese
* Paper: [FLAMES: Benchmarking Value Alignment of LLMs in Chinese](https://arxiv.org/abs/2311.06899)
* Contact: For questions and comments about the model, please email tengyan@pjlab.org.cn.
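
The "shared backbone plus one classifier per dimension" layout described under Model type can be pictured with a toy sketch. This is not the scorer's actual code: the dimension names, label count, feature size, and random weights below are all hypothetical placeholders, the backbone is a stand-in function rather than InternLM-chat-7b, and multi-task training is not shown. It only illustrates how a single encoder output can be routed to the head that matches an entry's dimension.

```python
from __future__ import annotations

import random

# Hypothetical values, chosen only for illustration (not taken from the scorer).
DIMENSIONS = ["dimension_a", "dimension_b"]
NUM_LABELS = 3
HIDDEN_SIZE = 4


def shared_backbone(text: str) -> list[float]:
    """Stand-in for the shared encoder: maps text to a fixed-size feature vector."""
    rng = random.Random(text)  # deterministic toy features, not a real model
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]


def make_head(seed: int) -> list[list[float]]:
    """Build a NUM_LABELS x HIDDEN_SIZE weight matrix for one dimension's classifier."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
        for _ in range(NUM_LABELS)
    ]


# One separate classifier head per dimension, all fed by the same backbone.
HEADS = {name: make_head(index) for index, name in enumerate(DIMENSIONS)}


def classify(dimension: str, text: str) -> int:
    """Encode the text once, then score it with the head for its dimension."""
    features = shared_backbone(text)
    logits = [
        sum(weight * value for weight, value in zip(row, features))
        for row in HEADS[dimension]
    ]
    # Return the index of the highest-scoring label.
    return max(range(NUM_LABELS), key=logits.__getitem__)
```

In this layout the heads share every backbone parameter, so adding a dimension only adds one small classifier rather than a whole new model.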

## Usage

Set up the environment with:
```shell
pip install -r requirements.txt
```
Then use `infer.py` to evaluate your model:
```shell
python infer.py --data_path YOUR_DATA_FILE.jsonl
```
Please note that:
1. Each entry in `YOUR_DATA_FILE.jsonl` must include the fields "dimension", "prompt", and "response".
2. The predicted score is stored in the "predicted" field, and the output is saved in the same directory as `YOUR_DATA_FILE.jsonl`.
3. The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in the Flames-prompts) has not been evaluated. Consequently, its predictions for such data may not be reliable.
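
As a convenience, the notes above can be checked with a small helper script before and after running `infer.py`. The following is a minimal sketch and not part of the repository: the function names are hypothetical, the output file name is not specified here so the caller has to supply the path, and the sketch assumes the output is also JSONL with one entry per line.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path

# Field names required by infer.py, as listed in the notes above.
REQUIRED_FIELDS = ("dimension", "prompt", "response")


def write_jsonl(path, records):
    """Write one JSON object per line, keeping Chinese text unescaped."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield (line_number, entry) pairs, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if line.strip():
                yield line_number, json.loads(line)


def find_missing_fields(path):
    """Return (line_number, missing_fields) for entries lacking a required field."""
    problems = []
    for line_number, entry in read_jsonl(path):
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            problems.append((line_number, missing))
    return problems


def count_predictions(path):
    """Tally the values of the "predicted" field per dimension in an output file.

    Assumes the output file is JSONL (an assumption, not confirmed by the README).
    """
    counts = defaultdict(Counter)
    for _, entry in read_jsonl(path):
        counts[entry["dimension"]][entry["predicted"]] += 1
    return dict(counts)
```

Calling `find_missing_fields("YOUR_DATA_FILE.jsonl")` before inference returns an empty list when every entry is complete. The exact strings expected in "dimension" are not listed here, so check the Flames repository for the accepted values.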