---
language:
- zh
metrics:
- accuracy
- recall
- precision
library_name: transformers
pipeline_tag: text-classification
---
# Flames-scorer

This is the designated scorer for the Flames benchmark, a highly adversarial Chinese benchmark for evaluating the value alignment of LLMs.
For more details, please refer to our [paper](https://arxiv.org/abs/2311.06899) and [GitHub repo](https://github.com/AIFlames/Flames/tree/main).

## Model Details
* Developed by: Shanghai AI Lab and Fudan NLP Group.
* Model type: We use InternLM-chat-7b as the backbone and build a separate classifier for each dimension on top of it. The scorer is then trained with a multi-task training approach.
* Language(s): Chinese
* Paper: [FLAMES: Benchmarking Value Alignment of LLMs in Chinese](https://arxiv.org/abs/2311.06899)
* Contact: For questions and comments about the model, please email tengyan@pjlab.org.cn.

## Usage

The environment can be set up with:
```shell
pip install -r requirements.txt
```
You can then use `infer.py` to evaluate your model's responses:
```shell
python infer.py --data_path YOUR_DATA_FILE.jsonl
```
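
Each line of the input file is a JSON object with the fields described in the notes below. As a hypothetical illustration, the snippet below writes one example entry; the dimension label and text values are placeholders, and the actual label set is defined in the Flames repo:
```python
import json

# Hypothetical example entry: the field names come from this card's notes,
# but the dimension label and texts below are placeholders.
example = {
    "dimension": "fairness",  # placeholder; see the Flames repo for the actual label set
    "prompt": "...",          # the prompt given to the evaluated model
    "response": "...",        # the evaluated model's response
}

with open("YOUR_DATA_FILE.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```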

The Flames-scorer can be loaded as follows:
```python
from tokenization_internlm import InternLMTokenizer
from modeling_internlm import InternLMForSequenceClassification

tokenizer = InternLMTokenizer.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)
model = InternLMForSequenceClassification.from_pretrained("CaasiHUANG/flames-scorer", trust_remote_code=True)
```
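
As a rough sketch only, the snippet below assumes the model exposes the standard `transformers` sequence-classification interface (the tokenizer returns `input_ids`/`attention_mask` and the model returns logits); the exact prompt/response template and per-dimension handling used for scoring are implemented in `infer.py`:
```python
import torch

# Minimal sketch, not the official scoring pipeline: infer.py defines how the
# "dimension", "prompt", and "response" fields are combined into model input.
text = "prompt: ...\nresponse: ..."  # hypothetical input formatting
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Assuming standard sequence-classification outputs with a logits tensor.
predicted_label = outputs.logits.argmax(dim=-1).item()
print(predicted_label)
```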
Please note that:
1. Each entry in `YOUR_DATA_FILE.jsonl` must include the fields "dimension", "prompt", and "response" (see the example above).
2. The predicted score is stored in the "predicted" field, and the output file is saved in the same directory as `YOUR_DATA_FILE.jsonl`.
3. The accuracy of the Flames-scorer on out-of-distribution prompts (i.e., prompts not included in the Flames-prompts) has not been evaluated, so its predictions for such data may not be reliable.