---

license: cc-by-nc-4.0
language:
- zh
---


---

# RA-IT-NER-zh-7B

**Description**: RA-IT-NER-zh-7B is trained from Qwen1.5-7B using the proposed Retrieval Augmented Instruction Tuning (RA-IT) approach. It can be used for Chinese open NER both with and without RAG. The training data is [Sky-NER](https://huggingface.co/datasets/EmmaStrong/Sky-NER), an instruction tuning dataset for Chinese open NER that we constructed. Following the recipe of [UniversalNER](https://arxiv.org/abs/2308.03279), we built it from the large-scale [SkyPile Corpus](https://huggingface.co/datasets/Skywork/SkyPile-150B) by prompting gpt-3.5-turbo-0125 to label the entities in each passage and assign their entity tags. The data collection prompt is as follows:

<div style="background-color: #f6f8fa; padding: 20px; border-radius: 10px; border: 1px solid #e1e4e8; box-shadow: 0 2px 5px rgba(0,0,0,0.1);">
<strong>Instruction:</strong><br/>
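给定一段文本,你的任务是抽取所有实体并识别它们的实体类别。输出应为以下JSON格式:[{"实体1": "实体1的类别"}, ...]。<br/>
<em>(English translation: Given a passage of text, your task is to extract all entities and identify their entity categories. The output should be in the following JSON format: [{"entity 1": "category of entity 1"}, ...].)</em></div>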
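
As a minimal sketch of how an annotation in the JSON format requested by this prompt could be read back into (entity, category) pairs, the snippet below uses only Python's standard `json` module. The function name `parse_annotations` and the sample string are hypothetical and serve only as illustration; they are not taken from the Sky-NER pipeline.

```python
import json
from typing import List, Tuple


def parse_annotations(raw: str) -> List[Tuple[str, str]]:
    """Turn '[{"entity": "category"}, ...]' into a list of (entity, category) pairs."""
    items = json.loads(raw)
    pairs = []
    for item in items:
        # Each list element is expected to be a dict mapping an entity to its category.
        for entity, category in item.items():
            pairs.append((entity, category))
    return pairs


# Hypothetical annotation string, made up for illustration only.
sample = '[{"北京": "地点"}, {"张三": "人物"}]'
for entity, category in parse_annotations(sample):
    print(entity, category)
```

Real annotations produced by a language model may not always be valid JSON, so a production pipeline would likely wrap `json.loads` in error handling.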

See our [paper](https://arxiv.org/abs/2406.17305) for more information, and our [GitHub repo](https://github.com/Emma1066/Retrieval-Augmented-IT-OpenNER) for instructions on how to use the model.

## Inference
The template for inference instances is as follows:
<div style="background-color: #f6f8fa; padding: 20px; border-radius: 10px; border: 1px solid #e1e4e8; box-shadow: 0 2px 5px rgba(0,0,0,0.1);">
<strong>Prompting template:</strong><br/>
USER: 以下是一些命名实体识别的例子:<span style="color: #d73a49;">{Fill the NER examples here}</span><br/>
ASSISTANT: 我已读完这些例子。<br/>
USER: 文本:<span style="color: #d73a49;">{Fill the input text here}</span><br/>
ASSISTANT: 我已读完这段文本。<br/>
USER: 文本中属于"<span style="color: #d73a49;">{Fill the entity type here}</span>"的实体有哪些?<br/>
ASSISTANT: <span style="color: #0366d6;">(model's predictions in JSON format)</span><br/>
</div>

Note:
* The model can run inference **with or without** NER examples. To run inference without examples, start from the third line of the template above, placing "文本:{input text}" directly in the first "USER" turn.
* Each inference instance targets a single entity type. To extract multiple entity types, create a separate instance for each type.
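
The template and the two notes above can be sketched in Python as follows. This is an illustrative sketch only: the helper names `build_turns` and `build_instances` are hypothetical, the way the NER examples are serialized into a single string is an assumption, and the exact whitespace of the official prompt should be checked against the GitHub repo. The sketch only assembles the conversation turns; feeding them to the model is left to the repo's own inference code.

```python
from typing import Dict, List, Optional, Tuple

Turn = Tuple[str, str]  # (role, content)


def build_turns(text: str, entity_type: str, examples: Optional[str] = None) -> List[Turn]:
    """Assemble the conversation turns for one (text, entity type) instance."""
    turns: List[Turn] = []
    if examples:
        # Optional retrieval-augmented part: NER examples come first.
        turns.append(("USER", f"以下是一些命名实体识别的例子:{examples}"))
        turns.append(("ASSISTANT", "我已读完这些例子。"))
    # Without examples, the conversation starts directly from the input text.
    turns.append(("USER", f"文本:{text}"))
    turns.append(("ASSISTANT", "我已读完这段文本。"))
    turns.append(("USER", f'文本中属于"{entity_type}"的实体有哪些?'))
    return turns


def build_instances(text: str, entity_types: List[str],
                    examples: Optional[str] = None) -> Dict[str, List[Turn]]:
    """Create one separate instance per entity type, as the notes require."""
    return {t: build_turns(text, t, examples) for t in entity_types}


# Hypothetical input, for illustration only.
instances = build_instances("张三在北京工作。", ["人物", "地点"])
for entity_type, turns in instances.items():
    for role, content in turns:
        print(f"{role}: {content}")
    print()
```

Keeping one instance per entity type mirrors the template, where the final "USER" turn asks about a single type; the model's answer to that last turn is the prediction for that type.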



## License

This model and its associated data are released under the [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license. They are intended primarily for research use.