add Dataset

- README.md +33 -3
- README_CN.md +31 -5
README.md
CHANGED
@@ -1,6 +1,18 @@
---
license: apache-2.0
---
+
+
+- [1.Differences from knowlm-13b-zhixi](#1differences-from-knowlm-13b-zhixi)
+- [2.IE template](#2ie-template)
+- [3.Common relationship types](#3common-relationship-types)
+- [4.Convert script](#4convert-script)
+- [5.Datasets](#5datasets)
+- [6.Usage](#6usage)
+- [7.Evaluate](#7evaluate)
+
+
# 1.Differences from knowlm-13b-zhixi
Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.

@@ -94,7 +106,25 @@ python kg2instruction/convert_test.py \
```


-# 5.Usage
+# 5.Datasets
+
+
+Below are some ready-to-use processed datasets:
+
+| Name | Download Links | Quantity | Description |
+| ---- | ---- | ---- | ---- |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | The dataset described in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | All instruction data (general, IE, code, CoT, etc.) used to train [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: contains the fields `'id'` (unique identifier), `'cate'` (text category), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). Extraction instructions and outputs can be constructed freely from the `'relation'` field; `'instruction'` comes in 16 formats (4 prompts * 4 output formats), and `'output'` is generated in the output format specified by `'instruction'`.
+
+`KnowLM-ke`: contains only the fields `'instruction'`, `'input'`, and `'output'`. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` in its directory hold the Chinese and English IE instruction data.
+
+
+
+
+# 6.Usage
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference with the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.

```bash
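The new Datasets section above lists the `KnowLM-IE.json` record fields (`'id'`, `'cate'`, `'instruction'`, `'input'`, `'output'`, `'relation'`). The snippet below is a minimal sketch of what one such record might look like and of rebuilding an `'output'` string from `'relation'`; the field names follow the description, while the concrete values, the internal structure of `'relation'`, and the `(head, relation, tail)` output format are illustrative assumptions rather than the dataset's actual contents.

```python
# Hypothetical KnowLM-IE.json record: field names follow the README's
# description; the values and the structure of 'relation' are invented
# for illustration only.
import json

record = {
    "id": "example-0001",
    "cate": "Person",
    "instruction": "Extract the relation triples from the input and "
                   "return them as (head, relation, tail) tuples.",
    "input": "Alan Turing was born in London.",
    "output": "(Alan Turing, place of birth, London)",
    "relation": [
        {"head": "Alan Turing", "relation": "place of birth", "tail": "London"},
    ],
}

def build_output(relations):
    """Rebuild one possible 'output' string from the 'relation' field."""
    return "\n".join(f"({r['head']}, {r['relation']}, {r['tail']})" for r in relations)

print(build_output(record["relation"]))
print(json.dumps(record, ensure_ascii=False, indent=2))
```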
@@ -110,7 +140,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.



-# 6.Evaluate
+# 7.Evaluate

We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the string output of the model into a list and calculate the F1 score.

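The evaluation step above turns the model's string output into a list of triples and computes F1. As a rough illustration, the sketch below assumes the same `(head, relation, tail)` line format as the previous example and exact string matching; the actual parsing and matching rules are defined in evaluate.py and may differ.

```python
# Minimal sketch of triple-level F1 scoring; the "(head, relation, tail)"
# parsing format and exact-match criterion are assumptions, not the
# behaviour of the official evaluate.py.
import re

def parse_triples(text):
    """Turn a model's string output into a set of (head, relation, tail) tuples."""
    triples = set()
    for match in re.finditer(r"\(([^()]*)\)", text):
        parts = [p.strip() for p in match.group(1).split(",")]
        if len(parts) == 3:
            triples.add(tuple(parts))
    return triples

def f1_score(pred_text, gold_text):
    pred, gold = parse_triples(pred_text), parse_triples(gold_text)
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

pred = "(Alan Turing, place of birth, London)\n(Alan Turing, occupation, mathematician)"
gold = "(Alan Turing, place of birth, London)"
print(f1_score(pred, gold))  # 0.667: 1 correct out of 2 predicted and 1 gold
```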
README_CN.md
CHANGED
@@ -1,4 +1,13 @@
-
+- [1.Differences from knowlm-13b-zhixi](#1与-knowlm-13b-zhixi-的区别)
+- [2.IE template](#2信息抽取模板)
+- [3.Common relationship types](#3常见的关系类型)
+- [4.Convert script](#4转换脚本)
+- [5.Ready-made datasets](#5现成数据集)
+- [6.Usage](#6使用)
+- [7.Evaluate](#7评估)
+
+
+# 1.Differences from knowlm-13b-zhixi

Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.

@@ -6,7 +15,7 @@ zjunlp/knowlm-13b-ie samples about 10% of the data from Chinese and English information extraction datasets



-# 2
+# 2.IE template
Relation extraction (RE) supports the following templates:

```python
@@ -63,7 +72,7 @@ relation_int_out_format_en = {

The [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/schema.py) here provides 12 text topics and the common relation types under each topic.

-# 4
+# 4.Convert script

Scripts named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) and [convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) are provided to convert data uniformly into instructions that can be fed directly to KnowLM. Before running convert.py, please consult the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory, which contains the expected data format for each task.

@@ -91,7 +100,24 @@ python kg2instruction/convert_test.py \
```


-# 5
+# 5.Ready-made datasets
+
+Below are some ready-made, already processed datasets:
+
+| Name | Download | Quantity | Description |
+| ---- | ---- | ---- | ---- |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | The dataset described in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | All instruction data (general, IE, code, CoT, etc.) used to train [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: contains the fields `'id'` (unique identifier), `'cate'` (text topic), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). Extraction instructions and outputs can be constructed freely from `'relation'`; `'instruction'` has 16 formats (4 prompts * 4 output formats), and `'output'` is the text generated in the output format specified by `'instruction'`.
+
+
+`KnowLM-ke`: contains only the `'instruction'`, `'input'`, and `'output'` fields. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` in its directory are the Chinese and English IE instruction data.
+
+
+
+# 6.Usage
We provide [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), a script for running inference directly with the `zjunlp/knowlm-13b-ie` model; please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.

```bash
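Both ready-made datasets listed above are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. The snippet below is a minimal sketch; the split name and the exact file layout of the repositories are assumptions, so `data_files` may need to be passed explicitly.

```python
# Minimal sketch of loading one of the ready-made datasets from the Hub.
# The split name ("train") and the repository's file layout are assumptions;
# adjust or pass data_files explicitly if the repo is organised differently.
from datasets import load_dataset

ds = load_dataset("zjunlp/KnowLM-IE", split="train")
example = ds[0]
print(example["instruction"])
print(example["input"])
print(example["output"])
```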
@@ -106,7 +132,7 @@ CUDA_VISIBLE_DEVICES="0" python src/inference.py \
If GPU memory is insufficient, you can use `--bits 8` or `--bits 4`.


-#
+# 7.Evaluate
We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the model's string output into a list and compute the F1 score.

```bash