add Dataset

- README.md +33 -3
- README_CN.md +31 -5
README.md
CHANGED
@@ -1,6 +1,18 @@
---
license: apache-2.0
---
+
+
+- [1.Differences from knowlm-13b-zhixi](#1differences-from-knowlm-13b-zhixi)
+- [2.IE template](#2ie-template)
+- [3.Common relationship types](#3common-relationship-types)
+- [4.Convert script](#4convert-script)
+- [5.Datasets](#5datasets)
+- [6.Usage](#6usage)
+- [7.Evaluate](#7evaluate)
+
+
# 1.Differences from knowlm-13b-zhixi
Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.

@@ -94,7 +106,25 @@ python kg2instruction/convert_test.py \
```


-# 5.Usage
+# 5.Datasets
+
+
+Below are some ready-to-use processed datasets:
+
+| Name | Download Links | Quantity | Description |
+| ---- | ---- | ---- | ---- |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | The dataset described in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | All instruction data (general, IE, code, CoT, etc.) used to train [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: contains the fields `'id'` (unique identifier), `'cate'` (text category), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). Extraction instructions and outputs can be constructed freely from the `'relation'` field; `'instruction'` comes in 16 formats (4 prompts * 4 output formats), and `'output'` is generated in the output format specified by `'instruction'`.
+
+`KnowLM-ke`: contains only the fields `'instruction'`, `'input'`, and `'output'`. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` in its directory hold the Chinese and English IE instruction data.
+
+
+
+
+# 6.Usage
We provide a script, [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), for direct inference with the `zjunlp/knowlm-13b-ie` model. Please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.

```bash
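The new Datasets section above lists the `KnowLM-IE.json` record fields (`'id'`, `'cate'`, `'instruction'`, `'input'`, `'output'`, `'relation'`). The snippet below is a minimal sketch of what one such record might look like and of rebuilding an `'output'` string from `'relation'`; the field names follow the description, while the concrete values, the internal structure of `'relation'`, and the `(head, relation, tail)` output format are illustrative assumptions rather than the dataset's actual contents.

```python
# Hypothetical KnowLM-IE.json record: field names follow the README's
# description; the values and the structure of 'relation' are invented
# for illustration only.
import json

record = {
    "id": "example-0001",
    "cate": "Person",
    "instruction": "Extract the relation triples from the input and "
                   "return them as (head, relation, tail) tuples.",
    "input": "Alan Turing was born in London.",
    "output": "(Alan Turing, place of birth, London)",
    "relation": [
        {"head": "Alan Turing", "relation": "place of birth", "tail": "London"},
    ],
}

def build_output(relations):
    """Rebuild one possible 'output' string from the 'relation' field."""
    return "\n".join(f"({r['head']}, {r['relation']}, {r['tail']})" for r in relations)

print(build_output(record["relation"]))
print(json.dumps(record, ensure_ascii=False, indent=2))
```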
@@ -110,7 +140,7 @@ If GPU memory is not enough, you can use `--bits 8` or `--bits 4`.



-# 6.Evaluate
+# 7.Evaluate

We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the string output of the model into a list and calculate the F1 score.

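The evaluation step above turns the model's string output into a list of triples and computes F1. As a rough illustration, the sketch below assumes the same `(head, relation, tail)` line format as the previous example and exact string matching; the actual parsing and matching rules are defined in evaluate.py and may differ.

```python
# Minimal sketch of triple-level F1 scoring; the "(head, relation, tail)"
# parsing format and exact-match criterion are assumptions, not the
# behaviour of the official evaluate.py.
import re

def parse_triples(text):
    """Turn a model's string output into a set of (head, relation, tail) tuples."""
    triples = set()
    for match in re.finditer(r"\(([^()]*)\)", text):
        parts = [p.strip() for p in match.group(1).split(",")]
        if len(parts) == 3:
            triples.add(tuple(parts))
    return triples

def f1_score(pred_text, gold_text):
    pred, gold = parse_triples(pred_text), parse_triples(gold_text)
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

pred = "(Alan Turing, place of birth, London)\n(Alan Turing, occupation, mathematician)"
gold = "(Alan Turing, place of birth, London)"
print(f1_score(pred, gold))  # 0.667: 1 correct out of 2 predicted and 1 gold
```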
README_CN.md
CHANGED
@@ -1,4 +1,13 @@
-
+- [1.Differences from knowlm-13b-zhixi](#1与-knowlm-13b-zhixi-的区别)
+- [2.IE template](#2信息抽取模板)
+- [3.Common relationship types](#3常见的关系类型)
+- [4.Convert script](#4转换脚本)
+- [5.Ready-made datasets](#5现成数据集)
+- [6.Usage](#6使用)
+- [7.Evaluate](#7评估)
+
+
+# 1.Differences from knowlm-13b-zhixi

Compared to zjunlp/knowlm-13b-zhixi, zjunlp/knowlm-13b-ie exhibits slightly stronger practicality in information extraction but with a decrease in its general applicability.

@@ -6,7 +15,7 @@ zjunlp/knowlm-13b-ie samples about 10% of the data from Chinese and English information extraction datasets



-# 2
+# 2.IE template
Relation extraction (RE) supports the following templates:

```python
@@ -63,7 +72,7 @@ relation_int_out_format_en = {

The [schema](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/schema.py) here provides 12 text topics and the common relation types under each topic.

-# 4
+# 4.Convert script

Scripts named [convert.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert.py) and [convert_test.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/convert_test.py) are provided to convert data uniformly into instructions that can be fed directly to KnowLM. Before running convert.py, please consult the [data](https://github.com/zjunlp/DeepKE/tree/main/example/llm/InstructKGC/data) directory, which contains the expected data format for each task.

@@ -91,7 +100,24 @@ python kg2instruction/convert_test.py \
```


-# 5
+# 5.Ready-made datasets
+
+Below are some ready-made, already processed datasets:
+
+| Name | Download | Quantity | Description |
+| ---- | ---- | ---- | ---- |
+| KnowLM-IE.json | [Google Drive](https://drive.google.com/file/d/1hY_R6aFgW4Ga7zo41VpOVOShbTgBqBbL/view?usp=sharing) <br/> [HuggingFace](https://huggingface.co/datasets/zjunlp/KnowLM-IE) | 281860 | The dataset described in [InstructIE](https://arxiv.org/abs/2305.11527) |
+| KnowLM-ke | [HuggingFace](https://huggingface.co/datasets/zjunlp/knowlm-ke) | XXXX | All instruction data (general, IE, code, CoT, etc.) used to train [zjunlp/knowlm-13b-zhixi](https://huggingface.co/zjunlp/knowlm-13b-zhixi) |
+
+
+`KnowLM-IE.json`: contains the fields `'id'` (unique identifier), `'cate'` (text topic), `'instruction'` (extraction instruction), `'input'` (input text), `'output'` (output text), and `'relation'` (triples). Extraction instructions and outputs can be constructed freely from `'relation'`; `'instruction'` has 16 formats (4 prompts * 4 output formats), and `'output'` is the text generated in the output format specified by `'instruction'`.
+
+
+`KnowLM-ke`: contains only the `'instruction'`, `'input'`, and `'output'` fields. The files `ee-en.json`, `ee_train.json`, `ner-en.json`, `ner_train.json`, `re-en.json`, and `re_train.json` in its directory are the Chinese and English IE instruction data.
+
+
+
+# 6.Usage
We provide [inference.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/src/inference.py), a script for running inference directly with the `zjunlp/knowlm-13b-ie` model; please refer to the [README.md](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/README.md) for environment configuration and other details.

```bash
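Both ready-made datasets listed above are hosted on the Hugging Face Hub, so they can be pulled with the `datasets` library. The snippet below is a minimal sketch; the split name and the exact file layout of the repositories are assumptions, so `data_files` may need to be passed explicitly.

```python
# Minimal sketch of loading one of the ready-made datasets from the Hub.
# The split name ("train") and the repository's file layout are assumptions;
# adjust or pass data_files explicitly if the repo is organised differently.
from datasets import load_dataset

ds = load_dataset("zjunlp/KnowLM-IE", split="train")
example = ds[0]
print(example["instruction"])
print(example["input"])
print(example["output"])
```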
@@ -106,7 +132,7 @@ CUDA_VISIBLE_DEVICES="0" python src/inference.py \
If GPU memory is insufficient, you can use `--bits 8` or `--bits 4`.


-#
+# 7.Evaluate
We provide a script at [evaluate.py](https://github.com/zjunlp/DeepKE/blob/main/example/llm/InstructKGC/kg2instruction/evaluate.py) to convert the model's string output into a list and compute the F1 score.

```bash