xuyingjie521 committed
Commit 0db4d44
1 Parent(s): 4d84748
README.md CHANGED
@@ -1,3 +1,47 @@
- ---
- license: afl-3.0
- ---
+ ## UIE (Universal Information Extraction)
+
+ ### Introduction
+
+ UIE (Universal Information Extraction) is a state-of-the-art (SOTA) method in PaddleNLP; you can find the details [here](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie).
+ The paper is [here](https://arxiv.org/pdf/2203.12277.pdf).
+
+ ### Usage
+
+ I saved the UIE model as an entire model (ERNIE 3.0 backbone + start/end layers), not just a `state_dict`, so `torch.load` returns a callable model directly. Load it as follows:
+
+ #### 1. Clone this model to your local path
+
+ ```sh
+ git lfs install
+ git clone https://huggingface.co/xyj125/uie-base-chinese
+ ```
+
+ If you don't have [`git-lfs`](https://git-lfs.github.com), you can also:
+
+ * Download the files manually by clicking the `Files and versions` tab at the top of this model card.
+
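+ Alternatively, a minimal sketch using the `huggingface_hub` library (my own suggestion, not part of the original card; `snapshot_download` fetches the whole repo into the local cache):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # download every file in the repo and return the local directory path
+ local_path = snapshot_download(repo_id='xyj125/uie-base-chinese')
+ ```
+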
+ #### 2. Load this model from local files
+
+ ```python
+ import os
+ import torch
+ from transformers import AutoTokenizer
+
+ uie_model = 'uie-base-chinese'                                    # path to the repo cloned in step 1
+ model = torch.load(os.path.join(uie_model, 'pytorch_model.bin'))  # load the entire UIE model
+ tokenizer = AutoTokenizer.from_pretrained(uie_model)              # load the tokenizer from the same path
+ ...
+
+ start_prob, end_prob = model(input_ids=batch['input_ids'],
+                              token_type_ids=batch['token_type_ids'],
+                              attention_mask=batch['attention_mask'])
+ print(f'start_prob ({type(start_prob)}): {start_prob.size()}')  # start position probabilities
+ print(f'end_prob ({type(end_prob)}): {end_prob.size()}')        # end position probabilities
+ ...
+ ```
+
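+ The `batch` above is left to the caller. One way to build it, as a sketch (the sentence-pair encoding of a schema prompt plus the input text follows the UIE convention; the example strings are illustrative, not from this repo):
+
+ ```python
+ # UIE takes the schema prompt and the input text as a sentence pair:
+ # [CLS] prompt [SEP] text [SEP]
+ batch = tokenizer(text=['时间'],                        # schema prompt, e.g. "time"
+                   text_pair=['北京今天上午召开了会议'],   # input text (illustrative)
+                   truncation=True,
+                   max_length=256,
+                   padding='max_length',
+                   return_tensors='pt')
+ ```
+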
+ Here is the output of the model (with batch_size=16, max_seq_len=256):
+
+ ```python
+ start_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
+ end_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
+ ```
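+
+ Each entry of `start_prob`/`end_prob` is the probability that the corresponding token starts/ends an extracted span. A rough sketch of decoding spans from them (the 0.5 threshold and the greedy pairing below are my own assumptions, not the official PaddleNLP decoding logic):
+
+ ```python
+ def decode_spans(start_prob, end_prob, threshold=0.5):
+     """Pair each start position above the threshold with the
+     nearest end position at or after it (greedy, one span per start)."""
+     starts = (start_prob > threshold).nonzero(as_tuple=True)[0].tolist()
+     ends = (end_prob > threshold).nonzero(as_tuple=True)[0].tolist()
+     spans = []
+     for s in starts:
+         following = [e for e in ends if e >= s]
+         if following:
+             spans.append((s, following[0]))
+     return spans
+
+ # e.g. for the first sequence in the batch:
+ # for s, e in decode_spans(start_prob[0], end_prob[0]):
+ #     print(tokenizer.decode(batch['input_ids'][0][s:e + 1]))
+ ```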
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a3e9c51156faafe74bbc080d31d1c430708231a8c86c458924da13ccba488b7
+ size 471924589
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "name_or_path": "nghuyong/ernie-3.0-base-zh",
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff