xuyingjie521 committed
Commit 0db4d44
1 Parent(s): 4d84748
README.md CHANGED
@@ -1,3 +1,47 @@
- ---
- license: afl-3.0
- ---
+ ## UIE (Universal Information Extraction)
+
+ ### Introduction
+
+ UIE (Universal Information Extraction) is a state-of-the-art (SOTA) method in PaddleNLP; you can find the details [here](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie).
+ The paper is [here](https://arxiv.org/pdf/2203.12277.pdf).
+
+ ### Usage
+
+ I saved the UIE model as an entire model (ERNIE 3.0 backbone + start/end layers), not just a `state_dict`, so `torch.load` returns a callable model directly. Load it as follows:
+
+ #### 1. Clone this model to your local path
+
+ ```sh
+ git lfs install
+ git clone https://huggingface.co/xyj125/uie-base-chinese
+ ```
+
+ If you don't have [`git-lfs`](https://git-lfs.github.com), you can also:
+
+ * Download the files manually by clicking the `Files and versions` tab at the top of this model card.
+
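+ Alternatively, a minimal sketch using the `huggingface_hub` library (my own suggestion, not part of the original card; `snapshot_download` fetches the whole repo into the local cache):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # download every file in the repo and return the local directory path
+ local_path = snapshot_download(repo_id='xyj125/uie-base-chinese')
+ ```
+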
+ #### 2. Load this model from local files
+
+ ```python
+ import os
+ import torch
+ from transformers import AutoTokenizer
+
+ uie_model = 'uie-base-chinese'                                    # path to the repo cloned in step 1
+ model = torch.load(os.path.join(uie_model, 'pytorch_model.bin'))  # load the entire UIE model
+ tokenizer = AutoTokenizer.from_pretrained(uie_model)              # load the tokenizer from the same path
+ ...
+
+ start_prob, end_prob = model(input_ids=batch['input_ids'],
+                              token_type_ids=batch['token_type_ids'],
+                              attention_mask=batch['attention_mask'])
+ print(f'start_prob ({type(start_prob)}): {start_prob.size()}')  # start position probabilities
+ print(f'end_prob ({type(end_prob)}): {end_prob.size()}')        # end position probabilities
+ ...
+ ```
+
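+ The `batch` above is left to the caller. One way to build it, as a sketch (the sentence-pair encoding of a schema prompt plus the input text follows the UIE convention; the example strings are illustrative, not from this repo):
+
+ ```python
+ # UIE takes the schema prompt and the input text as a sentence pair:
+ # [CLS] prompt [SEP] text [SEP]
+ batch = tokenizer(text=['时间'],                        # schema prompt, e.g. "time"
+                   text_pair=['北京今天上午召开了会议'],   # input text (illustrative)
+                   truncation=True,
+                   max_length=256,
+                   padding='max_length',
+                   return_tensors='pt')
+ ```
+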
+ Here is the output of the model (with batch_size=16, max_seq_len=256):
+
+ ```python
+ start_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
+ end_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
+ ```
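+
+ Each entry of `start_prob`/`end_prob` is the probability that the corresponding token starts/ends an extracted span. A rough sketch of decoding spans from them (the 0.5 threshold and the greedy pairing below are my own assumptions, not the official PaddleNLP decoding logic):
+
+ ```python
+ def decode_spans(start_prob, end_prob, threshold=0.5):
+     """Pair each start position above the threshold with the
+     nearest end position at or after it (greedy, one span per start)."""
+     starts = (start_prob > threshold).nonzero(as_tuple=True)[0].tolist()
+     ends = (end_prob > threshold).nonzero(as_tuple=True)[0].tolist()
+     spans = []
+     for s in starts:
+         following = [e for e in ends if e >= s]
+         if following:
+             spans.append((s, following[0]))
+     return spans
+
+ # e.g. for the first sequence in the batch:
+ # for s, e in decode_spans(start_prob[0], end_prob[0]):
+ #     print(tokenizer.decode(batch['input_ids'][0][s:e + 1]))
+ ```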
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a3e9c51156faafe74bbc080d31d1c430708231a8c86c458924da13ccba488b7
+ size 471924589
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "name_or_path": "nghuyong/ernie-3.0-base-zh",
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "special_tokens_map_file": null,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff