Commit ed64c0f (parent: f42ac62): Update README.md

README.md, changed section (@@ -11,44 +11,73 @@):
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
14 |
+
# MetaIE
|
15 |
+
|
16 |
+
This is a multilingual meta-model distilled from ChatGPT-3.5-turbo for information extraction. This is an intermediate checkpoint that can be well-transferred to all kinds of downstream information extraction tasks. This model can also be tested by different label-to-span matching as shown in the following example:
|
17 |
+
|
18 |
+
Ten languages are supported:
|
19 |
+
- English
|
20 |
+
- Français
|
21 |
+
- Español
|
22 |
+
- Italiano
|
23 |
+
- Deutsch
|
24 |
+
- Polski
|
25 |
+
- Pусский
|
26 |
+
- 中文
|
27 |
+
- 日本語
|
28 |
+
- 한국어
|
29 |
+
|
30 |
+
```python
|
31 |
+
from transformers import AutoModelForTokenClassification, AutoTokenizer
|
32 |
+
import torch
|
33 |
+
|
34 |
+
device = torch.device("cuda:0")
|
35 |
+
path = f"KomeijiForce/xlm-roberta-large-metaie"
|
36 |
+
tokenizer = AutoTokenizer.from_pretrained(path)
|
37 |
+
tagger = AutoModelForTokenClassification.from_pretrained(path).to(device)
|
38 |
+
|
39 |
+
def find_sequences(lst):
|
40 |
+
sequences = []
|
41 |
+
i = 0
|
42 |
+
while i < len(lst):
|
43 |
+
if lst[i] == 0:
|
44 |
+
start = i
|
45 |
+
end = i
|
46 |
+
i += 1
|
47 |
+
while i < len(lst) and lst[i] == 1:
|
48 |
+
end = i
|
49 |
+
i += 1
|
50 |
+
sequences.append((start, end+1))
|
51 |
+
else:
|
52 |
+
i += 1
|
53 |
+
return sequences
|
54 |
+
|
55 |
+
examples = [
|
56 |
+
"Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .",
|
57 |
+
"Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .",
|
58 |
+
"高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。",
|
59 |
+
"倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。",
|
60 |
+
]
|
61 |
+
|
62 |
+
for example in examples:
|
63 |
+
inputs = tokenizer(example, return_tensors="pt").to(device)
|
64 |
+
tag_predictions = tagger(**inputs).logits[0].argmax(-1)
|
65 |
+
|
66 |
+
predictions = [tokenizer.decode(inputs.input_ids[0, seq[0]:seq[1]]).strip() for seq in find_sequences(tag_predictions)]
|
67 |
+
|
68 |
+
print(example)
|
69 |
+
print(predictions)
|
70 |
+
```
|
71 |
+
|
72 |
+
The output will be
|
73 |
+
|
74 |
+
```python
|
75 |
+
Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .
|
76 |
+
['first minutes of the battle']
|
77 |
+
Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .
|
78 |
+
['Siebenjährigen Krieg']
|
79 |
+
高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。
|
80 |
+
['468米']
|
81 |
+
倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。
|
82 |
+
['森']
|
83 |
+
```
|
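As a quick sanity check of the span decoding used in the example, `find_sequences` turns a flat tag sequence into `(start, end)` index pairs, assuming tag 0 opens a span, tag 1 continues it, and any other tag is outside. A minimal standalone sketch (the toy tag list is illustrative, not model output):

```python
def find_sequences(lst):
    # Tag 0 starts a span, a following run of 1s extends it; any other tag is outside.
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = end = i
            i += 1
            while i < len(lst) and lst[i] == 1:
                end = i
                i += 1
            sequences.append((start, end + 1))
        else:
            i += 1
    return sequences

# Toy tag list: one span covering indices 1-3, one single-token span at index 5.
print(find_sequences([2, 0, 1, 1, 2, 0]))  # → [(1, 4), (5, 6)]
```

The returned pairs are half-open token index ranges, which is why the example slices `input_ids[0, seq[0]:seq[1]]` directly before decoding.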