KomeijiForce committed on
Commit
ed64c0f
1 Parent(s): f42ac62

Update README.md

Files changed (1): README.md +70 -41
README.md CHANGED
@@ -11,44 +11,73 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # xlm-roberta-large-metaie
-
- This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/xlm-roberta-large) on an unknown dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 2
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 16
- - total_train_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 1.0
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.35.2
- - Pytorch 2.0.0+cu117
- - Datasets 2.15.0
- - Tokenizers 0.15.0
+ # MetaIE
+ 
+ This is a multilingual meta-model distilled from ChatGPT-3.5-turbo for information extraction. It is an intermediate checkpoint that transfers well to all kinds of downstream information extraction tasks. The model can also be probed with different label-to-span matchings, as shown in the following example:
+ 
+ Ten languages are supported:
+ - English
+ - Français
+ - Español
+ - Italiano
+ - Deutsch
+ - Polski
+ - Русский
+ - 中文
+ - 日本語
+ - 한국어
+ 
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+ import torch
+ 
+ device = torch.device("cuda:0")
+ path = "KomeijiForce/xlm-roberta-large-metaie"
+ tokenizer = AutoTokenizer.from_pretrained(path)
+ tagger = AutoModelForTokenClassification.from_pretrained(path).to(device)
+ 
+ def find_sequences(lst):
+     # Decode 0/1 tag predictions into (start, end) token spans:
+     # a 0 opens a span and each following 1 extends it.
+     sequences = []
+     i = 0
+     while i < len(lst):
+         if lst[i] == 0:
+             start = i
+             end = i
+             i += 1
+             while i < len(lst) and lst[i] == 1:
+                 end = i
+                 i += 1
+             sequences.append((start, end + 1))
+         else:
+             i += 1
+     return sequences
+ 
+ examples = [
+     "Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .",
+     "Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .",
+     "高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。",
+     "倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。",
+ ]
+ 
+ for example in examples:
+     inputs = tokenizer(example, return_tensors="pt").to(device)
+     tag_predictions = tagger(**inputs).logits[0].argmax(-1)
+ 
+     predictions = [tokenizer.decode(inputs.input_ids[0, seq[0]:seq[1]]).strip() for seq in find_sequences(tag_predictions)]
+ 
+     print(example)
+     print(predictions)
+ ```
+ 
+ The output will be
+ 
+ ```
+ Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .
+ ['first minutes of the battle']
+ Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .
+ ['Siebenjährigen Krieg']
+ 高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。
+ ['468米']
+ 倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。
+ ['森']
+ ```
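The `find_sequences` helper added in this README decodes per-token tag predictions into spans: a `0` opens a span and each following `1` extends it, so spans come back as half-open `(start, end)` index pairs. A minimal standalone sketch of that decoding logic, run on hand-written label lists rather than model output (the label values `2` below just stand in for any "outside" tag):

```python
def find_sequences(lst):
    # A 0 opens a span; consecutive 1s extend it; any other value is "outside".
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = i
            end = i
            i += 1
            while i < len(lst) and lst[i] == 1:
                end = i
                i += 1
            sequences.append((start, end + 1))  # half-open span, ready for slicing
        else:
            i += 1
    return sequences

# One span covering tokens 1-3, another covering token 5 alone.
labels = [2, 0, 1, 1, 2, 0, 2]
print(find_sequences(labels))  # → [(1, 4), (5, 6)]
```

The half-open convention means each pair can be used directly as a slice over `inputs.input_ids`, which is exactly how the README's loop passes spans to `tokenizer.decode`.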