---
license: mit
base_model: xlm-roberta-large
tags:
- generated_from_trainer
model-index:
- name: xlm-roberta-large-metaie
  results: []
---

# MetaIE

This is a multilingual meta-model for information extraction, distilled from ChatGPT-3.5-turbo. It is an intermediate checkpoint that can be transferred to many kinds of downstream information extraction tasks.

Ten languages are supported:

- English
- Français
- Español
- Italiano
- Deutsch
- Polski
- Русский
- 中文
- 日本語
- 한국어

The model can also be tested directly with label-to-span matching: each input is written as a label, a colon, and the text, and the model tags the spans in the text that match the label, as shown in the following example:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Use the first GPU when available, otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

path = "KomeijiForce/xlm-roberta-large-metaie"
tokenizer = AutoTokenizer.from_pretrained(path)
tagger = AutoModelForTokenClassification.from_pretrained(path).to(device)

def find_sequences(lst):
    # Collect half-open (start, end) token ranges:
    # tag 0 opens a span and each following tag 1 extends it.
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = i
            end = i
            i += 1
            while i < len(lst) and lst[i] == 1:
                end = i
                i += 1
            sequences.append((start, end + 1))
        else:
            i += 1
    return sequences

# Each example has the form "<label>: <text>".
examples = [
    "Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .",
    "Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .",
    "高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。",
    "倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。",
]

for example in examples:
    inputs = tokenizer(example, return_tensors="pt").to(device)
    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        tag_predictions = tagger(**inputs).logits[0].argmax(-1)
    # Decode each predicted token range back into text.
    predictions = [
        tokenizer.decode(inputs.input_ids[0, seq[0]:seq[1]]).strip()
        for seq in find_sequences(tag_predictions)
    ]
    print(example)
    print(predictions)
```

The output will be:

```python
Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .
['first minutes of the battle']
Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .
['Siebenjährigen Krieg']
高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。
['468米']
倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。
['森']
```
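
The span-decoding step can be tried without downloading the model. The following is a minimal sketch, not the authors' implementation: it assumes, based only on the example code above, that tag `0` opens a span, tag `1` continues it, and any other tag lies outside a span. The name `find_spans`, the constants `BEGIN` and `INSIDE`, and the toy tokens and tags are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed tag ids, inferred from the example code above:
# 0 opens a span, 1 continues it, anything else is outside a span.
BEGIN, INSIDE = 0, 1


def find_spans(tags: Sequence[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index pairs, one per tagged span."""
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == BEGIN:
            # A new span starts here; close the previous one first.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag != INSIDE and start is not None:
            # An outside tag ends the open span.
            spans.append((start, i))
            start = None
        # An INSIDE tag with no open span is ignored.
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Toy tokens and tags (made up, not real model output).
tokens = ["<s>", "height", ":", "total", "height", "468", "m", "</s>"]
tags = [2, 2, 2, 2, 2, 0, 1, 2]
print([tokens[s:e] for s, e in find_spans(tags)])  # [['468', 'm']]
```

With the real model, passing `tag_predictions.tolist()` to this function should yield the same ranges as `find_sequences`, since both ignore a continuation tag that does not follow an opening tag.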