MetaIE

This is a multilingual meta-model for information extraction, distilled from ChatGPT-3.5-turbo. It is an intermediate checkpoint that transfers well to a wide range of downstream information extraction tasks. The model can also be probed directly with different label-to-span matching queries, as shown in the usage example further down.
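
Judging by the examples further down, a query is the label, a colon, and then the text. A minimal sketch of assembling several such queries for one passage follows; `build_queries` and the sample labels are hypothetical names introduced here for illustration and are not part of the model's API.

```python
def build_queries(labels, text):
    """Pair one passage with several labels using the 'label: text' layout.

    The layout is an assumption inferred from the examples on this card.
    """
    return [f"{label}: {text}" for label in labels]


# Hypothetical labels and passage, chosen only for illustration.
text = "The Oriental Pearl Tower in Shanghai has a total height of 468 meters."
for query in build_queries(["Height", "Location"], text):
    print(query)
```

Each resulting string can be passed to the tokenizer in place of one entry of `examples`, so the same passage can be queried once per label.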

Ten languages are supported:

  • English
  • Français
  • Español
  • Italiano
  • Deutsch
  • Polski
  • Русский
  • 中文
  • 日本語
  • 한국어
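
The following example prepends a label to each input and decodes the tagged spans:

from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Use the first GPU when available; otherwise fall back to CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
path = "KomeijiForce/xlm-roberta-large-metaie"
tokenizer = AutoTokenizer.from_pretrained(path)
tagger = AutoModelForTokenClassification.from_pretrained(path).to(device)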

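def find_sequences(lst):
    # Collect (start, end) token index pairs, end exclusive.
    # A span opens at a tag 0 and extends over the tag 1s that follow it.
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = i
            end = i
            i += 1
            while i < len(lst) and lst[i] == 1:
                end = i
                i += 1
            sequences.append((start, end+1))
        else:
            # Any other tag, or a 1 with no opening 0, is skipped.
            i += 1
    return sequences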

examples = [
    "Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .",
    "Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .",
    "高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。",
    "倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。",
]

for example in examples:
    inputs = tokenizer(example, return_tensors="pt").to(device)
    # Inference only, so skip gradient tracking.
    with torch.no_grad():
        tag_predictions = tagger(**inputs).logits[0].argmax(-1).tolist()

    # Decode each predicted token span back to text.
    predictions = [tokenizer.decode(inputs.input_ids[0, start:end]).strip() for start, end in find_sequences(tag_predictions)]

    print(example)
    print(predictions)

The output will be:

Fire volleys at the command happens: The soldiers were expected to fire volleys at the command of officers, but in practice this happened only in the first minutes of the battle .
['first minutes of the battle']
Historische Ereignisse: Siebenjährigen Krieg von 1756 bis 1763, war Preußen als fünfte Großmacht neben Frankreich, Großbritannien, Österreich und Russland in der europäischen Pentarchie anerkannt .
['Siebenjährigen Krieg']
高度: 东方明珠自落成后便为上海天际线的组成部分之一,总高468米。
['468米']
倒れた場所: カフカは高松の私立図書館に通うようになるが、ある日目覚めると、自分が森の中で血だらけで倒れていた。
['森']
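
The decoding step can be checked without loading the model. The sketch below reruns the span logic of `find_sequences` on hand-written tags, assuming (as that function implies) that tag 0 opens a span and tag 1 continues it. The tokens and tag values are made up for illustration and are not model output.

```python
def find_sequences(tags):
    """Return (start, end) index pairs, end exclusive, for each run of 0 followed by 1s."""
    spans = []
    i = 0
    while i < len(tags):
        if tags[i] == 0:
            start = i
            i += 1
            # Extend the span over every continuation tag that follows.
            while i < len(tags) and tags[i] == 1:
                i += 1
            spans.append((start, i))
        else:
            # Any other tag, or a 1 with no opening 0, is skipped.
            i += 1
    return spans


# Hypothetical tokens and tags, written by hand for illustration only.
tokens = ["Height", ":", "total", "height", "468", "m", "."]
tags = [2, 2, 2, 2, 0, 1, 2]

spans = find_sequences(tags)
extracted = [" ".join(tokens[start:end]) for start, end in spans]
print(spans)
print(extracted)
```

In the real pipeline the indices refer to subword token positions, so the text is recovered with `tokenizer.decode` rather than by joining strings.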

Model size: 559M params (Safetensors, tensor type F32)