---
license: mit
base_model: roberta-large
datasets:
- KomeijiForce/MetaIE-Pretrain
language:
- en
metrics:
- f1
pipeline_tag: token-classification
---


# MetaIE

This is a meta-model for information extraction (IE), distilled from GPT-4. It is an intermediate checkpoint that transfers well to a wide range of downstream IE tasks. It can also be used directly for label-to-span matching. You prepend a label or query (e.g., `Person`) and a colon to the sentence, and the model tags the spans in the sentence that match it, as in the following example:

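```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
path = "KomeijiForce/roberta-large-metaie-gpt4"
tokenizer = AutoTokenizer.from_pretrained(path)
tagger = AutoModelForTokenClassification.from_pretrained(path).to(device)

def find_sequences(lst):
    """Return half-open (start, end) token index pairs of tagged spans.

    A span starts at a token predicted as label 0 and extends over the
    immediately following tokens predicted as label 1; tokens with any
    other label are outside spans.
    """
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = i
            end = i
            i += 1
            while i < len(lst) and lst[i] == 1:
                end = i
                i += 1
            sequences.append((start, end + 1))
        else:
            i += 1
    return sequences

def is_sublst(lst1, lst2):
    """Return True if lst2 occurs as a contiguous sublist of lst1."""
    for idx in range(len(lst1) - len(lst2) + 1):
        if lst1[idx:idx + len(lst2)] == lst2:
            return True
    return False

words = ["John", "Smith", "loves", "his", "hometown", ",", "Los", "Angeles", "."]

for prefix in ["Person", "Location", "John Smith births in", "Positive opinion"]:
    # Prepend the label (or query) and a colon to the sentence.
    sentence = " ".join([prefix, ":"] + words)

    inputs = tokenizer(sentence, return_tensors="pt").to(device)
    with torch.no_grad():
        tag_predictions = tagger(**inputs).logits[0].argmax(-1).tolist()

    # Decode each tagged token span back to text.
    predictions = [tokenizer.decode(inputs.input_ids[0, seq[0]:seq[1]]).strip()
                   for seq in find_sequences(tag_predictions)]
    # Keep only spans whose words occur contiguously in the original sentence.
    predictions = [prediction for prediction in predictions if is_sublst(words, prediction.split())]

    print(prefix, predictions)
```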

The expected output is:

```
Person ['John Smith']
Location ['Los Angeles']
John Smith births in ['Los Angeles']
Positive opinion ['loves his hometown']
```
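The span post-processing in the example above can be checked without downloading the model. The sketch below is a compact restatement of `find_sequences` and `is_sublst` with the same behavior. It assumes the same tag convention as that code (0 opens a span, 1 continues it, and any other value, here 2, is outside). It runs both helpers on a hand-written tag sequence. The `toy_tokens` and `toy_tags` lists are hypothetical, use whole words instead of subword tokens, and are not real model output.

```python
def find_sequences(lst):
    """Return half-open (start, end) spans: a 0 opens a span, following 1s extend it."""
    sequences = []
    i = 0
    while i < len(lst):
        if lst[i] == 0:
            start = i
            i += 1
            # Consume the continuation tags that belong to this span.
            while i < len(lst) and lst[i] == 1:
                i += 1
            sequences.append((start, i))
        else:
            i += 1
    return sequences


def is_sublst(lst1, lst2):
    """Return True if lst2 occurs as a contiguous sublist of lst1."""
    n = len(lst2)
    return any(lst1[idx:idx + n] == lst2 for idx in range(len(lst1) - n + 1))


# Hypothetical word-level tokens and tags (not real model output).
toy_tokens = ["Location", ":", "John", "Smith", "loves", "his", "hometown", ",", "Los", "Angeles", "."]
toy_tags = [0, 2, 2, 2, 2, 2, 2, 2, 0, 1, 2]
words = toy_tokens[2:]  # the sentence without the "Location :" prefix

spans = [" ".join(toy_tokens[s:e]) for s, e in find_sequences(toy_tags)]
print(spans)  # ['Location', 'Los Angeles']

# Filtering against the sentence drops the span that was tagged in the prefix.
kept = [span for span in spans if is_sublst(words, span.split())]
print(kept)  # ['Los Angeles']
```

The filtering step is why the original example compares each decoded span against `words`. A span tagged on the prefix itself (here `Location`) does not occur in the sentence and is discarded.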