	---
	license: cc-by-nc-4.0
	datasets:
	- zaemyung/IteraTeR_plus
	language:
	- en
	pipeline_tag: token-classification
	---
	# DElIteraTeR-RoBERTa-Intent-Span-Detector
	This model was obtained by fine-tuning [roberta-large](https://huggingface.co/roberta-large) on the [IteraTeR+](https://huggingface.co/datasets/zaemyung/IteraTeR_plus) `multi_sent` dataset.

	Paper: [Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks](https://aclanthology.org/2022.emnlp-main.678/) <br>
	Authors: Zae Myung Kim, Wanyu Du, Vipul Raheja, Dhruv Kumar, and Dongyeop Kang

	## Usage
	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForTokenClassification

	tokenizer = AutoTokenizer.from_pretrained("zaemyung/DElIteraTeR-RoBERTa-Intent-Span-Detector")

	# update tokenizer with special tokens
	INTENT_CLASSES = ['none', 'clarity', 'fluency', 'coherence', 'style', 'meaning-changed'] # `meaning-changed` is not used
	INTENT_OPENED_TAGS = [f'<{intent_class}>' for intent_class in INTENT_CLASSES]
	INTENT_CLOSED_TAGS = [f'</{intent_class}>' for intent_class in INTENT_CLASSES]
	INTENT_TAGS = set(INTENT_OPENED_TAGS + INTENT_CLOSED_TAGS)
	special_tokens_dict = {'additional_special_tokens': ['<bos>', '<eos>'] + list(INTENT_TAGS)}
	tokenizer.add_special_tokens(special_tokens_dict)

	model = AutoModelForTokenClassification.from_pretrained("zaemyung/DElIteraTeR-RoBERTa-Intent-Span-Detector")

	id2label = {0: "none", 1: "clarity", 2: "fluency", 3: "coherence", 4: "style", 5: "meaning-changed"}

	before_text = '<bos>I likes coffee?<eos>'
	model_input = tokenizer(before_text, return_tensors='pt')
	model_output = model(**model_input)
	softmax_scores = torch.softmax(model_output.logits, dim=-1)
	pred_ids = torch.argmax(softmax_scores, dim=-1)[0].tolist()
	pred_intents = [id2label[_id] for _id in pred_ids]

	tokens = tokenizer.convert_ids_to_tokens(model_input['input_ids'][0])

	for token, pred_intent in zip(tokens, pred_intents):
	    print(f"{token}: {pred_intent}")

	"""
	<s>: none
	<bos>: none
	I: fluency
	Ġlikes: fluency
	Ġcoffee: none
	?: none
	<eos>: none
	</s>: none
	"""
	```
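
The usage example prints one predicted intent per token. To work with edit spans rather than individual tokens, the per-token labels can be merged into contiguous runs. The following is a minimal sketch of one way to do that, not part of the released code: `group_intent_spans`, `SPECIAL_TOKENS`, and `_detokenize` are hypothetical helper names, and the detokenization is a simplification that only handles the `Ġ` space marker seen in the output above.

```python
from typing import List, Tuple

# Assumption: these are the only marker tokens to skip when building spans.
SPECIAL_TOKENS = {"<s>", "</s>", "<bos>", "<eos>"}


def _detokenize(pieces: List[str]) -> str:
    # Simplified: RoBERTa's byte-level BPE marks a leading space with "Ġ".
    return "".join(pieces).replace("Ġ", " ").strip()


def group_intent_spans(tokens: List[str], intents: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive tokens sharing the same non-'none' intent into (text, intent) spans."""
    spans: List[Tuple[str, str]] = []
    buffer: List[str] = []
    buffer_intent = None
    for token, intent in zip(tokens, intents):
        keep = token not in SPECIAL_TOKENS and intent != "none"
        # Close the open span when the run ends or the intent changes.
        if buffer and (not keep or intent != buffer_intent):
            spans.append((_detokenize(buffer), buffer_intent))
            buffer, buffer_intent = [], None
        if keep:
            buffer.append(token)
            buffer_intent = intent
    if buffer:
        spans.append((_detokenize(buffer), buffer_intent))
    return spans


# Tokens and labels copied from the example output above.
tokens = ["<s>", "<bos>", "I", "Ġlikes", "Ġcoffee", "?", "<eos>", "</s>"]
pred_intents = ["none", "none", "fluency", "fluency", "none", "none", "none", "none"]
spans = group_intent_spans(tokens, pred_intents)
print(spans)  # [('I likes', 'fluency')]
```

For text containing non-ASCII characters, the tokenizer's own `convert_tokens_to_string` method is likely a safer way to rebuild the span text than the string replacement used here.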