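---
language:
- ko
tags:
- classification
license: mit
datasets:
- nsmc
widget:
# English: "It's a timeless masterpiece! This is the first time I've seen something so moving."
- text: "불후의 명작입니다! 이렇게 감동적인 내용은 처음이에요"
  example_title: "Positive"
# English: "What a waste of time. Even 1 point out of 10 is too much.."
- text: "시간이 정말 아깝습니다. 10점 만점에 1점도 아까워요.."
  example_title: "Negative"
metrics:
- accuracy
- f1
- precision
- recall
---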

# Sentiment Binary Classification (KoELECTRA-Small-v3 fine-tuned on the Naver Sentiment Movie Corpus)

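## Usage (compatible with Amazon SageMaker inference)
`inference_nsmc.py` implements the handler functions of the SageMaker Inference Toolkit (`model_fn`, `input_fn`, `predict_fn`, and `output_fn`) with their expected interfaces, so it can be deployed to a SageMaker endpoint as is.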
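
Roughly, the serving stack calls `model_fn` once when the container starts, then passes each request body through `input_fn`, `predict_fn`, and `output_fn` in that order. The sketch below mimics that chain with toy stand-ins so the data flow can be run without a GPU or model download; the keyword "model" and the `handle_request` function are hypothetical, and the real toolkit's dispatch logic is only approximated here.

```python
import json

# Toy stand-ins with the same names and argument order as the handlers in
# inference_nsmc.py. The "model" is a hypothetical keyword rule, NOT KoELECTRA.
def model_fn(model_path=None):
    return lambda text: "Pos" if "best" in text else "Neg"


def input_fn(input_data, content_type="application/jsonlines"):
    lines = input_data.decode("utf-8").splitlines()
    return [json.loads(line)["text"][0] for line in lines if line.strip()]


def predict_fn(texts, model):
    return "\n".join(json.dumps({"predicted_label": model(t)}) for t in texts)


def output_fn(prediction, accept="application/jsonlines"):
    return prediction, accept  # (response body, content type)


def handle_request(model, request_body, content_type, accept):
    """Simplified sketch of chaining input_fn -> predict_fn -> output_fn for one request."""
    result = output_fn(predict_fn(input_fn(request_body, content_type), model), accept)
    if isinstance(result, tuple):  # a (body, content type) pair
        return result
    return result, accept  # otherwise fall back to the requested accept type


model = model_fn()  # loaded once, before any request is served
body, response_type = handle_request(
    model,
    b'{"text": ["the best film"]}\n{"text": ["a dull film"]}\n',
    "application/jsonlines",
    "application/jsonlines",
)
print(body)
```

Loading the model outside the per-request path mirrors why `model_fn` is a separate handler: the expensive download and device placement happen once, not on every call.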

### inference_nsmc.py

```python
import json
import sys
import logging

import torch
from torch import nn
from transformers import AutoTokenizer, ElectraConfig, ElectraForSequenceClassification

logging.basicConfig(
    level=logging.INFO,
    format='[{%(filename)s:%(lineno)d} %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler(filename='tmp.log'),
        logging.StreamHandler(sys.stdout),
    ],
)
logger = logging.getLogger(__name__)

max_seq_length = 128
classes = ['Neg', 'Pos']  # index 0 = negative, index 1 = positive

tokenizer = AutoTokenizer.from_pretrained("daekeun-ml/koelectra-small-v3-nsmc")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def model_fn(model_path=None):
    ####
    # If you have your own fine-tuned model, load it from model_path instead.
    # Base pre-trained model on the Hugging Face Hub: 'monologg/koelectra-small-v3-discriminator'
    ####
    # config = ElectraConfig.from_json_file(f'{model_path}/config.json')
    # model = ElectraForSequenceClassification.from_pretrained(f'{model_path}/model.pth', config=config)

    # Download the fine-tuned model from the Hugging Face Hub
    model = ElectraForSequenceClassification.from_pretrained('daekeun-ml/koelectra-small-v3-nsmc')
    model.to(device)
    model.eval()  # inference mode: disables dropout
    return model


def input_fn(input_data, content_type="application/jsonlines"):
    # Request body: one JSON object per line, e.g. {"text": ["..."]}
    data_str = input_data.decode("utf-8") if isinstance(input_data, bytes) else input_data
    transformed_inputs = []

    for jsonline in data_str.splitlines():
        if not jsonline.strip():
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        text = json.loads(jsonline)["text"][0]
        logger.info("input text: {}".format(text))
        encode_plus_token = tokenizer.encode_plus(
            text,
            max_length=max_seq_length,
            add_special_tokens=True,
            return_token_type_ids=False,
            padding="max_length",
            return_attention_mask=True,
            return_tensors="pt",
            truncation=True,
        )
        transformed_inputs.append(encode_plus_token)

    return transformed_inputs


def predict_fn(transformed_inputs, model):
    predicted_classes = []
    softmax_fn = nn.Softmax(dim=1)

    for data in transformed_inputs:
        data = data.to(device)
        with torch.no_grad():  # no gradients are needed for inference
            output = model(**data)

        softmax_output = softmax_fn(output[0])  # output[0] holds the logits, shape (1, 2)
        _, prediction = torch.max(softmax_output, dim=1)

        predicted_class_idx = prediction.item()
        predicted_class = classes[predicted_class_idx]
        score = softmax_output[0][predicted_class_idx]
        logger.info("predicted_class: {}".format(predicted_class))

        prediction_dict = {
            "predicted_label": predicted_class,
            "score": score.item(),  # probability of the predicted class as a Python float
        }

        jsonline = json.dumps(prediction_dict)
        logger.info("jsonline: {}".format(jsonline))
        predicted_classes.append(jsonline)

    # Response body: one JSON object per input line
    predicted_classes_jsonlines = "\n".join(predicted_classes)
    return predicted_classes_jsonlines


def output_fn(outputs, accept="application/jsonlines"):
    # Return (response body, content type)
    return outputs, accept
```
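
For each line, `input_fn` asks the tokenizer for a fixed-length encoding: special tokens added, the sequence truncated and padded to `max_seq_length = 128`, and an attention mask marking real tokens. The sketch below reproduces that layout for a single BERT-style sequence (`[CLS] tokens [SEP]`, then padding), assuming the token ids have already been looked up. `encode_fixed_length` is a hypothetical helper, and the special-token ids passed in are placeholders, not the actual KoELECTRA vocabulary values.

```python
def encode_fixed_length(token_ids, max_len, cls_id, sep_id, pad_id):
    """Sketch of one BERT-style sequence: [CLS] tokens [SEP], truncated then padded to max_len."""
    body = token_ids[: max_len - 2]            # truncate, leaving room for [CLS] and [SEP]
    input_ids = [cls_id] + body + [sep_id]
    attention_mask = [1] * len(input_ids)      # 1 = real token
    n_pad = max_len - len(input_ids)
    input_ids += [pad_id] * n_pad              # pad up to the fixed length
    attention_mask += [0] * n_pad              # 0 = padding, ignored by attention
    return input_ids, attention_mask


# Placeholder ids for illustration only; the real values come from the tokenizer's vocab.
ids, mask = encode_fixed_length([11, 12, 13], max_len=6, cls_id=2, sep_id=3, pad_id=0)
print(ids)   # [2, 11, 12, 13, 3, 0]
print(mask)  # [1, 1, 1, 1, 1, 0]
```

Padding every request to the same length keeps tensor shapes fixed; the mask lets the model ignore the padded positions.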

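`predict_fn` turns the logits in `output[0]` into a label and a confidence score by applying softmax and taking the arg-max. The same post-processing for a single row of logits can be sketched in plain Python; `logits_to_prediction` is a hypothetical helper, not part of the script, and the example logits are made up.

```python
import math

classes = ['Neg', 'Pos']  # same label order as in inference_nsmc.py


def logits_to_prediction(logits):
    """Softmax over one row of logits, then return the arg-max label and its probability."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return {"predicted_label": classes[idx], "score": probs[idx]}


print(logits_to_prediction([-1.0, 2.0]))  # Pos, score = 1 / (1 + e^-3), about 0.9526
```

Because softmax is monotonic, the arg-max over probabilities picks the same class as the arg-max over raw logits; softmax is only needed to report `score` as a probability.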
### test.py
```python
from inference_nsmc import model_fn, input_fn, predict_fn, output_fn

with open('samples/nsmc.txt', mode='rb') as file:
    model_input_data = file.read()

model = model_fn()
transformed_inputs = input_fn(model_input_data)
predicted_classes_jsonlines = predict_fn(transformed_inputs, model)
model_outputs = output_fn(predicted_classes_jsonlines)
print(model_outputs[0])  # output_fn returns (response body, content type)
```

Output:
```
[{inference_nsmc.py:47} INFO - input text: 이 영화는 최고의 영화입니다
[{inference_nsmc.py:47} INFO - input text: 최악이에요. 배우의 연기력도 좋지 않고 내용도 너무 허접합니다
[{inference_nsmc.py:77} INFO - predicted_class: Pos
[{inference_nsmc.py:84} INFO - jsonline: {"predicted_label": "Pos", "score": 0.9619030952453613}
[{inference_nsmc.py:77} INFO - predicted_class: Neg
[{inference_nsmc.py:84} INFO - jsonline: {"predicted_label": "Neg", "score": 0.9994170665740967}
{"predicted_label": "Pos", "score": 0.9619030952453613}
{"predicted_label": "Neg", "score": 0.9994170665740967}
```
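
A client receiving this JSON Lines response can split it back into one record per input sentence. The sketch below uses a hypothetical helper, `parse_predictions`, and reuses the two result lines from the sample output above as its input.

```python
import json


def parse_predictions(body):
    """Turn a JSON Lines response body into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


# Response body copied from the sample output above.
body = ('{"predicted_label": "Pos", "score": 0.9619030952453613}\n'
        '{"predicted_label": "Neg", "score": 0.9994170665740967}')
for pred in parse_predictions(body):
    print(pred["predicted_label"], round(pred["score"], 3))
```

Results come back in the same order as the input lines, so they can be paired with the original sentences by position.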

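### Sample data (samples/nsmc.txt)
Each line is a JSON object whose `text` field is a one-element list; `input_fn` reads only its first element. In English, the two reviews read "This movie is the best movie" and "It's the worst. The acting isn't good and the story is far too shoddy."
```
{"text": ["이 영화는 최고의 영화입니다"]}
{"text": ["최악이에요. 배우의 연기력도 좋지 않고 내용도 너무 허접합니다"]}
```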
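
Payloads in this format can also be generated from a list of sentences, for example to send to the endpoint or to write a new sample file. `to_jsonlines` is a hypothetical helper, not part of the repository.

```python
import json


def to_jsonlines(texts):
    """Encode sentences as UTF-8 JSON Lines, one {"text": [sentence]} object per line."""
    lines = (json.dumps({"text": [t]}, ensure_ascii=False) for t in texts)
    return "\n".join(lines).encode("utf-8")


payload = to_jsonlines([
    "이 영화는 최고의 영화입니다",
    "최악이에요. 배우의 연기력도 좋지 않고 내용도 너무 허접합니다",
])
print(payload.decode("utf-8"))
```

`ensure_ascii=False` keeps the Hangul readable, as in the sample file; with the default `ensure_ascii=True` the text is written as `\uXXXX` escapes, which `json.loads` in `input_fn` decodes to the same strings.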

## References
- KoELECTRA: https://github.com/monologg/KoELECTRA
- Naver Sentiment Movie Corpus Dataset: https://github.com/e9t/nsmc