---
language:
- ko
pipeline_tag: text-generation
---
# Model Card for jangmin/merged-midm-7B-food-order-understanding-30K
The model is a fine-tuned version of the Korean large language model [KT-AI/midm-bitext-S-7B-inst-v1](https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1).
Its purpose is to analyze a "food order sentence" and extract product information from it.
For example, given the ordering sentence:
```
์—ฌ๊ธฐ์š” ์ถ˜์ฒœ๋‹ญ๊ฐˆ๋น„ 4์ธ๋ถ„ํ•˜๊ณ ์š”. ๋ผ๋ฉด์‚ฌ๋ฆฌ ์ถ”๊ฐ€ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์ฝœ๋ผ 300ml ๋‘์บ”์ฃผ์„ธ์š”.
```
The model is then expected to generate product information such as:
```
- ๋ถ„์„ ๊ฒฐ๊ณผ 0: ์Œ์‹๋ช…:์ถ˜์ฒœ๋‹ญ๊ฐˆ๋น„, ์ˆ˜๋Ÿ‰:4์ธ๋ถ„
- ๋ถ„์„ ๊ฒฐ๊ณผ 1: ์Œ์‹๋ช…:๋ผ๋ฉด์‚ฌ๋ฆฌ
- ๋ถ„์„ ๊ฒฐ๊ณผ 2: ์Œ์‹๋ช…:์ฝœ๋ผ, ์˜ต์…˜:300ml, ์ˆ˜๋Ÿ‰:๋‘์บ”
```
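If the analysis needs to be consumed programmatically, each `- 분석 결과 N:` line can be split into key-value pairs. Below is a minimal sketch; the `parse_analysis` helper and its dictionary keys are illustrative, not part of the model:
``` python
import re

def parse_analysis(text):
    """Parse lines like '- 분석 결과 0: 음식명:콜라, 옵션:300ml, 수량:두캔'
    into a list of {field: value} dicts."""
    items = []
    for line in text.splitlines():
        m = re.match(r"-\s*분석 결과 \d+:\s*(.+)", line.strip())
        if not m:
            continue
        fields = {}
        for pair in m.group(1).split(","):
            key, _, value = pair.partition(":")
            fields[key.strip()] = value.strip()
        items.append(fields)
    return items

print(parse_analysis("- 분석 결과 2: 음식명:콜라, 옵션:300ml, 수량:두캔"))
# [{'음식명': '콜라', '옵션': '300ml', '수량': '두캔'}]
```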
## Model Details
### Model Description
- **Developed by:** [Jangmin Oh](https://huggingface.co/jangmin)
- **Model type:** a Decoder-only Transformer
- **Language(s) (NLP):** ko
- **License:** CC-BY-NC 4.0, inherited from the base model by KT-AI
- **Finetuned from model:** [KT-AI/midm-bitext-S-7B-inst-v1](https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1)
## Bias, Risks, and Limitations
The model was fine-tuned on a dataset of order sentences generated with the GPT-4 API. Please note that we do not assume any responsibility for risks or damages caused by this model.
## How to Get Started with the Model
Below is a simple usage example.
To load the fine-tuned model in INT4 rather than INT8, specify `load_in_4bit=True` instead of `load_in_8bit=True` (see the `BitsAndBytesConfig` sketch after the example).
``` python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = 'jangmin/merged-midm-7B-food-order-understanding-30K'

prompt_template = """###System;{System}
###User;{User}
###Midm;"""

default_system_msg = (
    "너는 먼저 사용자가 입력한 주문 문장을 분석하는 에이전트이다. 이로부터 주문을 구성하는 음식명, 옵션명, 수량을 차례대로 추출해야 한다."
)

def wrapper_generate(model, tokenizer, input_prompt, do_stream=False):
    data = tokenizer(input_prompt, return_tensors="pt")
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    # Drop the trailing special token appended by the tokenizer so that
    # generation continues directly from the prompt.
    input_ids = data.input_ids[..., :-1]
    with torch.no_grad():
        pred = model.generate(
            input_ids=input_ids.cuda(),
            streamer=streamer if do_stream else None,
            use_cache=True,
            max_new_tokens=512,  # a finite cap; recent transformers versions reject float('inf')
            do_sample=False,
        )
    decoded_text = tokenizer.batch_decode(pred, skip_special_tokens=True)
    decoded_text = decoded_text[0].replace("<[!newline]>", "\n")
    return decoded_text[len(input_prompt):]

trained_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True,
)

sentence = "아이스아메리카노 톨사이즈 한잔 하고요. 딸기스무디 한잔 주세요. 또, 콜드브루라떼 하나요."
analysis = wrapper_generate(
    model=trained_model,
    tokenizer=tokenizer,
    input_prompt=prompt_template.format(System=default_system_msg, User=sentence),
    do_stream=False,
)
print(analysis)
```
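For INT4 loading, recent transformers versions expect the quantization options to be passed through `BitsAndBytesConfig` rather than bare `load_in_*` flags. A sketch, assuming `bitsandbytes` is installed:
``` python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# INT4 loading; use load_in_8bit=True here instead for INT8.
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

trained_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True,
)
```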
## Training Details
### Training Data
The dataset was generated with the GPT-4 API using a carefully designed prompt. The prompt template was designed to elicit pairs of a food-order sentence and its analysis. In total, 30K examples were generated; note that this cost about $400 across 3,000 API calls.
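The exact generation prompt is not published here; the sketch below only illustrates the shape of such a generation loop. The instruction string, the batch size of 10, and the `generate_batch` helper are assumptions, and the call follows the current openai-python client API:
``` python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical instruction; the actual prompt used to build the dataset is more elaborate.
generation_prompt = (
    "매장에서 고객이 음식을 주문하는 문장과 그 분석 결과"
    "(음식명, 옵션명, 수량) 쌍을 10개 생성하라."
)

def generate_batch() -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": generation_prompt}],
    )
    return response.choices[0].message.content

# Roughly 3,000 such calls, each yielding ~10 pairs, produced the 30K examples.
```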
Some generated examples are as follows:
``` python
{
    'input': '다음은 매장에서 고객이 음식을 주문하는 주문 문장이다. 이를 분석하여 음식명, 옵션명, 수량을 추출하여 고객의 의도를 이해하고자 한다.\n분석 결과를 완성해주기 바란다.\n\n### 명령: 제육볶음 한그릇하고요, 비빔밥 한그릇 추가해주세요. ### 응답:\n',
    'output': '- 분석 결과 0: 음식명:제육볶음,수량:한그릇\n- 분석 결과 1: 음식명:비빔밥,수량:한그릇'
},
{
    'input': '다음은 매장에서 고객이 음식을 주문하는 주문 문장이다. 이를 분석하여 음식명, 옵션명, 수량을 추출하여 고객의 의도를 이해하고자 한다.\n분석 결과를 완성해주기 바란다.\n\n### 명령: 사천탕수육 곱배기 주문하고요, 샤워크림치킨도 하나 추가해주세요. ### 응답:\n',
    'output': '- 분석 결과 0: 음식명:사천탕수육,옵션:곱배기\n- 분석 결과 1: 음식명:샤워크림치킨,수량:하나'
}
```
## Evaluation
"The evaluation dataset comprises 3,004 examples, each consisting of a pair: a 'food-order sentence' and its corresponding 'analysis result' as a reference."
The bleu scores on the dataset are as follows.
| metric | llama-2 model | midm model |
|---|---|---|
| score | 93.323054 | 93.878258 |
| counts | [81382, 76854, 72280, 67869] | [81616, 77246, 72840, 68586] |
| totals | [84327, 81323, 78319, 75315] | [84376, 81372, 78368, 75364] |
| precisions | [96.51, 94.5, 92.29, 90.11] | [96.73, 94.93, 92.95, 91.01] |
| bp | 1.0 | 1.0 |
| sys_len | 84327 | 84376 |
| ref_len | 84124 | 84124 |
The llama-2 model refers to [jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K](https://huggingface.co/jangmin/merged-llama2-7b-chat-hf-food-order-understanding-30K), which was fine-tuned from llama-2-7b-chat-hf.
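The reported fields (score, counts, totals, precisions, bp, sys_len, ref_len) are the attributes of a sacrebleu `BLEUScore`, which suggests the numbers can be reproduced along these lines. A sketch with placeholder lists:
``` python
import sacrebleu

# Placeholder lists; in practice these hold the 3,004 model outputs and references.
predictions = ["- 분석 결과 0: 음식명:제육볶음,수량:한그릇"]
references = [["- 분석 결과 0: 음식명:제육볶음,수량:한그릇"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(predictions, references)
print(bleu.score, bleu.counts, bleu.totals, bleu.precisions,
      bleu.bp, bleu.sys_len, bleu.ref_len)
```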
## Note for Pretrained Model
Citation for the pretrained model:
```
@misc{kt-mi:dm,
  title = {Mi:dm: KT Bilingual (Korean,English) Generative Pre-trained Transformer},
  author = {KT},
  year = {2023},
  url = {https://huggingface.co/KT-AI/midm-bitext-S-7B-inst-v1},
  howpublished = {\url{https://genielabs.ai}},
}
```
## Model Card Authors
Jangmin Oh
## Model Card Contact
Jangmin Oh