brian-lim committed
Commit 20c3afe
1 Parent(s): cd3d1a9

Add How to use

Files changed (1):
  README.md +44 -8
README.md CHANGED
@@ -8,24 +8,60 @@ language:
  # Korean Style Transfer
 
  This model is a fine-tuned version of [Synatra-7B-v0.3-dpo](https://huggingface.co/maywell/Synatra-7B-v0.3-dpo) using a Korean style dataset provided by Smilegate AI (https://github.com/smilegate-ai/korean_smile_style_dataset/tree/main).
- Since the original dataset is tabular and not fit for training the LLM, I have preprocessed it into instruction-input-output format, which can be found (here)[https://huggingface.co/datasets/brian-lim/smile_style_orca].
  The dataset is then fed into the ChatML template. Feel free to use my version of the dataset as needed.
 
  ν•΄λ‹Ή λͺ¨λΈμ€ [Synatra-7B-v0.3-dpo](https://huggingface.co/maywell/Synatra-7B-v0.3-dpo) λͺ¨λΈμ„ 슀마일게이트 AIμ—μ„œ μ œκ³΅ν•˜λŠ” Smile style λ°μ΄ν„°μ…‹μœΌλ‘œ νŒŒμΈνŠœλ‹ ν–ˆμŠ΅λ‹ˆλ‹€.
- κΈ°μ‘΄ 데이터셋은 ν…Œμ΄λΈ” ν˜•νƒœλ‘œ λ˜μ–΄μžˆμ–΄ ν•΄λ‹Ή 데이터λ₯Ό instruction-input-output ν˜•νƒœλ‘œ λ§Œλ“€μ—ˆκ³ , (μ—¬κΈ°)[https://huggingface.co/datasets/brian-lim/smile_style_orca]μ—μ„œ 확인 κ°€λŠ₯ν•©λ‹ˆλ‹€.
  데이터셋을 뢈러온 λ’€ ChatML ν˜•μ‹μ— 맞좰 ν›ˆλ ¨ 데이터 ꡬ좕을 ν•œ λ’€ μ§„ν–‰ν–ˆμŠ΅λ‹ˆλ‹€. ν•„μš”ν•˜μ‹œλ‹€λ©΄ 자유둭게 μ‚¬μš©ν•˜μ‹œκΈ° λ°”λžλ‹ˆλ‹€.
 
- # Intended use & limitations
 
- To be added
 
- μΆ”κ°€ μ˜ˆμ •
 
- # How to use
 
- To be added
 
- μΆ”κ°€μ˜ˆμ •
 
  ---
  license: apache-2.0
 
  # Korean Style Transfer
 
  This model is a fine-tuned version of [Synatra-7B-v0.3-dpo](https://huggingface.co/maywell/Synatra-7B-v0.3-dpo) using a Korean style dataset provided by Smilegate AI (https://github.com/smilegate-ai/korean_smile_style_dataset/tree/main).
+ Since the original dataset is tabular and not fit for training an LLM, I have preprocessed it into an instruction-input-output format, which can be found [here](https://huggingface.co/datasets/brian-lim/smile_style_orca).
  The dataset is then fed into the ChatML template. Feel free to use my version of the dataset as needed.
 
  ν•΄λ‹Ή λͺ¨λΈμ€ [Synatra-7B-v0.3-dpo](https://huggingface.co/maywell/Synatra-7B-v0.3-dpo) λͺ¨λΈμ„ 슀마일게이트 AIμ—μ„œ μ œκ³΅ν•˜λŠ” Smile style λ°μ΄ν„°μ…‹μœΌλ‘œ νŒŒμΈνŠœλ‹ ν–ˆμŠ΅λ‹ˆλ‹€.
+ κΈ°μ‘΄ 데이터셋은 ν…Œμ΄λΈ” ν˜•νƒœλ‘œ λ˜μ–΄ μžˆμ–΄ ν•΄λ‹Ή 데이터λ₯Ό instruction-input-output ν˜•νƒœλ‘œ λ§Œλ“€μ—ˆκ³ , [μ—¬κΈ°](https://huggingface.co/datasets/brian-lim/smile_style_orca)μ—μ„œ 확인 κ°€λŠ₯ν•©λ‹ˆλ‹€.
  데이터셋을 뢈러온 λ’€ ChatML ν˜•μ‹μ— 맞좰 ν›ˆλ ¨ 데이터λ₯Ό κ΅¬μΆ•ν•œ λ’€ μ§„ν–‰ν–ˆμŠ΅λ‹ˆλ‹€. ν•„μš”ν•˜μ‹œλ‹€λ©΄ 자유둭게 μ‚¬μš©ν•˜μ‹œκΈ° λ°”λžλ‹ˆλ‹€.
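For reference, the tabular-to-ChatML preprocessing described above can be sketched roughly as follows. This is an illustrative assumption, not the exact script used: the column names (`formal`, `choding`), the sample row, and the helper functions are hypothetical, while the ChatML tags are the standard `<|im_start|>`/`<|im_end|>` markers.

```python
# Hypothetical sketch of the tabular -> instruction-input-output preprocessing.
# Column names and the sample row are made up for illustration; the actual
# preprocessed dataset is at https://huggingface.co/datasets/brian-lim/smile_style_orca.

def row_to_record(row: dict, src_col: str, tgt_col: str, instruction: str) -> dict:
    """One tabular row (the same sentence in several styles) -> one training example."""
    return {
        "instruction": instruction,
        "input": row[src_col],
        "output": row[tgt_col],
    }

def to_chatml(record: dict) -> str:
    """Render an instruction-input-output record with the ChatML template."""
    return (
        "<|im_start|>user\n"
        f"{record['instruction']}\n{record['input']}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{record['output']}<|im_end|>\n"
    )

row = {"formal": "μ•ˆλ…•ν•˜μ„Έμš”. μš”μ¦˜ 날씨가 많이 μŒ€μŒ€ν•˜λ„€μš”.",
       "choding": "μ•ˆλ‡½ 날씨 μ™„μ „ μΆ”μ›€"}
record = row_to_record(row, "formal", "choding",
                       "주어진 글을 κ°€λŠ₯ν•œ μ΄ˆλ“±ν•™μƒμ²˜λŸΌ 짧게 쀄인 λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.")
print(to_chatml(record))
```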
 
+ # How to use
 
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
+ 
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ 
+ tokenizer = AutoTokenizer.from_pretrained('brian-lim/smile-style-transfer')
+ model = AutoModelForCausalLM.from_pretrained('brian-lim/smile-style-transfer', device_map=device)
+ 
+ # One instruction prompt per target style.
+ prompts = {'informal': '주어진 글을 κ°€λŠ₯ν•œ ν˜•μ‹μ μ΄μ§€ μ•Šκ³  λ”±λ”±ν•˜μ§€ μ•Šμ€ λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'android': '주어진 글을 κ°€λŠ₯ν•œ μ•ˆλ“œλ‘œμ΄λ“œ λ‘œλ΄‡κ³Ό 같은 λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'azae': '주어진 글을 κ°€λŠ₯ν•œ 아저씨같은 말투둜 λ°”κΏ”μ€˜.',
+            'chat': '주어진 글을 κ°€λŠ₯ν•œ 인터넷상에 μ‚¬μš©ν•˜λŠ” 말투둜 λ°”κΏ”μ€˜.',
+            'choding': '주어진 글을 κ°€λŠ₯ν•œ μ΄ˆλ“±ν•™μƒμ²˜λŸΌ 짧게 쀄인 λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'emoticon': '주어진 글을 κ°€λŠ₯ν•œ 이λͺ¨ν‹°μ½˜μ΄ λ“€μ–΄κ°„ λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'enfp': '주어진 글을 κ°€λŠ₯ν•œ ν™œκΈ°μ°¨λ©΄μ„œ 곡감을 많이 ν•˜λŠ” μΉœμ ˆν•œ λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'gentle': '주어진 글을 κ°€λŠ₯ν•œ “μš””둜 λλ‚˜μ§€ μ•ŠμœΌλ©΄μ„œ κΉ”λ”ν•œ λŒ€ν™”μ²΄λ‘œ λ°”κΏ”μ€˜.',
+            'halbae': '주어진 글을 κ°€λŠ₯ν•œ μ—°λ₯œμ΄ μžˆλŠ” 할아버지 같은 말투둜 λ°”κΏ”μ€˜.',
+            'halmae': '주어진 글을 κ°€λŠ₯ν•œ 비속어가 λ“€μ–΄κ°€λŠ” ν• λ¨Έλ‹ˆ 같은 말투둜 λ°”κΏ”μ€˜.',
+            'joongding': '주어진 글을 κ°€λŠ₯ν•œ 쀑학ꡐ 2ν•™λ…„μ˜ 말투둜 λ°”κΏ”μ€˜.',
+            'king': '주어진 글을 κ°€λŠ₯ν•œ μ‘°μ„ μ‹œλŒ€ μ™•μ˜ 말투둜 λ°”κΏ”μ€˜.',
+            'seonbi': '주어진 글을 κ°€λŠ₯ν•œ μ‘°μ„ μ‹œλŒ€ μ„ λΉ„μ˜ 말투둜 λ°”κΏ”μ€˜.',
+            'sosim': '주어진 글을 κ°€λŠ₯ν•œ μ•„μ£Ό μ†Œμ‹¬ν•˜κ³  μ‘°μ‹¬μŠ€λŸ¬μš΄ 말투둜 λ°”κΏ”μ€˜.',
+            'translator': '주어진 글을 κ°€λŠ₯ν•œ μ–΄μƒ‰ν•œ ν•œκ΅­μ–΄ λ²ˆμ—­ 말투둜 λ°”κΏ”μ€˜.',
+            }
+ query = '[INPUT]: μ•ˆλ…•ν•˜μ„Έμš”. μš”μ¦˜ 날씨가 많이 μŒ€μŒ€ν•˜λ„€μš” \n[OUTPUT]: '
+ 
+ input_query = prompts['king'] + query
+ input_tokenized = tokenizer(input_query, return_tensors='pt').to(device)
+ 
+ g_config = GenerationConfig(temperature=0.3,
+                             repetition_penalty=1.2,
+                             max_new_tokens=768,
+                             do_sample=True)
+ output = model.generate(**input_tokenized,
+                         generation_config=g_config,
+                         pad_token_id=tokenizer.eos_token_id,
+                         eos_token_id=tokenizer.eos_token_id)
+ output_text = tokenizer.decode(output.detach().cpu().numpy()[0])
+ output_text = output_text[output_text.find('[OUTPUT]'):]
+ print(output_text)
+ ```
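A note on the post-processing step above: the decoded string is sliced at `[OUTPUT]` because the prompt ends with that marker. An alternative that avoids string matching is to drop the prompt tokens before decoding, since `generate` returns the prompt followed by the newly generated tokens. A self-contained toy sketch of that idea (the token IDs are made up; no model is needed):

```python
def strip_prompt(output_ids, prompt_len):
    """Keep only the tokens generated after the prompt."""
    return output_ids[prompt_len:]

# Stand-in IDs: generate() output = prompt tokens + newly generated tokens.
prompt_ids = [1, 523, 88, 1042]
full_output = prompt_ids + [77, 300, 2]
print(strip_prompt(full_output, len(prompt_ids)))  # [77, 300, 2]
```

With a real tokenizer, `prompt_len` would be `input_tokenized['input_ids'].shape[1]`, and the remaining IDs can be passed to `tokenizer.decode(..., skip_special_tokens=True)`.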
 
  ---
  license: apache-2.0