
How to use the GPTQ model

https://github.com/jongmin-oh/korean-LLM-quantize

Prompter download

mkdir ./templates && mkdir ./utils && wget -P ./templates https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/templates/kullm.json && wget -P ./utils https://raw.githubusercontent.com/jongmin-oh/korean-LLM-quantize/main/utils/prompter.py
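The downloaded utils/prompter.py is not reproduced in this card. To give a rough idea of what an Alpaca-style prompter of this kind typically does (fill a JSON template with the instruction and input, then cut the model's answer out of the generated text), here is a hypothetical stand-in. `MiniPrompter` and the English template strings are assumptions made for illustration. The real `templates/kullm.json` is written in Korean, and its exact keys and wording may differ.

```python
import json

# Hypothetical template: the real templates/kullm.json is in Korean and may use
# different wording; only the structure (keys and placeholders) is illustrated.
TEMPLATE_JSON = """
{
  "prompt_input": "### Instruction:\\n{instruction}\\n\\n### Input:\\n{input}\\n\\n### Response:\\n",
  "prompt_no_input": "### Instruction:\\n{instruction}\\n\\n### Response:\\n",
  "response_split": "### Response:"
}
"""


class MiniPrompter:
    """Minimal, illustrative stand-in for utils.prompter.Prompter."""

    def __init__(self, template_json: str):
        self.template = json.loads(template_json)

    def generate_prompt(self, instruction: str, input_text: str = "") -> str:
        # Use the with-input template only when an input is actually supplied
        if input_text:
            return self.template["prompt_input"].format(
                instruction=instruction, input=input_text
            )
        return self.template["prompt_no_input"].format(instruction=instruction)

    def get_response(self, output: str) -> str:
        # A text-generation pipeline returns prompt + completion by default,
        # so keep only the text after the response marker
        return output.split(self.template["response_split"])[1].strip()


p = MiniPrompter(TEMPLATE_JSON)
prompt = p.generate_prompt("Some context passage", "What is the nickname?")
print(p.get_response(prompt + "Sonny"))  # Sonny
```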

Install packages

pip install torch==2.0.1 auto-gptq==0.4.2
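To confirm that the pinned versions are the ones actually installed before running the example, a small check along these lines may help. It uses only the standard library's `importlib.metadata`. `check_pins` and `PINNED` are hypothetical names, and treating a local build tag such as `+cu117` as a match is a design choice of this sketch, not a requirement of either package.

```python
from importlib import metadata

# Versions pinned by the pip command above
PINNED = {"torch": "2.0.1", "auto-gptq": "0.4.2"}


def check_pins(pins: dict[str, str]) -> list[str]:
    """Return one message per package that is missing or not at the pinned version."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        # Ignore a local build tag, e.g. "2.0.1+cu117" counts as "2.0.1"
        if found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all pinned packages match"]:
        print(line)
```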
  • If you are in a hurry, you can test the model right away by running the example code below (it occupies 11 GB of GPU memory).
  • Since 2023-08-23, Hugging Face has officially supported GPTQ.
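import torch
from transformers import pipeline
from auto_gptq import AutoGPTQForCausalLM

# Prompter class downloaded to ./utils in the download step above
from utils.prompter import Prompter

MODEL = "j5ng/kullm-5.8b-GPTQ-8bit"
# Load the 8-bit GPTQ-quantized weights onto the first GPU
model = AutoGPTQForCausalLM.from_quantized(MODEL, device="cuda:0", use_triton=False)

pipe = pipeline('text-generation', model=model, tokenizer=MODEL)

# KULLM prompt template (templates/kullm.json, downloaded above)
prompter = Prompter("kullm")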

def infer(instruction="", input_text=""):
    prompt = prompter.generate_prompt(instruction, input_text)
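    output = pipe(
        prompt,
        max_length=512,  # counts prompt tokens plus generated tokens
        temperature=0.2,  # no effect unless do_sample=True; this call uses beam search
        repetition_penalty=3.0,
        num_beams=5,
        eos_token_id=2,
    )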
    s = output[0]["generated_text"]
    result = prompter.get_response(s)

    return result

instruction = """
์†ํฅ๋ฏผ(ํ•œ๊ตญ ํ•œ์ž: ๅญซ่ˆˆๆ…œ, 1992๋…„ 7์›” 8์ผ ~ )์€ ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์ถ•๊ตฌ ์„ ์ˆ˜๋กœ ํ˜„์žฌ ์ž‰๊ธ€๋žœ๋“œ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ ํ† ํŠธ๋„˜ ํ™‹์Šคํผ์—์„œ ์œ™์–ด๋กœ ํ™œ์•ฝํ•˜๊ณ  ์žˆ๋‹ค.
๋˜ํ•œ ๋Œ€ํ•œ๋ฏผ๊ตญ ์ถ•๊ตฌ ๊ตญ๊ฐ€๋Œ€ํ‘œํŒ€์˜ ์ฃผ์žฅ์ด์ž 2018๋…„ ์•„์‹œ์•ˆ ๊ฒŒ์ž„ ๊ธˆ๋ฉ”๋‹ฌ๋ฆฌ์ŠคํŠธ์ด๋ฉฐ ์˜๊ตญ์—์„œ๋Š” ์• ์นญ์ธ "์˜๋‹ˆ"(Sonny)๋กœ ๋ถˆ๋ฆฐ๋‹ค.
์•„์‹œ์•„ ์„ ์ˆ˜๋กœ์„œ๋Š” ์—ญ๋Œ€ ์ตœ์ดˆ๋กœ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ ๊ณต์‹ ๋ฒ ์ŠคํŠธ ์ผ๋ ˆ๋ธ๊ณผ ์•„์‹œ์•„ ์„ ์ˆ˜ ์ตœ์ดˆ์˜ ํ”„๋ฆฌ๋ฏธ์–ด๋ฆฌ๊ทธ ๋“์ ์™•์€ ๋ฌผ๋ก  FIFA ํ‘ธ์Šค์นด์Šค์ƒ๊นŒ์ง€ ํœฉ์“ธ์—ˆ๊ณ  2022๋…„์—๋Š” ์ถ•๊ตฌ ์„ ์ˆ˜๋กœ๋Š” ์ตœ์ดˆ๋กœ ์ฒด์œกํ›ˆ์žฅ ์ฒญ๋ฃก์žฅ ์ˆ˜ํ›ˆ์ž๊ฐ€ ๋˜์—ˆ๋‹ค.
์†ํฅ๋ฏผ์€ ํ˜„์žฌ ๋ฆฌ๊ทธ 100ํ˜ธ๋ฅผ ๋„ฃ์–ด์„œ ํ™”์ œ๊ฐ€ ๋˜๊ณ  ์žˆ๋‹ค.
"""
result = infer(instruction=instruction, input_text="์†ํฅ๋ฏผ์˜ ์• ์นญ์€ ๋ญ์•ผ?")
print(result) # ์†ํฅ๋ฏผ์˜ ์• ์นญ์€ Sonny์ž…๋‹ˆ๋‹ค.
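The example passes `repetition_penalty=3.0`, a strong setting (1.0 means no penalty). The sketch below shows, in plain Python, the CTRL-style rule that transformers' `RepetitionPenaltyLogitsProcessor` applies to every token id already present in the sequence: a positive logit is divided by the penalty and a negative logit is multiplied by it, so the token becomes less likely either way. `apply_repetition_penalty` is a hypothetical helper written for illustration, not the library code.

```python
def apply_repetition_penalty(
    logits: list[float], seen_ids: set[int], penalty: float
) -> list[float]:
    """Return a copy of `logits` with the repetition penalty applied to `seen_ids`."""
    out = list(logits)
    for i in seen_ids:
        # With penalty > 1, a seen token's logit moves down whether it starts positive or negative
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


# Token ids 0 and 1 already appear in the sequence; id 2 does not
print(apply_repetition_penalty([2.0, -1.0, 0.5], {0, 1}, 3.0))  # [0.6666666666666666, -3.0, 0.5]
```

For a decoder-only model such as this one, the sequence the penalty looks at includes the prompt, so tokens copied from the context passage are penalized as well.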
