---
datasets:
  - Local
license: bigscience-bloom-rail-1.0
language:
  - id
pipeline_tag: text-generation
---

Table of Contents

  1. Model Summary
  2. Use
  3. Training

Model Summary

We present KARINA, a model finetuned from bigscience/bloomz-3b. BLOOMZ is a family of models capable of following human instructions in dozens of languages zero-shot; it was obtained by finetuning pretrained multilingual BLOOM language models on the crosslingual task mixture xP3, which gives the resulting models crosslingual generalization to unseen tasks and languages.

Use

Intended use

We recommend using the model for tasks expressed in natural language. For example, given the prompt "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n" ("siapa kamu?" is Indonesian for "who are you?"), the model will most likely answer "Saya Karina. Ada yang bisa saya bantu?" ("I am Karina. Is there anything I can help you with?").
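As a small illustration of that prompt shape (the build_prompt helper below is only illustrative and is not part of the repository):

def build_prompt(question: str) -> str:
    # Wrap the question in the template used throughout the examples below
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

print(build_prompt("siapa kamu?"))
# Given the question:
# { siapa kamu? }
# ---
# Answer: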

How to use

CPU

# pip install -q transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "yodi/karina"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

GPU in 4-bit

# pip install -q transformers torch accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

MODEL_NAME = "yodi/karina"

# Load the model with 4-bit quantization on the second GPU (cuda:1); requires bitsandbytes and accelerate
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)

GPU in 8-bit

# pip install -q transformers torch accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

MODEL_NAME = "yodi/karina"

# Load the model with 8-bit quantization on the second GPU (cuda:1); requires bitsandbytes and accelerate
model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_8bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)
[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]

Local inference with Gradio

# pip install -q transformers torch accelerate bitsandbytes gradio
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline
import re

import gradio as gr

MODEL_NAME = "yodi/karina"

model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

def preprocess(text):
    # Wrap the user question in the instruction template shown above
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"

def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # Keep only the model's answer (the text after the "Answer:" marker)
    output = re.split(r'\n---\nAnswer:\n', result[0]['generated_text'])[1]

    return output

with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

demo.launch(enable_queue=True, debug=True)

Then open the Gradio URL in a browser.

Training procedure

The following bitsandbytes quantization config was used during training (a sketch of the equivalent BitsAndBytesConfig call follows the list):

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
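For reference, the same settings can be expressed with the transformers BitsAndBytesConfig API. This is only a sketch mirroring the values above; the original training script is not published here, and the base model name is assumed from the summary:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirror of the quantization settings listed above
bnb_config = BitsAndBytesConfig(
    load_in_8bit=False,
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# Example: loading the base model with this config (base model assumed to be bigscience/bloomz-3b)
base_model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-3b",
    quantization_config=bnb_config,
    device_map="auto",
)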

Framework versions

  • PEFT 0.5.0.dev0
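The PEFT version indicates the model was trained with parameter-efficient finetuning. If the repository hosts adapter weights rather than merged weights (check the repo files first; the usage examples above load it directly with AutoModelForCausalLM), a minimal sketch of loading the adapter explicitly could look like this:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "bigscience/bloomz-3b"  # base model named in the summary above
ADAPTER = "yodi/karina"              # assumption: treated here as a PEFT adapter repo

# Load the base model, then attach the adapter on top of it
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)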

Limitations

Prompt engineering: performance may vary depending on the prompt, as is the case for the underlying BLOOMZ models. Prompts that follow the template shown in the usage examples above are recommended.

Training

Model

  • Architecture: same as BLOOM; also refer to the config.json file in this repository
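To check the architecture details locally, you can load the configuration with the standard transformers API; the printed fields come straight from the repository's config.json:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("yodi/karina")
print(config.model_type)  # expected to be "bloom", matching the note above
print(config)             # full hyperparameters from config.json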