---
datasets:
- Local
license: bigscience-bloom-rail-1.0
language:
- id
pipeline_tag: text-generation
---

# Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Training](#training)

# Model Summary

> We present KARINA, finetuned from [bigscience/bloomz-3b](https://huggingface.co/bigscience/bloomz-3b). BLOOMZ is a family of models capable of following human instructions in dozens of languages zero-shot, obtained by finetuning BLOOM pretrained multilingual language models on the crosslingual task mixture xP3; the resulting models are capable of crosslingual generalization to unseen tasks & languages.

# Use

## Intended use

We recommend using the model to perform tasks expressed in natural language. For example, given the prompt `Given the question:\n{ siapa kamu? }\n---\nAnswer:\n` (the question "siapa kamu?" is Indonesian for "who are you?"), the model will most likely answer "*Saya Karina. Ada yang bisa saya bantu?*" ("I am Karina. Is there anything I can help you with?").
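Questions should be wrapped in that exact template. A minimal helper for building the prompt (the `build_prompt` name is ours, purely illustrative):

```python
def build_prompt(question: str) -> str:
    # Wrap a natural-language question in the template KARINA was finetuned on.
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

print(build_prompt("siapa kamu?"))
# Given the question:
# { siapa kamu? }
# ---
# Answer:
```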
## How to use

### CPU

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "yodi/karina"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Note the single braces: this matches the template the model was finetuned on.
inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
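By default `generate` produces only a short continuation (the stock `max_length` is 20 tokens), so the answer may be cut off. A hedged variant with room for a longer answer; the values are illustrative, not tuned:

```python
# max_new_tokens and greedy decoding are illustrative choices, not tuned values.
outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```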
### GPU in 4-bit
```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

# device_map="cuda:1" targets a second GPU; use "auto" or "cuda:0" on a single-GPU machine.
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
generator = pipeline('text-generation', model=model_4bit, tokenizer=tokenizer, do_sample=False)
result = generator(prompt, max_length=256)
print(result)
```
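Recent `transformers` releases prefer an explicit `BitsAndBytesConfig` over the bare `load_in_4bit` flag. A sketch, assuming `transformers>=4.30` with `bitsandbytes` installed, that mirrors the quantization settings listed under [Training procedure](#training-procedure) below:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Mirrors the bitsandbytes settings listed under "Training procedure".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_4bit = AutoModelForCausalLM.from_pretrained(
    "yodi/karina",
    device_map="auto",  # let accelerate place the layers
    quantization_config=bnb_config,
)
```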
### GPU in 8-bit
```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
generator = pipeline('text-generation', model=model_8bit, tokenizer=tokenizer, do_sample=False)
result = generator(prompt, max_length=256)
print(result)
```
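If you are unsure whether the 4-bit or 8-bit variant fits your GPU, `get_memory_footprint()` (a standard `transformers` model method) reports the loaded model's size:

```python
# Report the in-memory size of the quantized model (bytes -> GiB).
print(f"{model_8bit.get_memory_footprint() / 1024**3:.2f} GiB")
```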
```
[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]
```

### Local Inference with Gradio

```python
# pip install -q transformers accelerate bitsandbytes gradio
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import re
import gradio as gr

MODEL_NAME = "yodi/karina"

model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
generator = pipeline('text-generation', model=model_4bit, tokenizer=tokenizer, do_sample=False)

def preprocess(text):
    # Wrap the user's question in the prompt template the model was finetuned on.
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"

def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # Keep only the text after the "Answer:" delimiter.
    output = re.split(r'\n---\nAnswer:\n', result[0]['generated_text'])[1]
    return output

with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

# enable_queue applies to older Gradio releases; recent versions use demo.queue() instead.
demo.launch(enable_queue=True, debug=True)
```

Then open the Gradio URL printed to the console in your browser.

## Training procedure

The following `bitsandbytes` quantization config was used during training (a sketch of a matching PEFT/QLoRA setup appears at the end of this card):

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16

### Framework versions

- PEFT 0.5.0.dev0

# Limitations

**Prompt Engineering:** Performance may vary depending on the prompt; as with the BLOOMZ models KARINA derives from, results are best when the input follows the template shown above.

# Training

## Model

- **Architecture:** Same as [bloom](https://huggingface.co/bigscience/bloom); also refer to the `config.json` file.
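The quantization config under [Training procedure](#training-procedure) corresponds to QLoRA-style finetuning with PEFT. A minimal sketch of how such a setup is typically assembled; the LoRA hyperparameters below are illustrative assumptions, not the values actually used for KARINA:

```python
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantization config matching the values listed under "Training procedure".
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-3b", quantization_config=bnb_config, device_map="auto"
)
base = prepare_model_for_kbit_training(base)

# Illustrative LoRA hyperparameters; the actual values are not documented on this card.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],  # BLOOM's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```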