About:
This 🦙 LLaMA model was fine-tuned on an Alpaca dataset translated into Bahasa Indonesia. It uses Parameter-Efficient Fine-Tuning (PEFT) with LoRA to enable training on consumer-grade GPU hardware.
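For context, a typical LoRA setup for LLaMA-7B with peft looks roughly like the sketch below. The exact hyperparameters used to train this adapter are not listed here, so the rank, alpha, dropout, and target modules shown are assumptions for illustration only.

# Illustrative sketch of a LoRA setup for LLaMA-7B with peft; the actual
# hyperparameters used to train this adapter are not documented here, so all
# values below are assumptions.
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
base_model = prepare_model_for_int8_training(base_model)

lora_config = LoraConfig(
    r=8,                                   # low-rank dimension (assumed)
    lora_alpha=16,                         # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],   # attention projections commonly adapted in LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
base_model = get_peft_model(base_model, lora_config)
base_model.print_trainable_parameters()   # only the LoRA weights are trainable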
How to Use:
Load the 🦙 Alpaca-LoRA model
import torch
import bitsandbytes as bnb
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig
from peft import PeftModel, PeftConfig, prepare_model_for_int8_training, LoraConfig, get_peft_model

# LoRA adapter weights for this model
peft_model_id = "firqaaa/indo-Alpaca-LoRA-7b"

# Load the LLaMA-7B tokenizer and the 8-bit base model
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
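Optionally, you can make sure the combined model is in evaluation (inference) mode before generating; this is standard PyTorch practice rather than something specific to this repository.

model.eval()  # disables dropout; harmless if the model is already in eval mode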
Prompt Template
Prepare the prompt template
instruction = "Tuliskan deret bilangan Fibonacci. Tulis jawaban/respons dalam Bahasa Indonesia."  # "Write the Fibonacci sequence. Write the answer/response in Bahasa Indonesia."
PROMPT = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""
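If you want to reuse this template for other instructions, a small helper such as the hypothetical generate_prompt function below keeps the formatting consistent; it simply reproduces the template above and is not part of the released code.

# Hypothetical helper, not part of the model repository.
def generate_prompt(instruction: str) -> str:
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""

# Produces the same string as PROMPT above
assert generate_prompt(instruction) == PROMPT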
Evaluation
Feel free to change the parameters inside GenerationConfig to get better results.
inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

generation_config = GenerationConfig(
    temperature=0.1,
    top_p=0.95,
    top_k=40,
    num_beams=4,
    repetition_penalty=1.15,
)

print("Generating...")
print("Instruction : {}".format(instruction))

generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=512,
)

print("Response : ")
for s in generation_output.sequences:
    # Keep only the text after the "### Response:" marker
    print(tokenizer.decode(s).split("### Response:")[1])
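To run several instructions in a row, the steps above can be wrapped in a convenience function; generate_response below is a hypothetical helper assembled only from the calls already shown, and the example instruction at the end is just an illustration.

# Hypothetical convenience wrapper around the steps above.
def generate_response(instruction: str, max_new_tokens: int = 512) -> str:
    prompt = f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].cuda()
    output = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
        max_new_tokens=max_new_tokens,
    )
    decoded = tokenizer.decode(output.sequences[0])
    return decoded.split("### Response:")[1].strip()

print(generate_response("Sebutkan tiga pulau terbesar di Indonesia."))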
Note:
Due to the high training loss and limited compute, we will update this model frequently to ensure the quality of the generated text.