Introduction

This model is a finetuned version of the TinyLlama-1.1B base model, adapted for retrieval-augmented generation (RAG) applications. Base models often perform poorly on user queries and produce out-of-context responses. They are also prone to hallucination, i.e. generating incorrect but confident answers instead of declining to answer when they do not know. To address this, I finetuned the base model on a hybrid dataset that contains both well-formed question/context/answer examples and questions on which LLMs typically hallucinate, each paired with an appropriate response. Finetuning in this way may reduce the model's hallucination rate.
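To illustrate the idea (the field names and texts below are hypothetical and are not the actual dataset schema), a hybrid dataset pairs grounded records with anti-hallucination records:

# Hypothetical records showing the two kinds of examples in a hybrid RAG dataset.
# Field names and texts are illustrative only, not the actual finetuning data.
hybrid_examples = [
    {   # grounded example: the answer is supported by the context
        "context": "TinyLlama is an open 1.1B-parameter model based on the Llama 2 architecture.",
        "question": "How many parameters does TinyLlama have?",
        "answer": "TinyLlama has 1.1 billion parameters.",
    },
    {   # anti-hallucination example: the context does not contain the answer
        "context": "TinyLlama is an open 1.1B-parameter model based on the Llama 2 architecture.",
        "question": "Which company funded the training of TinyLlama?",
        "answer": "The context does not mention this, so I cannot answer that question.",
    },
]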

How to use

Install dependencies

pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7

You can use the following code for model inference.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import pprint

torch.set_default_device("cuda")

# Load the finetuned model
model = AutoModelForCausalLM.from_pretrained("MuntasirAhmed/TinyLlama-1.1B-rag-finetuned-v1.0", 
                                             torch_dtype=torch.float16, 
                                             device_map="auto",
                                             trust_remote_code=True)

tokenizer = AutoTokenizer.from_pretrained("MuntasirAhmed/TinyLlama-1.1B-rag-finetuned-v1.0",
                                          trust_remote_code=True)
pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_length=200)

# Build a chat-style prompt; the trailing <|assistant|> tag (from the TinyLlama
# chat template) marks where the model's answer should begin.
prompt = "What is a large language model?"
formatted_prompt = f'''<|system|>
You are a friendly chatbot who responds to the user's question by looking into context.</s>
<|user|>
{prompt}</s>
<|assistant|>
'''

# Generate the answer
result = pipe(formatted_prompt)
pprint.pp(result[0]['generated_text'])
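Because the model is intended for RAG, the retrieved passage would normally be included in the prompt along with the question. The sketch below reuses the pipeline from above; the Context:/Question: layout is an assumption (the exact prompt layout used during finetuning is not documented here), and the context string is a stand-in for your retriever's output.

# RAG-style usage sketch: put the retrieved context into the user turn.
# The context below is a placeholder for whatever your retriever returns.
context = "TinyLlama is an open 1.1B-parameter model based on the Llama 2 architecture."
question = "How many parameters does TinyLlama have?"

rag_prompt = f'''<|system|>
You are a friendly chatbot who responds to the user's question by looking into context.</s>
<|user|>
Context: {context}
Question: {question}</s>
<|assistant|>
'''

# return_full_text=False returns only the newly generated tokens, not the prompt.
answer = pipe(rag_prompt, return_full_text=False)[0]['generated_text']
pprint.pp(answer)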