
LLmRa-2.7B

A conversational fine-tune of an Open Pre-trained Transformer (OPT) language model.

LLmRa 2.7B is a proof-of-concept fine-tune of facebook/opt-2.7b, optimized for dialogue.

Disclaimer: NSFW data was included in the fine-tuning of this model. Although SFW inputs will usually result in SFW outputs, you are advised to chat at your own risk. This model is not suitable for use by minors.

Warning: This model is NOT suitable for use by minors. It will output X-rated content under certain circumstances.

This model was fine-tuned on a small test dataset; version 2, or a higher-parameter model, will use the full dataset.


Usage Format

To effectively utilize the model, follow this structured format for engaging text-based conversations:

1. Initialization

Here is how you can define the personality of the language model:

<|system|>[Persona]
  • Persona: You can define a specific persona or context for the AI, but it's optional. It can be a character, a role, or just a style of interaction.

2. AI Introduction

<|user|>[User input]<|model|>
  • Users can start the conversation by entering their message within <|user|> and closing with <|model|>.

Example Usage:

Here's an example of how to start a conversation with the AI:

<|system|>I'm here to provide information and assistance on a wide range of topics.
<|model|>Hello! Welcome to our AI-powered assistant. How can I assist you today?
<|user|>Tell me about the history of artificial intelligence.
<|model|>

Continue the conversation as needed. This structured format helps maintain a smooth and engaging interaction with the AI.

You are not required to include a user name; you can change it to your preferred name or leave it blank. You may also add the AI's name, for example:

<|user|>YourNameHere: Hello.<|model|>CharacterName:

You can also use this instruct prompt example:

<|system|>What is one plus one?<|model|>
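For illustration, here is a minimal helper that assembles a prompt in this format from a persona description, an optional pre-formatted history, and the latest user message. The function and its arguments are hypothetical, shown only to make the template concrete:

def build_prompt(system, user_message, history="", user_name="", char_name=""):
    """Assemble a prompt in the <|system|>/<|user|>/<|model|> format.

    `history` is expected to already contain alternating
    <|user|>...<|model|>... turns; the name prefixes are optional.
    """
    user_prefix = f"{user_name}: " if user_name else ""
    char_prefix = f"{char_name}:" if char_name else ""
    return f"<|system|>{system}{history}<|user|>{user_prefix}{user_message}<|model|>{char_prefix}"

# Example:
# build_prompt("I'm here to provide information and assistance on a wide range of topics.",
#              "Tell me about the history of artificial intelligence.")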

Loading The Model

To use the model and interact with it, use the Python code below:

from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          pipeline,
                          )

model = AutoModelForCausalLM.from_pretrained('L-R/LLmRa-2.7B')
tokenizer = AutoTokenizer.from_pretrained('L-R/LLmRa-2.7B')

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=100)

input_question = 'QUESTION HERE'

question_formatted = f'<|system|>{input_question}<|model|>'

result = pipe(question_formatted)

print(f"[model]: {result[0]['generated_text'][len(question_formatted):]}")

Or use the more complex example below:

import os
import random
import sys
import time
import json
import torch

from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          set_seed)

# Distributed / tokenizer-parallelism settings read from the environment
# (kept for reference; they are not used directly below).
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
local_tokenizer = os.getenv('TOKENIZERS_PARALLELISM', 'false').lower() == 'true'


class Chatbot:
    def __init__(self, config):

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.config = config
        self.persona = None
        self.model = None
        self.history = []

        self.load_model()

    def create_persona(self, persona_data):
        required_keys = ['name', 'description', 'greeting']
        if not all(key in persona_data for key in required_keys):
            raise ValueError(
                "Missing required keys in persona_data. Please provide 'name', 'description', and 'greeting'.")

        new_persona_id = str(max(int(key) for key in self.config["personas"].keys()) + 1)

        self.config["personas"][new_persona_id] = persona_data
        return new_persona_id

    def load_model(self):
        model_path = self.config["model_path"]
        tokenizer_path = self.config["tokenizer_path"]

        # bitsandbytes quantization settings. The bnb_4bit_* options only take
        # effect when load_in_4bit is True; 8-bit loading needs no extra options
        # beyond load_in_8bit.
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=self.config['load_model_4bit'],
            load_in_8bit=self.config['load_model_8bit'],
            bnb_4bit_quant_type='nf4',
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
        )

        if not model_path or not tokenizer_path:
            raise ValueError('model_name or tokenizer_path name not found! Define one.')

        if self.config['load_model_4bit'] and self.config['load_model_8bit']:
            raise ValueError("You can't load the model in 8 bits and 4 bits at the same time!")

        if not self.config['user_name']:
            print('You have not selected a name! No name will be sent to the model.')

        print(f"\nLoading model: {model_path}")

        if torch.cuda.is_available():
            # torch.nn.DataParallel does not support .generate(), so multi-GPU
            # placement is delegated to accelerate via device_map='auto' instead.
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                use_auth_token=self.config['model_token'],
                quantization_config=quantization_config,
                device_map='auto',
            )

            if torch.cuda.device_count() > 1:
                model_running_on = f'{torch.cuda.device_count()} GPUs'
            else:
                model_running_on = '1 GPU'
        else:
            # bitsandbytes quantization requires a CUDA GPU, so fall back to a
            # full-precision load when running on CPU.
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                use_auth_token=self.config['model_token'],
            ).to(self.device)
            model_running_on = 'CPU'

        print(f'Model is running on: {model_running_on}')

        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_auth_token=self.config['model_token'])
        print(self.tokenizer)


    def load_persona(self, persona_id):
        personas = self.config["personas"]
        if persona_id in personas:
            self.persona = personas[persona_id]
        else:
            raise ValueError("Invalid persona ID")


    def formatting_question(self, user_input, history):

        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']
        config_question = self.config['use_question_template']

        if config_question:
            formatted_answer = (
                f'<|system|>{user_input}<|model|>'
            )
        else:
            m_ = self.persona["description"]
            g_ = self.persona["greeting"]
            n_ = self.persona["name"]
            un_ = self.config["user_name"]

            if config_user and config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{un_}: {user_input}<|model|>{n_}:'
                )
            elif config_user:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{un_}: {user_input}<|model|>'
                )
            elif config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{user_input}<|model|>{n_}:'
                )
            else:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{user_input}<|model|>'
                )

        return formatted_answer

    def history_formatting(self, last_input, last_output):

        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']

        n_ = self.persona["name"]
        un_ = self.config["user_name"]

        if config_user and config_model:
            formatted_answer = (
                f'<|user|>{un_}: {last_input}<|model|>{n_}: {last_output}'
            )
        elif config_user:
            formatted_answer = (
                f'<|user|>{un_}: {last_input}<|model|>{last_output}'
            )
        elif config_model:
            formatted_answer = (
                f'<|user|>{last_input}<|model|>{n_}: {last_output}'
            )
        else:
            formatted_answer = (
                f'<|user|>{last_input}<|model|>{last_output}'
            )

        return formatted_answer

    def reply(self, user_input):

        config_question = self.config['use_question_template']
        set_seed(random.randint(1, 1000))
        user_input = " ".join(user_input.split())

        if len(self.history) > self.config["history_length"]:
            model_history = "\n".join([str(item) for item in self.history[-self.config["history_length"]:]])
        else:
            model_history = "\n".join([str(item) for item in self.history])

        input_ai = self.formatting_question(user_input, model_history).strip()
        tokenized_input_ai = self.tokenizer.encode(input_ai, return_tensors="pt")

        output_ids = self.model.generate(
            max_length=self.config["max_generation_length"] + len(tokenized_input_ai[0]),
            no_repeat_ngram_size=self.config["no_repeat_ngram_size"],
            repetition_penalty=self.config["repetition_penalty"],
            length_penalty=self.config["length_penalty"],
            input_ids=tokenized_input_ai.to(self.device),
            pad_token_id=self.tokenizer.eos_token_id,
            temperature=self.config["temperature"],
            top_k=self.config["top_k"],
            top_p=self.config["top_p"],
            early_stopping=True,
            use_cache=True,
            do_sample=True,
        )

        # Decode the full sequence and strip the prompt; the extra 4 characters
        # account for the '</s>' start token the OPT tokenizer prepends.
        ai_reply = self.tokenizer.decode(
            output_ids[0],
            skip_special_tokens=False)[len(input_ai) + 4:]

        if not config_question:
            self.history.append(self.history_formatting(user_input, ai_reply))

        return ai_reply.strip()

    def reset_conversation(self):

        self.history = []

class UserInterface:
    def __init__(self, chatbot):
        self.chatbot = chatbot

    def run(self):

        persona_id = self.chatbot.config["default_persona"]
        self.chatbot.load_persona(persona_id)

        print("\nChosen Persona:", self.chatbot.persona["name"])
        print("Your Chosen Name:", self.chatbot.config["user_name"])

        print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
        self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')

        while True:
            user_input = input(f"\n>> {self.chatbot.config['user_name']}: ")
            if user_input.lower() == "reset_app" or user_input == "reset_app":
                self.chatbot.reset_conversation()
                print("\nConversation history has been reset.\n")
                self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                print(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                continue

            if user_input.lower().startswith("create_persona"):

                # Example of use: create_persona

                # {"name": "CustomPersona",
                # "description": "This is a custom persona created by the user.",
                # "greeting": "Hello! I am CustomPersona, nice to meet you!"}

                try:
                    persona_data = json.loads(' '.join(user_input.split()[1:]))
                    new_persona_id = self.chatbot.create_persona(persona_data)
                    print(f"Persona created with ID: {new_persona_id}")
                except json.JSONDecodeError:
                    print("Invalid JSON input. Please provide a valid JSON string containing 'name', 'description', and 'greeting'.")
                except ValueError as e:
                    print(e)
                # Skip model generation for this command.
                continue

            # Add a command to change the persona
            if user_input.lower().startswith("change_persona"):
                try:
                    new_persona_id = user_input.split()[1]
                    self.chatbot.load_persona(new_persona_id)
                    self.chatbot.reset_conversation()
                    print("\nPersona changed to:", self.chatbot.persona["name"])
                    print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    continue
                except (IndexError, ValueError):
                    print("Invalid command or persona ID. Please use 'change_persona [ID]'.")
                    continue

            if user_input.lower() == "exit_app" or user_input == "exit_app":
                print("Goodbye!")
                break

            reply = self.chatbot.reply(user_input)

            def typewriter_effect(sentence, type_delay):

                for char in sentence:
                    sys.stdout.write(char)
                    sys.stdout.flush()
                    time.sleep(type_delay)

            reply_length = len(reply)
            type_delay_ranges = {
                (100, 200): 0.03,
                (200, 300): 0.02,
                (300, 400): 0.01,
                (400, 500): 0.005
            }

            default_type_delay = 0.04

            for length_range, delay in type_delay_ranges.items():
                if length_range[0] < reply_length <= length_range[1]:
                    type_delay = delay
                    break
            else:
                type_delay = default_type_delay

            if self.chatbot.config['use_typing_effect']:
                typewriter_effect(f'{self.chatbot.persona["name"]}: {reply}', type_delay)
            else:
                print(f'{self.chatbot.persona["name"]}: {reply}')

def main():
    
    config = {
        "user_name": "Jack",  # The user's name, which is set to "Jack" in this case.

        "model_path": "L-R/LLmRa-2.7B",  # Path to the model used for generating responses.
        "tokenizer_path": "L-R/LLmRa-2.7B",  # Path to the tokenizer associated with the model.
        "model_token": None,  # If you want to load the model using your huggingface token. (Not required, but included)

        "load_model_4bit": True,  # Whether to load the model with 4-bit precision.
        "load_model_8bit": False,  # Whether to load the model with 8-bit precision.

        "use_typing_effect": True,  # Whether to simulate a typing effect when displaying responses.

        "use_names": {
            "model": False,  # Whether the model's name should be used in question formatting.
            "user": False,  # Whether the user's name should be used in question formatting.
        },

        "use_question_template": False,  # Whether to use predefined question templates in conversations.

        "personas": {
            # A dictionary of personas with their descriptions and greetings for use in conversations.
            "1": {
                "name": "LLmRa",
                "description": "Description of the LLmRa persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the LLmRa persona is active in a conversation."
            },
            "2": {
                "name": "Hikari",
                "description": "Description of the Hikari persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the Hikari persona is active in a conversation."
            }
        },

        "max_generation_length": 450,  # The maximum length for generated responses.

        "default_persona": "1",  # The default persona to use when starting a conversation.

        "history_length": 6,  # The maximum number of previous messages to consider in the conversation history.

        "top_k": 40,  # Top-k sampling parameter for text generation.
        "top_p": .55,  # Top-p sampling parameter for text generation.
        "temperature": .55,  # Temperature parameter for controlling the randomness of generated text.
        "length_penalty": 0.65,  # Penalty factor for generating longer or shorter responses.
        "no_repeat_ngram_size": 4,  # Parameter to avoid repeating n-grams in generated text.
        "repetition_penalty": 1.25,  # Penalty factor for avoiding repeated phrases in generated text.
    }

    # Initialize chatbot and user interface
    chatbot = Chatbot(config)
    ui = UserInterface(chatbot)

    # Run the user interface
    ui.run()


if __name__ == "__main__":
    main()
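While the script is running, a few in-chat commands are available: reset_app clears the conversation history, create_persona followed by a JSON object with "name", "description", and "greeting" keys registers a new persona, change_persona [ID] switches to another persona and resets the history, and exit_app quits.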

Known issues

The model sometimes fails to follow instructions.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 32.16
ARC (25-shot) 37.03
HellaSwag (10-shot) 60.65
MMLU (5-shot) 25.58
TruthfulQA (0-shot) 35.23
Winogrande (5-shot) 61.56
GSM8K (5-shot) 0.3
DROP (3-shot) 4.76