---
license: other
language:
- en
pipeline_tag: conversational
inference: false
tags:
- AI
- ConversationalAI
---

# LLmRa-2.7B

A conversational fine-tune of an Open Pre-trained Transformer (OPT) language model.

**LLmRa 2.7B** is a proof-of-concept fine-tune of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b) optimized for dialogue.

**Disclaimer:** NSFW data was included in the fine-tuning of this model. Although SFW inputs will usually result in SFW outputs, you are advised to **chat at your own risk. This model is not suitable for use by minors.**

**Warning:** This model is **NOT** suitable for use by minors. **It will output X-rated content under certain circumstances.**

**This model was fine-tuned on a small test dataset; version 2, or a higher-parameter model, will be trained on the full dataset.**

---

## Usage Format

To use the model effectively, follow this structured format for text-based conversations:

**1. Initialization**

Here is how you can define the personality of the language model:

```
<|system|>[Persona]
```

- **Persona**: You can define a specific persona or context for the AI, but it is optional. It can be a character, a role, or just a style of interaction.

**2. AI Introduction**

```
<|user|>[User input]<|model|>
```

- Users start the conversation by entering their message after `<|user|>` and closing with `<|model|>`.

---

### Example Usage:

Here's an example of how to start a conversation with the AI:

```
<|system|>I'm here to provide information and assistance on a wide range of topics.
<|model|>Hello! Welcome to our AI-powered assistant. How can I assist you today?
<|user|>Tell me about the history of artificial intelligence.
<|model|>
```

Continue the conversation as needed. This structured format helps maintain a smooth and engaging interaction with the AI.

You are not required to include `User`; you can change it to your preferred name or leave it blank. You may also add the AI's name, for example:

```
<|user|>YourNameHere: Hello.<|model|>CharacterName:
```

You can also use this instruct-style prompt:

```
<|system|>What is one plus one?<|model|>
```
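
If you assemble prompts programmatically, a small helper keeps the tag layout consistent. The sketch below is only illustrative (the `build_prompt` helper and its argument names are hypothetical and not part of the model or this repository); it simply concatenates the `<|system|>`, `<|user|>`, and `<|model|>` tags in the order described above.

```Python
# Hypothetical helper, shown for illustration only: builds a prompt string in the
# <|system|>/<|user|>/<|model|> layout described above. All names are placeholders.
def build_prompt(persona, turns, user_input, user_name="", char_name=""):
    user_tag = f"{user_name}: " if user_name else ""
    char_tag = f"{char_name}: " if char_name else ""
    prompt = f"<|system|>{persona}"
    for past_user, past_model in turns:  # earlier (user, model) exchanges
        prompt += f"<|user|>{user_tag}{past_user}<|model|>{char_tag}{past_model}"
    prompt += f"<|user|>{user_tag}{user_input}<|model|>{char_tag}"
    return prompt

prompt = build_prompt(
    persona="I'm here to provide information and assistance on a wide range of topics.",
    turns=[("Hello.", "Hello! How can I assist you today?")],
    user_input="Tell me about the history of artificial intelligence.",
)
print(prompt)
```
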
## Loading The Model

To use the model and interact with it, use the Python code below:

```Python
from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          pipeline,
                          )

model = AutoModelForCausalLM.from_pretrained('L-R/LLmRa-2.7B')
tokenizer = AutoTokenizer.from_pretrained('L-R/LLmRa-2.7B')

pipe = pipeline(task="text-generation", model=model,
                tokenizer=tokenizer, max_length=100)

input_question = 'QUESTION HERE'

question_formatted = f'<|system|>{input_question}<|model|>'

result = pipe(question_formatted)

print(f"[model]: {result[0]['generated_text'][len(question_formatted):]}")
```
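
The card does not prescribe decoding settings for this short example. If you want sampled, less deterministic output, the call below is a minimal sketch that reuses the sampling values from the longer script further down (`temperature=0.55`, `top_k=40`, `top_p=0.55`, `repetition_penalty=1.25`); treat them as a starting point rather than tuned recommendations.

```Python
# Minimal sketch: same `pipe` and `question_formatted` as above, with explicit
# sampling settings (values mirror the config used in the longer example below).
# The max_length=100 set when the pipeline was created still caps the output.
result = pipe(
    question_formatted,
    do_sample=True,
    temperature=0.55,
    top_k=40,
    top_p=0.55,
    repetition_penalty=1.25,
    pad_token_id=tokenizer.eos_token_id,
)

print(f"[model]: {result[0]['generated_text'][len(question_formatted):]}")
```
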
Or the more complex one:

```Python
import os
import random
import sys
import time
import json

import torch
from transformers import (AutoTokenizer,
                          AutoModelForCausalLM,
                          BitsAndBytesConfig,
                          set_seed)

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
local_tokenizer = bool(os.getenv('TOKENIZERS_PARALLELISM', 'false'))


class Chatbot:
    """Wraps the model, tokenizer, persona handling and conversation history."""

    def __init__(self, config):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = None
        self.config = config
        self.persona = None
        self.model = None
        self.history = []
        self.load_model()

    def create_persona(self, persona_data):
        # Register a new persona at runtime and return its ID.
        required_keys = ['name', 'description', 'greeting']
        if not all(key in persona_data for key in required_keys):
            raise ValueError(
                "Missing required keys in persona_data. "
                "Please provide 'name', 'description', and 'greeting'.")
        new_persona_id = str(max(int(key) for key in self.config["personas"].keys()) + 1)
        self.config["personas"][new_persona_id] = persona_data
        return new_persona_id

    def load_model(self):
        model_path = self.config["model_path"]
        tokenizer_path = self.config["tokenizer_path"]

        quantization_config = BitsAndBytesConfig(
            load_in_4bit=self.config['load_model_4bit'],
            bnb_4bit_quant_type='nf4' if self.config['load_model_4bit'] else None,
            bnb_4bit_compute_dtype=torch.float16 if self.config['load_model_4bit'] else None,
            bnb_4bit_use_double_quant=True if self.config['load_model_4bit'] else None,
            load_in_8bit=self.config['load_model_8bit'],
            bnb_8bit_quant_type='nf4' if self.config['load_model_8bit'] else None,
            bnb_8bit_compute_dtype=torch.float16 if self.config['load_model_8bit'] else None,
            bnb_8bit_use_double_quant=True if self.config['load_model_8bit'] else None,
        )

        if not model_path or not tokenizer_path:
            raise ValueError('model_path or tokenizer_path not found! Define one.')

        if self.config['load_model_4bit'] and self.config['load_model_8bit']:
            raise ValueError("You can't load the model in 8 bits and 4 bits at the same time!")

        if not self.config['user_name']:
            print('You have not selected a name! No name will be sent to the model.')

        print(f"\nLoading model: {model_path}")

        if torch.cuda.is_available():
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                use_auth_token=self.config['model_token'],
                quantization_config=quantization_config,
            )
            if torch.cuda.device_count() > 1:
                self.model = torch.nn.DataParallel(self.model)
                model_running_on = f'{torch.cuda.device_count()} GPUs'
            else:
                model_running_on = '1 GPU'
        else:
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                quantization_config=quantization_config,
                use_auth_token=self.config['model_token']).to(self.device)
            model_running_on = 'CPU'

        print(f'Model is running on: {model_running_on}')

        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path,
                                                       use_auth_token=self.config['model_token'])
        print(self.tokenizer)

    def load_persona(self, persona_id):
        personas = self.config["personas"]
        if persona_id in personas:
            self.persona = personas[persona_id]
        else:
            raise ValueError("Invalid persona ID")

    def formatting_question(self, user_input, history):
        # Build the full prompt in the <|system|>/<|user|>/<|model|> format.
        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']
        config_question = self.config['use_question_template']

        if config_question:
            formatted_answer = f'<|system|>{user_input}<|model|>'
        else:
            m_ = self.persona["description"]
            g_ = self.persona["greeting"]
            n_ = self.persona["name"]
            un_ = self.config["user_name"]

            if config_user and config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{un_}: {user_input}<|model|>{n_}:'
                )
            elif config_user:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{un_}: {user_input}<|model|>'
                )
            elif config_model:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{user_input}<|model|>{n_}:'
                )
            else:
                formatted_answer = (
                    f'<|system|>{m_}<|model|>{g_}{history}<|user|>{user_input}<|model|>'
                )
        return formatted_answer

    def history_formatting(self, last_input, last_output):
        # Store the last exchange in the same tag format used for prompting.
        config_user = self.config['use_names']['user']
        config_model = self.config['use_names']['model']
        n_ = self.persona["name"]
        un_ = self.config["user_name"]

        if config_user and config_model:
            formatted_answer = f'<|user|>{un_}: {last_input}<|model|>{n_}: {last_output}'
        elif config_user:
            formatted_answer = f'<|user|>{un_}: {last_input}<|model|>{last_output}'
        elif config_model:
            formatted_answer = f'<|user|>{last_input}<|model|>{n_}: {last_output}'
        else:
            formatted_answer = f'<|user|>{last_input}<|model|>{last_output}'
        return formatted_answer

    def reply(self, user_input):
        config_question = self.config['use_question_template']
        set_seed(random.randint(1, 1000))

        user_input = " ".join(user_input.split())

        # Keep only the most recent turns, as configured by history_length.
        if len(self.history) > self.config["history_length"]:
            model_history = "\n".join(
                [str(item) for item in self.history[-self.config["history_length"]:]])
        else:
            model_history = "\n".join([str(item) for item in self.history])

        input_ai = self.formatting_question(user_input, model_history).strip()
        tokenized_input_ai = self.tokenizer.encode(input_ai, return_tensors="pt")

        output_ids = self.model.generate(
            max_length=self.config["max_generation_length"] + len(tokenized_input_ai[0]),
            no_repeat_ngram_size=self.config["no_repeat_ngram_size"],
            repetition_penalty=self.config["repetition_penalty"],
            length_penalty=self.config["length_penalty"],
            input_ids=tokenized_input_ai.to(self.device),
            pad_token_id=self.tokenizer.eos_token_id,
            temperature=self.config["temperature"],
            top_k=self.config["top_k"],
            top_p=self.config["top_p"],
            early_stopping=True,
            use_cache=True,
            do_sample=True,
        )

        ai_reply = self.tokenizer.decode(
            output_ids[0], skip_special_tokens=False)[len(input_ai) + 4:]

        if not config_question:
            self.history.append(self.history_formatting(user_input, ai_reply))

        return ai_reply.strip()

    def reset_conversation(self):
        self.history = []


class UserInterface:
    """Simple terminal front end around the Chatbot class."""

    def __init__(self, chatbot):
        self.chatbot = chatbot

    def run(self):
        persona_id = self.chatbot.config["default_persona"]
        self.chatbot.load_persona(persona_id)

        print("\nChosen Persona:", self.chatbot.persona["name"])
        print("Your Chosen Name:", self.chatbot.config["user_name"])
        print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
        self.chatbot.history.append(
            f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')

        while True:
            user_input = input(f"\n>> {self.chatbot.config['user_name']}: ")

            if user_input.lower() == "reset_app":
                self.chatbot.reset_conversation()
                print("\nConversation history has been reset.\n")
                self.chatbot.history.append(
                    f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                print(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                continue

            if user_input.lower().startswith("create_persona"):
                # Example of use: create_persona
                # {"name": "CustomPersona",
                #  "description": "This is a custom persona created by the user.",
                #  "greeting": "Hello! I am CustomPersona, nice to meet you!"}
                try:
                    persona_data = json.loads(' '.join(user_input.split()[1:]))
                    new_persona_id = self.chatbot.create_persona(persona_data)
                    print(f"Persona created with ID: {new_persona_id}")
                except json.JSONDecodeError:
                    print("Invalid JSON input. Please provide a valid JSON string "
                          "containing 'name', 'description', and 'greeting'.")
                except ValueError as e:
                    print(e)

            # Command to change the persona
            if user_input.lower().startswith("change_persona"):
                try:
                    new_persona_id = user_input.split()[1]
                    self.chatbot.load_persona(new_persona_id)
                    self.chatbot.reset_conversation()
                    print("\nPersona changed to:", self.chatbot.persona["name"])
                    print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    self.chatbot.history.append(
                        f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}')
                    continue
                except (IndexError, ValueError):
                    print("Invalid command or persona ID. Please use 'change_persona [ID]'.")
                    continue

            if user_input.lower() == "exit_app":
                print("Goodbye!")
                break

            reply = self.chatbot.reply(user_input)

            def typewriter_effect(sentence, type_delay):
                # Print the reply character by character for a typing effect.
                for char in sentence:
                    sys.stdout.write(char)
                    sys.stdout.flush()
                    time.sleep(type_delay)

            # Pick a typing delay based on the reply length.
            reply_length = len(reply)
            type_delay_ranges = {
                (100, 200): 0.03,
                (200, 300): 0.02,
                (300, 400): 0.01,
                (400, 500): 0.005
            }
            default_type_delay = 0.04

            for length_range, delay in type_delay_ranges.items():
                if length_range[0] < reply_length <= length_range[1]:
                    type_delay = delay
                    break
            else:
                type_delay = default_type_delay

            if self.chatbot.config['use_typing_effect']:
                typewriter_effect(f'{self.chatbot.persona["name"]}: {reply}', type_delay)
            else:
                print(f'{self.chatbot.persona["name"]}: {reply}')


def main():
    config = {
        "user_name": "Jack",  # The user's name, which is set to "Jack" in this case.
        "model_path": "L-R/LLmRa-2.7B",  # Path to the model used for generating responses.
        "tokenizer_path": "L-R/LLmRa-2.7B",  # Path to the tokenizer associated with the model.
        "model_token": None,  # If you want to load the model using your huggingface token. (Not required, but included)
        "load_model_4bit": True,  # Whether to load the model with 4-bit precision.
        "load_model_8bit": False,  # Whether to load the model with 8-bit precision.
        "use_typing_effect": True,  # Whether to simulate a typing effect when displaying responses.
        "use_names": {
            "model": False,  # Whether the model's name should be used in question formatting.
            "user": False,  # Whether the user's name should be used in question formatting.
        },
        "use_question_template": False,  # Whether to use predefined question templates in conversations.
        "personas": {  # A dictionary of personas with their descriptions and greetings for use in conversations.
            "1": {
                "name": "LLmRa",
                "description": "Description of the LLmRa persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the LLmRa persona is active in a conversation."
            },
            "2": {
                "name": "Hikari",
                "description": "Description of the Hikari persona. It provides background and characteristics of the persona.",
                "greeting": "The greeting message when the Hikari persona is active in a conversation."
            }
        },
        "max_generation_length": 450,  # The maximum length for generated responses.
        "default_persona": "1",  # The default persona to use when starting a conversation.
        "history_length": 6,  # The maximum number of previous messages to consider in the conversation history.
        "top_k": 40,  # Top-k sampling parameter for text generation.
        "top_p": .55,  # Top-p sampling parameter for text generation.
        "temperature": .55,  # Temperature parameter for controlling the randomness of generated text.
        "length_penalty": 0.65,  # Penalty factor for generating longer or shorter responses.
        "no_repeat_ngram_size": 4,  # Parameter to avoid repeating n-grams in generated text.
        "repetition_penalty": 1.25,  # Penalty factor for avoiding repeated phrases in generated text.
    }

    # Initialize chatbot and user interface
    chatbot = Chatbot(config)
    ui = UserInterface(chatbot)

    # Run the user interface
    ui.run()


if __name__ == "__main__":
    main()
```
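
If you only want programmatic replies, without the terminal loop or typing effect, the `Chatbot` class above can also be driven directly. A minimal sketch, assuming the class definition and the `config` dict from `main()` are already in scope:

```Python
# Minimal non-interactive use of the Chatbot class defined above
# (assumes `Chatbot` and the `config` dict from main() are already defined).
bot = Chatbot(config)                        # loads model and tokenizer
bot.load_persona(config["default_persona"])  # select persona "1" (LLmRa)
print(bot.persona["greeting"])

print(bot.reply("Hi! What can you help me with?"))
print(bot.reply("Tell me something about yourself."))  # history carries over

bot.reset_conversation()                     # clear the rolling history
```
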
## Known issues

The model does not always follow instructions reliably.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_L-R__LLmRa-2.7B)

| Metric              | Value |
|---------------------|-------|
| Avg.                | 32.16 |
| ARC (25-shot)       | 37.03 |
| HellaSwag (10-shot) | 60.65 |
| MMLU (5-shot)       | 25.58 |
| TruthfulQA (0-shot) | 35.23 |
| Winogrande (5-shot) | 61.56 |
| GSM8K (5-shot)      | 0.3   |
| DROP (3-shot)       | 4.76  |