|
--- |
|
license: other |
|
language: |
|
- en |
|
pipeline_tag: conversational |
|
inference: false |
|
tags: |
|
- AI |
|
- ConversationalAI |
|
--- |
|
|
|
<h1 style="text-align: center">LLmRa-2.7B</h1> |
|
<h2 style="text-align: center">A conversational Open Pre-trained Transformer Language Model fine-tune.</h2> |
|
|
|
**LLmRa 2.7B** is a proof-of-concept fine-tune of [facebook/opt-2.7b](https://huggingface.co/facebook/opt-2.7b), optimized for dialogue.
|
|
|
**Disclaimer:** NSFW data was included in the fine-tuning of this model. Although SFW inputs will usually result in SFW outputs, you are advised to **chat at your own risk. This model is not suitable for use by minors.** |
|
|
|
**Warning:** This model is **NOT** suitable for use by minors. **It will output X-rated content under certain circumstances.** |
|
|
|
**This model was fine-tuned on a small test dataset; version 2, or a higher-parameter model, will be trained on the full dataset.**
|
|
|
--- |
|
|
|
## Usage Format |
|
|
|
To effectively utilize the model, follow this structured format for engaging text-based conversations: |
|
|
|
**1. Initialization** |
|
|
|
Here is how you can define the personality of the language model: |
|
|
|
``` |
|
<|system|>[Persona] |
|
``` |
|
|
|
- **Persona**: You can define a specific persona or context for the AI, but it's optional. It can be a character, a role, or just a style of interaction. |
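
For example, a filled-in persona line might look like this (the description here is purely illustrative):

```
<|system|>Hikari is a cheerful assistant who answers questions about astronomy in a casual, friendly tone.
```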
|
|
|
**2. AI Introduction** |
|
|
|
``` |
|
<|user|>[User input]<|model|> |
|
``` |
|
- Users can start the conversation by entering their message after the `<|user|>` tag and closing it with `<|model|>`.
|
|
|
--- |
|
|
|
### Example Usage: |
|
|
|
Here's an example of how to start a conversation with the AI: |
|
|
|
``` |
|
<|system|>I'm here to provide information and assistance on a wide range of topics. |
|
<|model|>Hello! Welcome to our AI-powered assistant. How can I assist you today? |
|
<|user|>Tell me about the history of artificial intelligence. |
|
<|model|> |
|
``` |
|
|
|
Continue the conversation as needed. This structured format helps maintain a smooth and engaging interaction with the AI. |
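
For multi-turn conversations, append each completed exchange to the prompt before the next `<|user|>` turn. Continuing the example above (the bracketed reply is a placeholder for whatever the model generated, and the follow-up question is illustrative):

```
<|system|>I'm here to provide information and assistance on a wide range of topics.
<|model|>Hello! Welcome to our AI-powered assistant. How can I assist you today?
<|user|>Tell me about the history of artificial intelligence.
<|model|>[Previous model reply]
<|user|>Which of those milestones came first?<|model|>
```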
|
|
|
You are not required to include `User`; you can change it to your preferred name or leave it blank. You may also add the AI's name, for example:
|
|
|
``` |
|
<|user|>YourNameHere: Hello.<|model|>CharacterName: |
|
``` |
|
|
|
You can also use this instruct prompt example: |
|
|
|
``` |
|
<|system|>What is one plus one?<|model|> |
|
``` |
|
|
|
## Loading The Model |
|
|
|
To use the model and interact with it, use the Python code below: |
|
|
|
```Python |
|
from transformers import (AutoModelForCausalLM, |
|
AutoTokenizer, |
|
pipeline, |
|
) |
|
|
|
model = AutoModelForCausalLM.from_pretrained('L-R/LLmRa-2.7B') |
|
tokenizer = AutoTokenizer.from_pretrained('L-R/LLmRa-2.7B') |
|
|
|
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=100) |
|
|
|
input_question = 'QUESTION HERE' |
|
|
|
question_formatted = f'<|system|>{input_question}<|model|>' |
|
|
|
result = pipe(question_formatted) |
|
|
|
print(f"[model]: {result[0]['generated_text'][len(question_formatted):]}") |
|
``` |
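
If you want finer control over sampling than `pipeline` offers, here is a minimal sketch that calls `generate` directly. The sampling values mirror the chatbot configuration further below; the 128-token budget is an arbitrary choice:

```Python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('L-R/LLmRa-2.7B')
tokenizer = AutoTokenizer.from_pretrained('L-R/LLmRa-2.7B')

prompt = '<|system|>What is one plus one?<|model|>'
input_ids = tokenizer.encode(prompt, return_tensors='pt')

# Sample a completion; max_length counts the prompt tokens as well,
# so this allows up to 128 newly generated tokens.
output_ids = model.generate(
    input_ids,
    max_length=len(input_ids[0]) + 128,
    do_sample=True,
    temperature=0.55,
    top_k=40,
    top_p=0.55,
    repetition_penalty=1.25,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens.
print(f"[model]: {tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)}")
```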
|
|
|
Or use the more complex, interactive chatbot script below:
|
|
|
```Python |
|
import os |
|
import random |
|
import sys |
|
import time |
|
import json |
|
import torch |
|
|
|
from transformers import (AutoTokenizer, |
|
AutoModelForCausalLM, |
|
BitsAndBytesConfig, |
|
set_seed) |
|
|
|
local_rank = int(os.getenv('LOCAL_RANK', '0')) |
|
world_size = int(os.getenv('WORLD_SIZE', '1')) |
|
local_tokenizer = os.getenv('TOKENIZERS_PARALLELISM', 'false').lower() == 'true'
|
|
|
|
|
class Chatbot: |
|
def __init__(self, config): |
|
|
|
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
self.tokenizer = None |
|
self.config = config |
|
self.persona = None |
|
self.model = None |
|
self.history = [] |
|
|
|
self.load_model() |
|
|
|
def create_persona(self, persona_data): |
|
required_keys = ['name', 'description', 'greeting'] |
|
if not all(key in persona_data for key in required_keys): |
|
raise ValueError( |
|
"Missing required keys in persona_data. Please provide 'name', 'description', and 'greeting'.") |
|
|
|
new_persona_id = str(max(int(key) for key in self.config["personas"].keys()) + 1) |
|
|
|
self.config["personas"][new_persona_id] = persona_data |
|
return new_persona_id |
|
|
|
def load_model(self): |
|
model_path = self.config["model_path"] |
|
tokenizer_path = self.config["tokenizer_path"] |
|
|
|
        # Note: the bnb_4bit_* options only take effect when load_in_4bit=True,
        # and 8-bit loading takes no extra bnb_* arguments, so the config can be
        # built unconditionally.
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=self.config['load_model_4bit'],
            bnb_4bit_quant_type='nf4',
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            load_in_8bit=self.config['load_model_8bit'],
        )
|
|
|
if not model_path or not tokenizer_path: |
|
            raise ValueError('model_path or tokenizer_path not set! Define both.')
|
|
|
if self.config['load_model_4bit'] and self.config['load_model_8bit']: |
|
raise ValueError("You can't load the model in 8 bits and 4 bits at the same time!") |
|
|
|
if not self.config['user_name']: |
|
            print('You have not selected a name! No name will be sent to the model.')
|
|
|
print(f"\nLoading model: {model_path}") |
|
|
|
if torch.cuda.is_available(): |
|
|
|
self.model = AutoModelForCausalLM.from_pretrained( |
|
model_path, |
|
|
|
use_auth_token=self.config['model_token'], |
|
quantization_config=quantization_config,) |
|
|
|
            # Note: a bitsandbytes-quantized model is already placed on GPU by
            # from_pretrained, and torch.nn.DataParallel would hide .generate(),
            # so multi-GPU setups are only reported here, not re-wrapped.
            if torch.cuda.device_count() > 1:
                model_running_on = f'{torch.cuda.device_count()} GPUs'
            else:
                model_running_on = '1 GPU'
|
else: |
|
            # bitsandbytes quantization requires CUDA, so load in full
            # precision when falling back to the CPU.
            self.model = AutoModelForCausalLM.from_pretrained(
                model_path,
                use_auth_token=self.config['model_token']).to(self.device)
            model_running_on = 'CPU'
|
|
|
print(f'Model is running on: {model_running_on}') |
|
|
|
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_auth_token=self.config['model_token']) |
|
|
|
|
|
|
def load_persona(self, persona_id): |
|
personas = self.config["personas"] |
|
if persona_id in personas: |
|
self.persona = personas[persona_id] |
|
else: |
|
raise ValueError("Invalid persona ID") |
|
|
|
|
|
def formatting_question(self, user_input, history): |
|
|
|
config_user = self.config['use_names']['user'] |
|
config_model = self.config['use_names']['model'] |
|
config_question = self.config['use_question_template'] |
|
|
|
if config_question: |
|
formatted_answer = ( |
|
f'<|system|>{user_input}<|model|>' |
|
) |
|
else: |
|
m_ = self.persona["description"] |
|
g_ = self.persona["greeting"] |
|
n_ = self.persona["name"] |
|
un_ = self.config["user_name"] |
|
|
|
if config_user and config_model: |
|
formatted_answer = ( |
|
f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{un_}: {user_input}<|model|>{n_}:' |
|
) |
|
elif config_user: |
|
formatted_answer = ( |
|
f'<|system|>{m_}<|model|>{g_}{history}<|user|>{un_}: {user_input}<|model|>' |
|
) |
|
elif config_model: |
|
formatted_answer = ( |
|
f'<|system|>{m_}<|model|>{n_}: {g_}{history}<|user|>{user_input}<|model|>{n_}:' |
|
) |
|
else: |
|
formatted_answer = ( |
|
f'<|system|>{m_}<|model|>{g_}{history}<|user|>{user_input}<|model|>' |
|
) |
|
|
|
return formatted_answer |
|
|
|
def history_formatting(self, last_input, last_output): |
|
|
|
config_user = self.config['use_names']['user'] |
|
config_model = self.config['use_names']['model'] |
|
|
|
n_ = self.persona["name"] |
|
un_ = self.config["user_name"] |
|
|
|
if config_user and config_model: |
|
formatted_answer = ( |
|
f'<|user|>{un_}: {last_input}<|model|>{n_}: {last_output}' |
|
) |
|
elif config_user: |
|
formatted_answer = ( |
|
f'<|user|>{un_}: {last_input}<|model|>{last_output}' |
|
) |
|
elif config_model: |
|
formatted_answer = ( |
|
f'<|user|>{last_input}<|model|>{n_}: {last_output}' |
|
) |
|
else: |
|
formatted_answer = ( |
|
f'<|user|>{last_input}<|model|>{last_output}' |
|
) |
|
|
|
return formatted_answer |
|
|
|
def reply(self, user_input): |
|
|
|
config_question = self.config['use_question_template'] |
|
set_seed(random.randint(1, 1000)) |
|
user_input = " ".join(user_input.split()) |
|
|
|
if len(self.history) > self.config["history_length"]: |
|
model_history = "\n".join([str(item) for item in self.history[-self.config["history_length"]:]]) |
|
else: |
|
model_history = "\n".join([str(item) for item in self.history]) |
|
|
|
input_ai = self.formatting_question(user_input, model_history).strip() |
|
tokenized_input_ai = self.tokenizer.encode(input_ai, return_tensors="pt") |
|
|
|
output_ids = self.model.generate( |
|
max_length=self.config["max_generation_length"] + len(tokenized_input_ai[0]), |
|
no_repeat_ngram_size=self.config["no_repeat_ngram_size"], |
|
repetition_penalty=self.config["repetition_penalty"], |
|
length_penalty=self.config["length_penalty"], |
|
input_ids=tokenized_input_ai.to(self.device), |
|
pad_token_id=self.tokenizer.eos_token_id, |
|
temperature=self.config["temperature"], |
|
top_k=self.config["top_k"], |
|
top_p=self.config["top_p"], |
|
early_stopping=True, |
|
use_cache=True, |
|
do_sample=True, |
|
) |
|
|
|
        # Skip the echoed prompt plus the leading '</s>' (4 characters) that the
        # OPT tokenizer prepends when decoding with special tokens kept.
        ai_reply = self.tokenizer.decode(
            output_ids[0],
            skip_special_tokens=False)[len(input_ai) + 4:]
|
|
|
if not config_question: |
|
self.history.append(self.history_formatting(user_input, ai_reply)) |
|
|
|
return ai_reply.strip() |
|
|
|
def reset_conversation(self): |
|
|
|
self.history = [] |
|
|
|
class UserInterface: |
|
def __init__(self, chatbot): |
|
self.chatbot = chatbot |
|
|
|
def run(self): |
|
|
|
persona_id = self.chatbot.config["default_persona"] |
|
self.chatbot.load_persona(persona_id) |
|
|
|
print("\nChosen Persona:", self.chatbot.persona["name"]) |
|
print("Your Chosen Name:", self.chatbot.config["user_name"]) |
|
|
|
print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
|
|
while True: |
|
user_input = input(f"\n>> {self.chatbot.config['user_name']}: ") |
|
if user_input.lower() == "reset_app" or user_input == "reset_app": |
|
self.chatbot.reset_conversation() |
|
print("\nConversation history has been reset.\n") |
|
self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
print(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
continue |
|
|
|
if user_input.lower().startswith("create_persona"): |
|
|
|
# Example of use: create_persona |
|
|
|
# {"name": "CustomPersona", |
|
# "description": "This is a custom persona created by the user.", |
|
# "greeting": "Hello! I am CustomPersona, nice to meet you!"} |
|
|
|
                try:
                    persona_data = json.loads(' '.join(user_input.split()[1:]))
                    new_persona_id = self.chatbot.create_persona(persona_data)
                    print(f"Persona created with ID: {new_persona_id}")
                except json.JSONDecodeError:
                    print("Invalid JSON input. Please provide a valid JSON string containing 'name', 'description', and 'greeting'.")
                except ValueError as e:
                    print(e)
                continue  # Skip sending the create_persona command itself to the model.
|
|
|
# Add a command to change the persona |
|
if user_input.lower().startswith("change_persona"): |
|
try: |
|
new_persona_id = user_input.split()[1] |
|
self.chatbot.load_persona(new_persona_id) |
|
self.chatbot.reset_conversation() |
|
print("\nPersona changed to:", self.chatbot.persona["name"]) |
|
print(f'\n{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
self.chatbot.history.append(f'{self.chatbot.persona["name"]}: {self.chatbot.persona["greeting"]}') |
|
continue |
|
except (IndexError, ValueError): |
|
print("Invalid command or persona ID. Please use 'change_persona [ID]'.") |
|
continue |
|
|
|
if user_input.lower() == "exit_app" or user_input == "exit_app": |
|
print("Goodbye!") |
|
break |
|
|
|
reply = self.chatbot.reply(user_input) |
|
|
|
def typewriter_effect(sentence, type_delay): |
|
|
|
for char in sentence: |
|
sys.stdout.write(char) |
|
sys.stdout.flush() |
|
time.sleep(type_delay) |
|
|
|
reply_length = len(reply) |
|
type_delay_ranges = { |
|
(100, 200): 0.03, |
|
(200, 300): 0.02, |
|
(300, 400): 0.01, |
|
(400, 500): 0.005 |
|
} |
|
|
|
default_type_delay = 0.04 |
|
|
|
for length_range, delay in type_delay_ranges.items(): |
|
if length_range[0] < reply_length <= length_range[1]: |
|
type_delay = delay |
|
break |
|
else: |
|
type_delay = default_type_delay |
|
|
|
if self.chatbot.config['use_typing_effect']: |
|
typewriter_effect(f'{self.chatbot.persona["name"]}: {reply}', type_delay) |
|
else: |
|
print(f'{self.chatbot.persona["name"]}: {reply}') |
|
|
|
def main(): |
|
|
|
config = { |
|
"user_name": "Jack", # The user's name, which is set to "Jack" in this case. |
|
|
|
"model_path": "L-R/LLmRa-2.7B", # Path to the model used for generating responses. |
|
"tokenizer_path": "L-R/LLmRa-2.7B", # Path to the tokenizer associated with the model. |
|
"model_token": None, # If you want to load the model using your huggingface token. (Not required, but included) |
|
|
|
"load_model_4bit": True, # Whether to load the model with 4-bit precision. |
|
"load_model_8bit": False, # Whether to load the model with 8-bit precision. |
|
|
|
"use_typing_effect": True, # Whether to simulate a typing effect when displaying responses. |
|
|
|
"use_names": { |
|
"model": False, # Whether the model's name should be used in question formatting. |
|
"user": False, # Whether the user's name should be used in question formatting. |
|
}, |
|
|
|
"use_question_template": False, # Whether to use predefined question templates in conversations. |
|
|
|
"personas": { |
|
# A dictionary of personas with their descriptions and greetings for use in conversations. |
|
"1": { |
|
"name": "LLmRa", |
|
"description": "Description of the LLmRa persona. It provides background and characteristics of the persona.", |
|
"greeting": "The greeting message when the LLmRa persona is active in a conversation." |
|
}, |
|
"2": { |
|
"name": "Hikari", |
|
"description": "Description of the Hikari persona. It provides background and characteristics of the persona.", |
|
"greeting": "The greeting message when the Hikari persona is active in a conversation." |
|
} |
|
}, |
|
|
|
"max_generation_length": 450, # The maximum length for generated responses. |
|
|
|
"default_persona": "1", # The default persona to use when starting a conversation. |
|
|
|
"history_length": 6, # The maximum number of previous messages to consider in the conversation history. |
|
|
|
"top_k": 40, # Top-k sampling parameter for text generation. |
|
"top_p": .55, # Top-p sampling parameter for text generation. |
|
"temperature": .55, # Temperature parameter for controlling the randomness of generated text. |
|
"length_penalty": 0.65, # Penalty factor for generating longer or shorter responses. |
|
"no_repeat_ngram_size": 4, # Parameter to avoid repeating n-grams in generated text. |
|
"repetition_penalty": 1.25, # Penalty factor for avoiding repeated phrases in generated text. |
|
} |
|
|
|
# Initialize chatbot and user interface |
|
chatbot = Chatbot(config) |
|
ui = UserInterface(chatbot) |
|
|
|
# Run the user interface |
|
ui.run() |
|
|
|
|
|
if __name__ == "__main__": |
|
main() |
|
``` |
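
The script above runs as an interactive loop, but the `Chatbot` class can also be driven programmatically. A minimal sketch, assuming the `config` dictionary from `main()` is in scope (the persona JSON is the same illustrative example used in the `create_persona` comment):

```Python
chatbot = Chatbot(config)

# Register and activate a custom persona, then ask a single question.
new_id = chatbot.create_persona({
    "name": "CustomPersona",
    "description": "This is a custom persona created by the user.",
    "greeting": "Hello! I am CustomPersona, nice to meet you!",
})
chatbot.load_persona(new_id)

print(chatbot.reply("Hi! Who are you?"))
```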
|
|
|
## Known issues |
|
|
|
The model does not always follow instructions.
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_L-R__LLmRa-2.7B) |
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 32.16 | |
|
| ARC (25-shot) | 37.03 | |
|
| HellaSwag (10-shot) | 60.65 | |
|
| MMLU (5-shot) | 25.58 | |
|
| TruthfulQA (0-shot) | 35.23 | |
|
| Winogrande (5-shot) | 61.56 | |
|
| GSM8K (5-shot) | 0.3 | |
|
| DROP (3-shot) | 4.76 | |
|
|