--- license: apache-2.0 language: - en - ja tags: - finetuned library_name: transformers pipeline_tag: text-generation --- # Our Models - [Vecteus](https://huggingface.co/Local-Novel-LLM-project/Vecteus-v1) - [Ninja-v1](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1) - [Ninja-v1-NSFW](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW) - [Ninja-v1-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-128k) - [Ninja-v1-NSFW-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW-128k) ## Model Card for Ninja-v1.0 The Mistral-7B--based Large Language Model (LLM) is an noveldataset fine-tuned version of the Mistral-7B-v0.1 Ninja has the following changes compared to Mistral-7B-v0.1. - Achieving both high quality Japanese and English generation - Memory ability that does not forget even after long-context generation This model was created with the help of GPUs from the first LocalAI hackathon. We would like to take this opportunity to thank ## List of Creation Methods - Chatvector for multiple models - Simple linear merging of result models - Domain and Sentence Enhancement with LORA - Context expansion ## Instruction format Ninja adopts the prompt format from Vicuna and supports multi-turn conversation. The prompt should be as following: ``` USER: Hi ASSISTANT: Hello. USER: Who are you? ASSISTANT: I am ninja. ``` ## Example prompts to improve (Japanese) - BAD: あなたは○○として振る舞います - GOOD: あなたは○○です - BAD: あなたは○○ができます - GOOD: あなたは○○をします ## Performing inference ```python from transformers import AutoModelForCausalLM, AutoTokenizer import torch model_id = "Local-Novel-LLM-project/Ninja-v1" new_tokens = 1024 model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto") tokenizer = AutoTokenizer.from_pretrained(model_id) system_prompt = "あなたはプロの小説家です。\n小説を書いてください\n-------- " prompt = input("Enter a prompt: ") system_prompt += prompt + "\n-------- " model_inputs = tokenizer([system_prompt], return_tensors="pt").to("cuda") generated_ids = model.generate(**model_inputs, max_new_tokens=new_tokens, do_sample=True) print(tokenizer.batch_decode(generated_ids)[0]) ```` ## Merge recipe - WizardLM2 - mistralai/Mistral-7B-v0.1 - Elizezen/Antler-7B - stabilityai/japanese-stablelm-instruct-gamma-7b - NTQAI/chatntq-ja-7b-v1.0 The characteristics of each model are as follows. - WizardLM2: High quality multitasking model - Antler-7B: Model specialized for novel writing - NTQAI/chatntq-ja-7b-v1.0 High quality Japanese specialized model ## Other points to keep in mind - The training data may be biased. Be careful with the generated sentences. - Memory usage may be large for long inferences. - If possible, we recommend inferring with llamacpp rather than Transformers.