---
license: apache-2.0
language:
- en
- ja
tags:
- finetuned
library_name: transformers
pipeline_tag: text-generation
---
<img src="./veteus_logo.svg" width="100%" height="20%" alt=""> 

# Our Models
- [Vecteus](https://huggingface.co/Local-Novel-LLM-project/Vecteus-v1)

- [Ninja-v1](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1) 

- [Ninja-v1-NSFW](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW)

- [Ninja-v1-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-128k)

- [Ninja-v1-NSFW-128k](https://huggingface.co/Local-Novel-LLM-project/Ninja-v1-NSFW-128k)
  
## Model Card for VecTeus-v1.0

This Mistral-7B-based Large Language Model (LLM) is a version of Mistral-7B-v0.1 fine-tuned on a novel dataset.

VecTeus has the following changes compared to Mistral-7B-v0.1:
- 128k context window (8k context in v0.1)
- High-quality generation in both Japanese and English
- Can generate NSFW content
- Retains information even across long-context generation

This model was created with the help of GPUs provided for the first LocalAI hackathon.

We would like to take this opportunity to thank everyone who made that possible.

## List of Creation Methods

- Chat Vector applied to multiple models (see the sketch below)
- Simple linear merging of the resulting models
- Domain and sentence enhancement with LoRA
- Context expansion
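
The Chat Vector step above is, in essence, weight arithmetic: the parameter delta between an instruction-tuned model and its base is added on top of another model with the same architecture. Below is a minimal sketch of the idea; the checkpoint choices (Mistral-7B-Instruct-v0.1 as the chat model) and the output path are illustrative assumptions, not the project's exact recipe, and loading three 7B checkpoints at once needs substantial memory.

```python
import torch
from transformers import AutoModelForCausalLM

# All three models must share the same architecture and vocabulary.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16)
chat = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.float16)
target = AutoModelForCausalLM.from_pretrained("Local-Novel-LLM-project/Ninja-v1", torch_dtype=torch.float16)

base_sd = base.state_dict()
chat_sd = chat.state_dict()

with torch.no_grad():
    for name, param in target.state_dict().items():
        # chat vector = instruct weights minus base weights,
        # added on top of the target model's weights
        param.add_(chat_sd[name] - base_sd[name])

target.save_pretrained("./chat-vector-sketch")  # hypothetical output path
```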

## Instruction format

  This model is freed from templates: no special instruction format is required.

## Example prompt improvements (Japanese)

  - BAD: あなたは○○として振る舞います ("You will behave as ○○")
  - GOOD: あなたは○○です ("You are ○○")

  - BAD: あなたは○○ができます ("You can do ○○")
  - GOOD: あなたは○○をします ("You do ○○")

## Performing inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Local-Novel-LLM-project/Vecteus-v1"
new_tokens = 1024

# Load the model in fp16 with FlashAttention 2 (requires the flash-attn
# package); device_map="auto" places the weights on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# System prompt: "You are a professional novelist.\nPlease write a novel\n-------- "
system_prompt = "あなたはプロの小説家です。\n小説を書いてください\n-------- "

prompt = input("Enter a prompt: ")
system_prompt += prompt + "\n-------- "
model_inputs = tokenizer([system_prompt], return_tensors="pt").to("cuda")

generated_ids = model.generate(**model_inputs, max_new_tokens=new_tokens, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])
```
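
With `do_sample=True` the model uses its default sampling settings; generation can be steered further by passing standard `generate` arguments such as `temperature`, `top_p`, or `repetition_penalty`.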

## Merge recipe

- VT0.1 = Ninja-v1 + Original LoRA
- VT0.2 = Ninja-v1-128k + Original LoRA
- VT0.2on0.1 = VT0.1 + VT0.2
- VT1 = all VT series + LoRA + Ninja-v1 (128k and normal)
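
The merging steps above (e.g. VT0.2on0.1 = VT0.1 + VT0.2) are simple linear merges: an element-wise weighted average of parameters. A minimal sketch, assuming two hypothetical checkpoints of the same architecture standing in for VT0.1 and VT0.2; this illustrates the technique, not the project's exact tooling.

```python
import torch
from transformers import AutoModelForCausalLM

# Hypothetical local checkpoints; both must share the same architecture.
model_a = AutoModelForCausalLM.from_pretrained("path/to/vt0.1", torch_dtype=torch.float16)
model_b = AutoModelForCausalLM.from_pretrained("path/to/vt0.2", torch_dtype=torch.float16)

alpha = 0.5  # blend weight; 0.5 gives an even average

sd_b = model_b.state_dict()

with torch.no_grad():
    for name, tensor in model_a.state_dict().items():
        # merged = alpha * A + (1 - alpha) * B, element-wise per parameter
        tensor.mul_(alpha).add_(sd_b[name], alpha=1.0 - alpha)

model_a.save_pretrained("./vt-merge-sketch")  # hypothetical output path
```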

## Other points to keep in mind
- The training data may be biased; be careful with the generated text.
- Memory usage can be high for long generations.
- If possible, we recommend running inference with llama.cpp rather than Transformers (see the sketch below).
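
One way to follow that recommendation is the llama-cpp-python bindings, after converting the model to GGUF. A minimal sketch; the GGUF file name, quantization level, and user prompt are placeholder assumptions, not an official release.

```python
from llama_cpp import Llama

# Placeholder file: convert/quantize Vecteus-v1 to GGUF first,
# or use a published GGUF build if one is available.
llm = Llama(
    model_path="./vecteus-v1.Q4_K_M.gguf",
    n_ctx=8192,  # context window; raise it if you need longer contexts
)

# Same novelist system prompt as in the Transformers example above.
system_prompt = "あなたはプロの小説家です。\n小説を書いてください\n-------- "
prompt = system_prompt + "ある雨の日の物語" + "\n-------- "  # "A story of a rainy day"

out = llm(prompt, max_tokens=1024, temperature=0.8)
print(out["choices"][0]["text"])
```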