Gibberish Output

#5
by AShipperley - opened

Hi, I can't seem to figure out how to get proper output from the model when loading it directly in Python. The simplest code example I have that reproduces the problem:

from transformers import LlamaTokenizer, LlamaForCausalLM, LlamaConfig, GenerationConfig, BitsAndBytesConfig
from torch import cuda, float16

# Path I downloaded all files to
MODEL_PATH = r"G:\Noromaid13b"

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=float16
)
model_config = LlamaConfig.from_pretrained(MODEL_PATH, local_files_only=True)
model = LlamaForCausalLM.from_pretrained(
    MODEL_PATH,
    config=model_config,
    quantization_config=bnb_config,
    device_map=device,
    local_files_only=True
)
tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH, local_files_only=True)
gen_config = GenerationConfig.from_pretrained(MODEL_PATH)
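# Note: the max_new_tokens value passed to generate() below takes precedence over max_length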
gen_config.max_length = 4096

prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Avoid repetition, don't loop. Develop the plot slowly, always stay in character. Describe all actions in full, elaborate, explicit, graphic, and vivid detail. Mention all relevant sensory perceptions. Do not act or speak for {{user}}. Write at least 2 paragraphs.

### Input:
How is your day today?

### Response:
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
output = model.generate(inputs=input_ids, generation_config=gen_config, do_sample=True, temperature=0.6, top_p=0.9, max_new_tokens=100)
response = tokenizer.batch_decode(output[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(response)

The response I get from the model is:

"My.s thes.!, the1. a a.a.,t and,..........., of and of and,. and and.O, and' and.,, and,O, and, and and and,,..,,.. and. and. and, and.,, and.,,. and.t and,O.,.,,. and and... and,"

Am I loading it wrong or doing something weird? I used the same script to load Utopia-13b and got great results. I know this is a test model, but other people have mentioned getting good outputs from it.
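For reference, here's the kind of minimal greedy-decoding check I'd use to narrow it down. This is just a sketch that reuses the model, tokenizer, and device from the script above, with an arbitrary prompt; my assumption is that if even this comes out as gibberish, the problem is in the load/quantization rather than the sampling settings or prompt format:

# Minimal greedy sanity check (sketch): reuses model, tokenizer, and device from above
check_ids = tokenizer("The quick brown fox", return_tensors="pt").input_ids.to(device)
check_out = model.generate(check_ids, do_sample=False, max_new_tokens=20)
print(tokenizer.decode(check_out[0], skip_special_tokens=True))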

Thank you!
