---
datasets:
  - Open-Orca/OpenOrca
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

# Overview

This is an unreleased, untested, and unfinished beta.

# Inference

Remove the `.to('cuda')` calls below to run without GPU acceleration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Load the model in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "Open-Orca/oo-phi-1_5",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16).to('cuda')
# Note: torch_dtype is not a tokenizer argument, so it is omitted here.
tokenizer = AutoTokenizer.from_pretrained(
    "Open-Orca/oo-phi-1_5",
    trust_remote_code=True)

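# System prompt steering the model toward step-by-step "scratchpad" reasoning.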
sys_prompt = "I carefully provide accurate, factual, thoughtful, nuanced answers and am brilliant at reasoning. " \
    "I am an assistant who thinks through their answers step-by-step to be sure I always get the right answer. " \
    "I think more clearly if I write out my thought process in a scratchpad manner first; therefore, I always " \
    "explain background context, assumptions, and step-by-step thinking BEFORE trying to answer a question."
prompt = "Tell me about yourself please."

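# Assemble the conversation in ChatML format: <|im_start|>role\n...<|im_end|>.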
prefix = "<|im_start|>"
suffix = "<|im_end|>\n"
sys_format = prefix + "system\n" + sys_prompt + suffix
user_format = prefix + "user\n" + prompt + suffix
assistant_format = prefix + "assistant\n"
input_text = sys_format + user_format + assistant_format
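# For illustration (this comment is not from the original README), input_text
# now renders as:
#   <|im_start|>system
#   I carefully provide accurate, factual, ... BEFORE trying to answer a question.<|im_end|>
#   <|im_start|>user
#   Tell me about yourself please.<|im_end|>
#   <|im_start|>assistant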

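# do_sample=True with temperature=0.01 yields near-deterministic (near-greedy) output.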
generation_config = GenerationConfig(
    max_length=512, temperature=0.01, top_p=0.95, repetition_penalty=1.1,
    do_sample=True, use_cache=True,
    eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id,
    transformers_version="4.33.1")

# Tokenize the assembled prompt and generate a completion.
inputs = tokenizer(input_text, return_tensors="pt", return_attention_mask=False).to('cuda')
outputs = model.generate(**inputs, generation_config=generation_config)

# Decode the full sequence (prompt plus completion).
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
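
`model.generate` returns the prompt tokens followed by the completion, so `text` above contains the full ChatML-formatted exchange. If you only want the assistant's reply, a minimal sketch like the following (reusing `inputs`, `outputs`, and `tokenizer` from the snippet above; not part of the original instructions) slices the prompt off before decoding:

```python
# Decode only the newly generated tokens, dropping the echoed prompt.
prompt_length = inputs['input_ids'].shape[1]
# skip_special_tokens assumes the ChatML markers are registered as special
# tokens; otherwise, strip "<|im_end|>" manually.
reply = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(reply)
```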