Context Length and Max New Tokens

#1
by Shivam098 - opened

--max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096

How do I increase this when using
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)

Same here. The Mistral 7B base model has an 8k context length, as I understand it. Is this model 4k, or is that a typo in the README.md?
Thanks!

Just change it to

--max-input-length 7892 --max-total-tokens 8192 --max-batch-prefill-tokens 8192

or whatever you want
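For reference, those flags belong to the text-generation-inference launcher, not to transformers. A hedged sketch of a full launch command (the model id and port here are placeholders, and note that newer TGI releases rename --max-input-length to --max-input-tokens):

```shell
# Serve the model with an 8k context window:
# prompts up to 7892 tokens, prompt + generation capped at 8192.
text-generation-launcher \
    --model-id HuggingFaceH4/zephyr-7b-beta \
    --max-input-length 7892 \
    --max-total-tokens 8192 \
    --max-batch-prefill-tokens 8192 \
    --port 8080
```

--max-total-tokens must be at least --max-input-length plus the longest generation you expect, since both count against the same window.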

How do I do the same here?
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_name_or_path = "TheBloke/zephyr-7B-beta-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Open the text file for reading
with open('data.txt', 'r') as file:
    # Read the entire content of the file into a string
    file_content = file.read()

prompt = f"Extract the useful information from the following text: {file_content} and convert the extracted data into a structured format using valid JSON only."
prompt_template=f'''<|system|>
</s>
<|user|>
{prompt}</s>
<|assistant|>
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1
)

print(pipe(prompt_template)[0]['generated_text'])
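On the transformers side, max_new_tokens in generate() is not fixed at 512; you can simply pass a larger value. The real constraint is that the prompt tokens plus the generated tokens must fit in the model's context window. A minimal sketch of that budgeting (the helper name is made up for illustration):

```python
def max_new_token_budget(context_length, prompt_tokens, requested=512):
    # Prompt tokens and generated tokens share one context window,
    # so cap the requested generation length at whatever room remains.
    remaining = context_length - prompt_tokens
    return max(0, min(requested, remaining))

# A 4096-token window with a 3696-token prompt leaves room for at
# most 400 new tokens, no matter how large max_new_tokens is set.
print(max_new_token_budget(4096, 3696))  # -> 400
# With an 8192-token window the full 512-token request fits.
print(max_new_token_budget(8192, 3696))  # -> 512
```

In practice you would measure prompt_tokens with len(tokenizer(prompt_template).input_ids) and pass the result as max_new_tokens to generate() or to the pipeline.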
