Truncated model input, no output.

#9 by JHolmes89

I'm running MPT-30B-Instruct on 4 A10s as follows:

import transformers

name = 'mosaicml/mpt-30b-instruct'
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
# Load in 8-bit, sharded across the 4 A10s
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, load_in_8bit=True, device_map='auto'
)
pipe = transformers.pipeline('text-generation', model=model,
                             tokenizer=tokenizer, device_map='auto')
# Read the prompt as text (the pipeline expects a str, not bytes)
with open('my_2000_token_text.txt', 'r') as f:
    text = f.read()
output = pipe(text, max_new_tokens=600)[0]['generated_text']

What happens is that, instead of the expected behavior of the output being an extension of the input, the output is only a prefix of the input, with no additional text at the end: text.startswith(output) is True and output is shorter than text, when we would expect the reverse.
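A quick way to confirm this is truncation rather than a decoding quirk is to compare token counts before and after (a diagnostic sketch, reusing the tokenizer, text, and output variables from the repro above):

# If the generated text is shorter in tokens than the prompt and is a
# prefix of it, the input was cut before generation ever started.
n_in = len(tokenizer(text)['input_ids'])
n_out = len(tokenizer(output)['input_ids'])
print(n_in, n_out, text.startswith(output))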

MPT-30B is supposed to have an 8k token context window, and max_seq_len is 8192 by default, so I don't see any reason this should be happening. How can I fix this problem?
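For reference, max_seq_len can also be set explicitly at load time via the config, following the pattern the MPT model cards use with trust_remote_code (a sketch; 8192 is the documented default, set here only to rule out a smaller value in the loaded config):

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 8192  # documented default for MPT-30B
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, config=config, trust_remote_code=True,
    load_in_8bit=True, device_map='auto'
)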
