What is the maximum possible max_new_tokens?

#15
by PF94

I tried the example inference code and it worked great. But when I increase max_new_tokens to 2048, most of the time I get either RuntimeError: CUDA error: device-side assert triggered or RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED. In config.json, max_position_embeddings and max_length are both 4096. Does that mean the model can output up to 4096 tokens?
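
For what it's worth, max_position_embeddings usually covers the prompt and the generated tokens together, so the safe ceiling for max_new_tokens is the context window minus the prompt length; exceeding it can trigger exactly these kinds of device-side asserts. Below is a minimal sketch of that check, assuming a standard transformers causal LM; the model name and prompt are placeholders, not from the original post.

```python
# Sketch: cap max_new_tokens so prompt + generated tokens fit the context window.
# Assumes a Hugging Face causal LM; "your-model-here" is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-here"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain the context window of a language model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The context window (max_position_embeddings, 4096 in the config mentioned
# above) must hold both the prompt and the new tokens.
context_window = model.config.max_position_embeddings
prompt_len = inputs["input_ids"].shape[1]
max_new = min(2048, context_window - prompt_len)

output = model.generate(**inputs, max_new_tokens=max_new)
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
```

If the errors persist even within that budget, rerunning with CUDA_LAUNCH_BLOCKING=1 in the environment can surface the underlying assert instead of the later CUBLAS failure.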
