What is the maximum possible `max_new_tokens`?
#15
by
PF94
- opened
I tried the example inference code and it worked great. But when I increase `max_new_tokens` to 2048, most of the time I get either `RuntimeError: CUDA error: device-side assert triggered` or `RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED`. In config.json, `max_position_embeddings` and `max_length` are both 4096. Does that mean the model can output up to 4096 tokens?