Max output tokens?

#12 opened by stri8ted

I understand the input context length is 8k, but what about the output?

The output shares the same space as the input: prompt and response together have to fit inside the 8k-token window, and the model fills whatever is left of that space with its completion. You could slide the context window to get more output, but then you start losing context at the beginning.

For example, if your input is 100 tokens, you have ~7900 tokens available for the completion. But if your input is 7900 tokens, you only have ~100 tokens left for the response before your input starts getting trimmed. The model can only attend to 8k tokens at most.
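To make that concrete, here is a minimal sketch (assuming the Hugging Face transformers tokenizer; the model ID, prompt, and the 8192 budget are illustrative) of budgeting the completion length from the prompt length:

```python
# Sketch: budget output tokens so prompt + completion stay inside the 8k window.
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # gated repo; needs an accepted license + HF token
CONTEXT_WINDOW = 8192

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

prompt = "Explain the difference between context length and output length."
input_len = len(tokenizer(prompt)["input_ids"])

# Whatever the prompt doesn't use is available for the completion.
max_new_tokens = CONTEXT_WINDOW - input_len
print(f"prompt tokens: {input_len}, tokens left for completion: {max_new_tokens}")
```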

May I ask where I can check this "8k context length" configuration for the llama3 model? Thanks!

https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/config.json

Look at max_position_embeddings.
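You can also read the same value programmatically, e.g. with this small sketch (assumes transformers is installed and you have access to the gated repo):

```python
# Sketch: the context length reported here is the max_position_embeddings
# field from config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")  # gated; needs HF token
print(config.max_position_embeddings)  # 8192 for Llama 3
```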
