What is the context length?

#2 · opened by softwareweaver

Does this model have the original 8K context, or is it larger?

Thanks,
Ash


If it behaves like all the Llama 3 models I've tried, you can push the context further just by raising rope_theta in the config file (or via the CLI/UI), without using any NTK or compressed-position-embedding solutions.
For 8B-Instruct it can retrieve a key at up to about 50k context with "rope_theta": 8000000.0.
Still figuring out why, but it kind of works out of the box without further training (I still need to evaluate the possible drawbacks, but they feel almost non-existent).
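
For anyone wanting to try this, here is a minimal sketch of the rope_theta override described above, using the Hugging Face Transformers API. The model ID and the max_position_embeddings bump are my assumptions; only the rope_theta value comes from the post.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Example checkpoint -- the post doesn't name the exact model.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the stock config and raise rope_theta (Llama 3 ships with 500000.0).
config = AutoConfig.from_pretrained(model_id)
config.rope_theta = 8000000.0  # value quoted in the post above

# Raising max_position_embeddings alongside rope_theta avoids length
# warnings when feeding prompts beyond the original 8192 tokens
# (my assumption, not something the post specifies).
config.max_position_embeddings = 65536

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```

The same effect can be had by editing "rope_theta" directly in the checkpoint's config.json, which is what the post's "cfg file" route refers to.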

softwareweaver changed discussion status to closed
