matlok's Collections
LMM

Papers - Pre-training - Dynamic Context Length

For HyperCLOVA X, pre-training was split by context length: 90% at 4096 and 10% at 32k.
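One way to picture the 90/10 split is a per-batch draw of the context length. This is only a rough sketch, not the HyperCLOVA X recipe: the note does not say whether the split is counted in batches, samples, or tokens, so per-batch sampling is an assumption here, as is reading "32k" as 32768. The helper names `sample_context_length` and `context_schedule` are hypothetical.

```python
import random
from typing import List

# Assumed values: the note says "4096" and "32k"; 32k is taken here as 32768.
SHORT_CONTEXT = 4096
LONG_CONTEXT = 32768
LONG_FRACTION = 0.10  # share of batches drawn at the long context (assumption: per batch)


def sample_context_length(rng: random.Random) -> int:
    """Pick the context length for one batch (hypothetical helper)."""
    return LONG_CONTEXT if rng.random() < LONG_FRACTION else SHORT_CONTEXT


def context_schedule(num_batches: int, seed: int = 0) -> List[int]:
    """Return one context length per batch, seeded so the schedule is reproducible."""
    rng = random.Random(seed)
    return [sample_context_length(rng) for _ in range(num_batches)]


if __name__ == "__main__":
    schedule = context_schedule(10_000)
    long_share = schedule.count(LONG_CONTEXT) / len(schedule)
    print(f"share of long-context batches: {long_share:.3f}")
```

If the split were meant in tokens rather than batches, the long-context probability would need rescaling, since each 32k batch carries eight times as many tokens per sequence as a 4096 one.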