library_name: transformers
tags: []
# Malaysian Llama-3 8B 262144 context length
Llama-3 8B extended to a 262144-token context length, using a RoPE theta of 207100000.
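
To give a feel for what the RoPE theta value means, here is a minimal sketch that computes rotary inverse frequencies from it. It assumes the standard, unscaled RoPE formula `theta^(-2i/d)` and a head dimension of 128 (Llama-3 8B's 4096 hidden size over 32 heads); the helper names are hypothetical and are not taken from the training code.

```python
import math

ROPE_THETA = 207_100_000   # from this model card
CONTEXT_LENGTH = 262_144   # from this model card
HEAD_DIM = 128             # assumption: 4096 hidden size / 32 attention heads


def rope_inv_freq(theta: float, head_dim: int) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/d) for i in [0, d/2)."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(theta: float, head_dim: int) -> float:
    """Wavelength, in tokens, of the slowest-rotating dimension pair."""
    return 2 * math.pi / rope_inv_freq(theta, head_dim)[-1]


inv_freq = rope_inv_freq(ROPE_THETA, HEAD_DIM)
print(f"dimension pairs:    {len(inv_freq)}")
print(f"longest wavelength: {longest_wavelength(ROPE_THETA, HEAD_DIM):.3e} tokens")
print(f"context length:     {CONTEXT_LENGTH} tokens")
```

A larger theta slows the rotation of every dimension pair, so the slowest pair covers more tokens before it wraps around.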
WandB: https://wandb.ai/huseinzol05/EasyContext-262144?nw=nwuserhuseinzol05

Source code: https://github.com/mesolitica/malaya/tree/master/session/llama3#extend-1m-context-length

Special thanks to https://github.com/jzhang38/EasyContext for wrapping https://github.com/zhuzilin/ring-flash-attention for distributed training!