Syahmi Azhar

prsyahmi

AI & ML interests

None yet

Recent Activity

liked a model 14 days ago
NexaAIDev/omnivision-968M
liked a model 14 days ago
rain1011/pyramid-flow-sd3
liked a model 14 days ago
rain1011/pyramid-flow-miniflux

Organizations

None yet

prsyahmi's activity

Reacted to singhsidhukuldeep's post with 🤗 7 months ago
You are all happy 😊 that @meta-llama released Llama 3.

Then you are sad 😔 that it only has a context length of 8K.

Then you are happy 😄 that you can just scale Llama-3 PoSE to 96K without training, only needing to modify max_position_embeddings and rope_theta.
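Because the trick above is a config-only change, it can be sketched as editing two fields of a model config dict. The minimal sketch below assumes the published Llama-3-8B defaults (rope_theta = 500000.0, max_position_embeddings = 8192, 32 heads over a 4096-wide hidden state). The target rope_theta of 8,000,000 is a hypothetical illustration, not the value used by any released PoSE model, and PoSE training itself is not shown. The helper `rope_wavelengths` shows why raising rope_theta helps: it stretches the wavelength of every RoPE frequency pair except the first.

```python
import math

# Assumed Llama-3-8B defaults from its config.json; the target values used below are hypothetical.
BASE_CONFIG = {
    "max_position_embeddings": 8192,
    "rope_theta": 500000.0,
    "hidden_size": 4096,
    "num_attention_heads": 32,
}


def extend_context(config: dict, new_max_positions: int, new_rope_theta: float) -> dict:
    """Return a copy of the config with a longer context window and a larger RoPE base.

    Only the two fields named in the post change; no model weights are touched.
    """
    if new_max_positions <= config["max_position_embeddings"]:
        raise ValueError("new_max_positions must exceed the current context length")
    if new_rope_theta < config["rope_theta"]:
        raise ValueError("rope_theta is normally increased, not decreased, for longer contexts")
    extended = dict(config)  # shallow copy so the original config is not mutated
    extended["max_position_embeddings"] = new_max_positions
    extended["rope_theta"] = new_rope_theta
    return extended


def rope_wavelengths(config: dict) -> list[float]:
    """Wavelength in tokens of each RoPE frequency pair: 2*pi * theta**(2i / head_dim)."""
    head_dim = config["hidden_size"] // config["num_attention_heads"]
    theta = config["rope_theta"]
    return [2 * math.pi * theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    # Hypothetical targets: 96K positions and an illustrative (unpublished) theta.
    extended = extend_context(BASE_CONFIG, new_max_positions=96 * 1024, new_rope_theta=8_000_000.0)
    print(extended["max_position_embeddings"])
    print(f"longest wavelength before: {max(rope_wavelengths(BASE_CONFIG)):.3e}")
    print(f"longest wavelength after:  {max(rope_wavelengths(extended)):.3e}")
```

With real checkpoints the same two fields would be edited in the model's config before loading. Whether a given theta actually works at 96K has to be measured, which is exactly the retrieval-versus-utilization caveat raised next.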

But then you are sad 😢 that it only improves the model's long-context retrieval performance (i.e., finding needles) while hardly improving its long-context utilization capability (doing QA and summarization).

But then you are happy 😁 that the @GradientsTechnologies community has released Llama-3-8B-Instruct-262K, with context lengths from 262K up to 1M+.

Now we have another paper "Extending Llama-3's Context Ten-Fold Overnight" πŸ“œ.

The context length of Llama-3-8B-Instruct is extended from 8K to 80K using QLoRA fine-tuning ⚙️.
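To make "QLoRA fine-tuning" concrete, here is a minimal pure-Python sketch of the LoRA part only: a frozen weight W plus a trainable low-rank update scaled by alpha / r, so the output is W·x + (alpha / r)·B(A·x). The 4-bit quantization of the frozen base that puts the "Q" in QLoRA is omitted, and the class name, the rank and alpha values, and the 4096×4096 projection size in the usage note are illustrative assumptions. The post does not state which layers or which rank the paper used.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A scaled by alpha / rank."""

    def __init__(self, weight, rank, alpha, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen; QLoRA would store this in 4-bit (not modeled here)
        self.scale = alpha / rank
        # A: rank x d_in with small random init; B: d_out x rank with zero init,
        # so the adapter is a no-op before training starts.
        self.lora_a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.lora_b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.lora_b, matvec(self.lora_a, x))
        return [b + self.scale * u for b, u in zip(base, update)]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters added by one adapter: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical: one 4096 x 4096 projection with a rank-32 adapter.
    trainable = lora_param_count(4096, 4096, 32)
    print(trainable, trainable / (4096 * 4096))
```

For that hypothetical projection, the adapter trains 262,144 parameters instead of 16,777,216 (about 1.6%). That kind of saving is why a run like this can fit on a single machine.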

The training cycle is highly efficient, taking "only" 😂 8 hours on a single 8xA800 (80G) GPU machine.

The model also preserves its original capability over short contexts. ✅

The paper attributes this dramatic context extension mainly to just 3.5K synthetic training samples generated by GPT-4. 📊

The paper suggests that the context length could be extended far beyond 80K with more computational resources (😅 GPU-poor).

The team plans to publicly release all resources, including data, model, data generation pipeline, and training code, to facilitate future research from the ❤️ community.

Paper: https://arxiv.org/abs/2404.19553

This is where we are... until next time... 🌟

New activity in mesolitica/gemma-2B-16k-instructions 8 months ago

readme: Fix chat template

#2 opened 8 months ago by prsyahmi

Chat template

#1 opened 8 months ago by prsyahmi