Discussion: Llama3 16k
Say Lewdiculous, been enjoying your new models and was wondering if you'd heard about the new community Llama 3 with a 16k context window, and if you're planning on fine-tuning it?
I don't tune, haha.
I brought this up with someone that could work on merges here:
https://huggingface.co/ChaoticNeutrals/Poppy_Porpoise-v0.6-L3-8B/discussions/1#662741c4088f0f0c9187de7b
But it seems it's not so necessary, since the regular model with 8K native context using RoPE scaling (it's automatic on KoboldCpp for example) can already handle up to 32K very well, and 16K even better.
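As a rough sketch of what "automatic RoPE scaling" does under the hood — the exact formula KoboldCpp applies is an assumption here, this is the common NTK-aware variant (bloc97-style), with Llama's 128 head dim and 10000 frequency base assumed:

```python
def ntk_rope_freq_base(scale: float, base: float = 10000.0,
                       head_dim: int = 128) -> float:
    """NTK-aware RoPE scaling: instead of compressing positions, stretch
    the rotary frequency base so low frequencies span a longer context
    while high frequencies (local detail) are barely touched.
    `scale` is target_ctx / native_ctx."""
    return base * scale ** (head_dim / (head_dim - 2))

# Extending an 8K-native model to 32K is a 4x scale;
# scale 1.0 leaves the base untouched.
print(ntk_rope_freq_base(1.0))
print(ntk_rope_freq_base(4.0))
```

The upshot is that the model never sees out-of-range positions, which is why no fine-tune is strictly required for moderate extensions.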
In the Oobabooga WebUI that's compress_pos_emb, correct? If I want to use 32K tokens of context I'd set it to 4, right?
I only use GGUF models with KoboldCpp, and that's gonna be my recommendation and supported format for the time being. @Nitral-AI might be able to talk about EXL2 scaling on Ooba.
In the Oobabooga WebUI that's compress_pos_emb, correct? If I want to use 32K tokens of context I'd set it to 4, right?
According to the wiki yes, 4 is quadruple the context.
compress_pos_emb: The first and original context-length extension method, discovered by kaiokendev. When set to 2 the context length is doubled, when set to 3 it's tripled, and so on. It should only be used for models that have been fine-tuned with this parameter set to a value other than 1. For models that have not been tuned for greater context length, alpha_value will lead to a smaller accuracy loss.
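For contrast with the NTK/alpha approach, compress_pos_emb is plain linear interpolation: position indices are divided by the factor before the rotary embedding is applied, so 32K real positions fit inside the 8K range the model was trained on. A minimal sketch, using the factor and context sizes from this thread:

```python
def compressed_positions(n_tokens: int, factor: int) -> list[float]:
    """Linear RoPE interpolation (kaiokendev): every token's position
    index is divided by `factor` before computing rotary embeddings."""
    return [i / factor for i in range(n_tokens)]

# With factor 4, the last of 32768 tokens sits at position 8191.75,
# i.e. still inside the model's native 8K positional range.
pos = compressed_positions(32768, 4)
print(pos[-1])  # 8191.75
```

This is why the wiki says it should match what the model was fine-tuned with: squeezing positions together changes what every position "means" to the model.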
Say Lewdiculous, you said 16k tokens wasn't appealing. Would one million+ context tokens be a worthy challenge?
https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k
lmao that's a lot of context haha folks operating miracles, kick Gemini where it hurts!
Should I do some memes with this now, I wonder?
Nitral you could potentially take on Gemini "but for the modern coomer roleplayer", KEKW.
My body and my wallet already failed after 1 million context.
Potentially after Poppy 1.0 drops.
I was 90% joking but I can see the meme calls for your soul.
RTX 5090 HERE I COME!
Gonna need a few-
It's just like 190GB for 1.048M ctx at Q8, maybe 30GB with Flash-Attention?
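For what it's worth, a back-of-the-envelope KV-cache estimate for a Llama-3-8B-shaped model (32 layers, 8 KV heads, head dim 128 — GQA config values assumed from the public model card) lands in the same painful ballpark:

```python
def kv_cache_gib(ctx: int, n_layers: int = 32, n_kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_val: int = 2) -> float:
    """KV cache size = 2 (K and V) * layers * kv_heads * head_dim
    * context length * bytes per value. bytes_per_val: 2 for FP16,
    1 for an 8-bit quantized cache."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_val / 2**30

print(kv_cache_gib(1_048_576))                    # FP16 cache: 128.0 GiB
print(kv_cache_gib(1_048_576, bytes_per_val=1))   # 8-bit cache: 64.0 GiB
```

Note that Flash Attention mainly avoids materializing the attention matrix; it doesn't shrink the KV cache itself, so quantizing the cache is where the big savings come from.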
The magic
I'd be mind-blown if Nvidia released a 48GB GPU to consumers (or, as Nvidia sees consumers, cockroaches).
A 48GB RTX 6000 is $15K here. Sadly they removed the ability to see prices for A100s, but they were around $30K.
They're cheaper in Australia than here.
This blows my mind though
That's without the system included :3