
Oh Come on

#1
by supercharge19 - opened

What is 1M? That can't be the context length, can it?

If it is, what is the RAM requirement for that, 1 million tokens?

Cognitive Computations org

It is indeed the context length. You'd likely need 506.25 GB of VRAM to run it at the full 1M context length, though you can lower the ctx until it fits in your available RAM.

I estimated that vram requirement using this tool: https://llm-calc.rayfernando.ai/

Cognitive Computations org

What is 1M? That can't be the context length, can it?

If it is, what is the RAM requirement for that, 1 million tokens?

Wait until I'm done uploading it. Also, I have lots of model cards to do, and you aren't paying me anything; I work at my own pace.

ehartford changed discussion status to closed
Cognitive Computations org

@ehartford I can help you if you need help :)

It is indeed the context length. You'd likely need 506.25 GB of VRAM to run it at the full 1M context length, though you can lower the ctx until it fits in your available RAM.

I estimated that vram requirement using this tool: https://llm-calc.rayfernando.ai/

Thanks, that is a very useful resource.

24 GB of RAM was just barely enough for 16K context for me, so one million sounds like it would need a lot more than 500 GB.

Cognitive Computations org

Perhaps - but the way context memory is computed and stored isn't exactly linear, and having 48 GB of VRAM doesn't mean you'll be stuck at just 32K context length. The model weights also only take up a fixed amount of space.
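To make the "context costs memory, weights are fixed" point concrete, here is a rough back-of-the-envelope KV-cache estimate. The config numbers (layers, KV heads, head dim, fp16 cache) are assumptions for a Llama-3-8B-style architecture with grouped-query attention, not this model's exact specs, so treat the outputs as ballpark figures only:

```python
# Rough KV-cache size estimate for a Llama-style decoder.
# Assumed config (Llama-3-8B-like, NOT verified against this model):
#   32 layers, 8 KV heads, head dim 128, fp16 cache (2 bytes/element).

def kv_cache_gb(ctx_len, n_layers=32, n_kv_heads=8, head_dim=128,
                bytes_per_elem=2):
    """GiB needed for the K and V caches across all layers at ctx_len tokens."""
    # Factor of 2 accounts for the separate K and V tensors per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem
    return total_bytes / 1024**3

for ctx in (16_384, 131_072, 1_048_576):
    print(f"{ctx:>9} tokens -> ~{kv_cache_gb(ctx):.1f} GiB KV cache")
```

On top of that, the weights themselves need a fixed chunk (roughly 16 GB for an 8B model in fp16, less when quantized), which is why total memory isn't a simple multiple of context length.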

This is regular RAM, not VRAM. Is that worse?

Cognitive Computations org

This is regular RAM, not VRAM. Is that worse?

It depends, but people tend to mix up the words RAM and VRAM because llama.cpp exists. Needing more VRAM is generally worse.

https://llm-calc.rayfernando.ai/ seems down right now, or is it just me?

I meant I am not using my GPU at all (it sucks anyway).
The module inside the link is down for me too; it doesn't do any calculation, so it's not just you.

https://llm-calc.rayfernando.ai/ seems down right now, or is it just me?

I had to do a re-deploy of the app and it is back up and running now.

Thank you all so much for the shoutout and for using the tool!

Cognitive Computations org

Thanks @RayFernando1337, and now I know how to tag you too!

Still says "embeds.beehiiv.com refused to connect" on the module for me.

Cognitive Computations org

Works for me - try a Chromium-based browser.
