https://huggingface.co/NeverSleep/Lumimaid-v0.2-70B

#175
by Szarka - opened

please do this one

Hi, I am currently delaying the larger llama-3.1 models because llama.cpp has no good support for them yet, and I'd like to avoid redoing the quants. This model (and many others) is already eagerly waiting :) I hope the rope fixes will land any day now, in which case this will be one of the first models to be quanted.

I'd like to second this request, mainly for a quant that fits 24GB cards, like an i1-IQ2_XS or i1-IQ2_XXS.
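For anyone sizing these up, here is a rough back-of-the-envelope sketch for the weights alone (no KV cache or runtime overhead). The bits-per-weight figures are approximate llama.cpp values and the parameter count is approximate, so treat the numbers as ballpark only:

```python
# Rough on-disk / in-VRAM size of the weights for a few quant types.
# bpw figures are approximate llama.cpp values, not exact for this model.
PARAMS = 70.6e9  # approximate Llama-3.1-70B parameter count

for name, bpw in [("IQ2_XXS", 2.06), ("IQ2_XS", 2.31), ("Q4_K_S", 4.58)]:
    gib = PARAMS * bpw / 8 / 1024**3
    print(f"{name}: ~{gib:.1f} GiB")
```

That puts the IQ2 quants around 17-19 GiB (plausible for a single 24GB card with some KV-cache headroom) and Q4_K_S near 38 GiB, which lines up with running it across two 24GB cards as mentioned later in the thread.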

Thirded. In the meantime I followed the advice for the L3.1 8B models and manually set the RoPE base to 8M, which scales to 70M for the 70B. Only very brief testing on an existing chat, but nothing seemed wrong, and it was to some degree pulling more details from the context. I was only testing with 24k, though.
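For reference, a minimal sketch of that workaround using llama-cpp-python, which exposes the override as the `rope_freq_base` constructor parameter. The filename is hypothetical, and the 70M base is the rule of thumb above, not an official value:

```python
from llama_cpp import Llama

# Workaround sketch for pre-rope-fix llama.cpp builds: override the RoPE
# frequency base instead of relying on the (then-unsupported) Llama-3.1
# rope scaling. 70M follows the 8M-for-8B rule of thumb above; it is not
# an official value.
llm = Llama(
    model_path="Lumimaid-v0.2-70B.i1-Q4_K_S.gguf",  # hypothetical local filename
    n_ctx=24576,                    # the 24k context mentioned above
    rope_freq_base=70_000_000.0,    # 70M override for the 70B
)

print(llm("Hello", max_tokens=16)["choices"][0]["text"])
```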

The imatrix quants are currently generating, but depending on some (at the moment very chaotic) scheduling, they might or might not get interrupted by something else. Funnily enough, this model is queued in front of the llama-3.1 70B instruct model itself.

mradermacher changed discussion status to closed

Already pulled the one I needed, Q4_K_S; it's the best option for 24k context on a pair of P40s. Amazing, people's "priorities". :D
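In case it helps anyone with the same setup, a minimal llama-cpp-python sketch for splitting the Q4_K_S quant across two P40s. The filename and the even split ratios are assumptions; an even split is a reasonable starting point for two identical cards:

```python
from llama_cpp import Llama

# Sketch: load the ~38 GiB Q4_K_S quant across two 24GB P40s.
llm = Llama(
    model_path="Lumimaid-v0.2-70B.i1-Q4_K_S.gguf",  # hypothetical local filename
    n_gpu_layers=-1,            # offload all layers to GPU
    tensor_split=[0.5, 0.5],    # assumed even split across the two P40s
    n_ctx=24576,                # the 24k context mentioned above
)
```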

Q4_K_S is always a good choice :)
