Llama 3.1 70B Abliterated

#194 opened by AIGUYCONTENT

I originally asked Bartowski if he could make an abliterated quant for the new Llama 3.1 70B, but he said someone must first make the abliterated model.

Are you the right person to ask for this? Or is there a reason why nobody has made it yet? Is it not possible or does it not work as expected?

I see there is one for Llama 3.1 8B Abliterated.

I find the abliterated Llama models far superior to the regular ones for my line of work.

Thank you!

P.S. Is 120GB of VRAM enough to make the quants that you make? Just asking out of sheer curiosity about what's required in terms of VRAM.

I am not the right person to ask - like bartowski, I only do quants (or rather, I don't know what else he does, but I only do GGUF quants).

And as for VRAM: you don't need any VRAM for static quants, and in fact, none of the computers I regularly use have any VRAM. Quantisation is done per-tensor, and usually only a single tensor needs to be loaded in memory at a time.
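
Roughly, the idea looks like this (a simplified sketch only, not the actual llama.cpp quantize code path; the file name and the naive int8 scheme are just placeholders):

```python
# Sketch: quantize a checkpoint one tensor at a time, so the full model
# never has to be resident in memory and no GPU is involved at all.
import numpy as np
from safetensors import safe_open
from safetensors.numpy import save_file

out = {}
with safe_open("model.safetensors", framework="np") as f:  # placeholder path
    for name in f.keys():
        # Only one full-precision tensor is loaded at a time.
        t = f.get_tensor(name).astype(np.float32)
        scale = max(float(np.abs(t).max()) / 127.0, 1e-8)
        out[name] = np.round(t / scale).astype(np.int8)
        out[name + ".scale"] = np.array([scale], dtype=np.float32)
        # A real tool would stream each quantized tensor straight back to
        # disk instead of collecting everything in a dict like this.

save_file(out, "model-q8.safetensors")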

For generation of imatrix measurement data, you need a graphics card with some VRAM (it can be done without, but it's too slow), but the model does not need to fit into VRAM - even a graphics card with 8 GB of VRAM can comfortably do imatrix calculations for a 70b or even much larger model (it depends more on the model architecture than on the number of weights). I initially did all my imatrix measurements on an RTX 4070 Ti and had plenty of VRAM to spare. If you have a fast disk and are a bit patient, you can even stream the model from disk.
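
To give a feel for what the imatrix measurement collects: conceptually it is just per-channel activation statistics gathered while running calibration text through the model. A rough sketch of that idea (not the llama-imatrix tool itself; the model name and calibration text are placeholders):

```python
# Conceptual sketch of imatrix-style measurement: accumulate squared input
# activations of every linear layer over some calibration text. llama.cpp's
# llama-imatrix does the real work and only needs part of the model on the
# GPU at any moment, which is why a 70b works even on an 8 GB card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

stats = {}  # layer name -> running sum of squared input activations

def make_hook(name):
    def hook(module, inputs, output):
        x = inputs[0].detach().float()
        stats[name] = stats.get(name, 0) + (x * x).sum(dim=(0, 1))
    return hook

for name, mod in model.named_modules():
    if isinstance(mod, torch.nn.Linear):
        mod.register_forward_hook(make_hook(name))

calibration = ["Some representative calibration text goes here."]  # placeholder
with torch.no_grad():
    for text in calibration:
        model(tok(text, return_tensors="pt").input_ids)

# `stats` now holds the kind of per-channel importance data an imatrix file
# encodes; the quantizer later uses it to decide where to spend precision.
```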

mradermacher changed discussion status to closed
