TYVM
Was just about to download the Q8 and grind one of these out myself! IQ3_M or IQ3_S...
Can you include some boilerplate about how your imatrix file was generated in your future uploads? Chunks, dataset, and ctx length? I've seen some people use context lengths of 32, and... I have no idea why. I can get a 57B done in less than an hour with less than 50% offload, on 6 extra cores.
Context length 512, 322 chunks, 164k semi-random English-only tokens is currently my standard set. The dataset is unfortunately not public, but it includes groups_merged.txt by, I think, ikawrakow. It's not based on Wikipedia texts.
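For illustration only (not my exact command; the model path, dataset path, and -ngl value below are placeholders, and older llama.cpp builds call the binary ./imatrix instead of llama-imatrix), a run with those settings looks roughly like this:

```bash
# Rough sketch: generate an importance matrix with llama.cpp's imatrix tool
# using a 512-token context and 322 chunks of a plain-text calibration set,
# with partial GPU offload. Paths and -ngl are made-up placeholders.
./llama-imatrix \
  -m model-f16.gguf \
  -f calibration-data.txt \
  -o model.imatrix \
  -c 512 \
  --chunks 322 \
  -ngl 20
```

The resulting .imatrix file is then passed to the quantize step so the low-bit quants know which weights matter most.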
I originally wanted to make very small quants of very large models, and thought that concentrating on English could help, at a loss of fidelity for other languages. I also planned to do an iterative process to improve it (thus the "i1"), but I am currently a bit overwhelmed, so the science needs to wait and I chose to provide a variety of imatrix quants for the community instead. On quite old and underpowered hardware, to boot.
As for the context length of 32: if you go through the llama.cpp PRs, some people have found that such short contexts can work quite well. imatrix training is currently even worse than a black art - nobody seems to know what's really good. So I think the dust needs to settle first :)