Dolphin-2.8-Mistral-7B-v02 iMatrix Quantizations

This repository contains iMatrix quantizations of the dolphin-2.8-mistral-7b-v02 model. The original model was trained with 16k long-context data on top of the newer Mistral-7B v0.2 base, enabling it to work well with contexts up to 32k.

The iMatrix file was generated using the wiki.train.raw dataset, which took a few hours to process. We have also included the wiki.test.raw file for perplexity testing.
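
For reference, generating an iMatrix with llama.cpp looks roughly like the sketch below; the file names and the -ngl value are illustrative assumptions, not the exact invocation used for this repo.

# Compute an importance matrix from calibration text (llama.cpp imatrix tool).
# dolphinf16.gguf and imatrix.dat are placeholder names; -ngl offloads layers to the GPU.
./imatrix -m dolphinf16.gguf -f wiki.train.raw -o imatrix.dat -ngl 34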

Quantization Benefits

You'll notice that these quantizations are slightly larger than comparable ones, but they offer much lower perplexity. For example, the 2s 2-bit mixed model is very usable thanks to this custom quantization, and its perplexity stays close to that of the full f16 model.

Notes

  • The 8-bit quant is not iMatrix-quantized (at that size it would not make a significant difference). It can serve as a reference perplexity measurement alongside dolphinf16.gguf.
  • All other models, including the 4-bit k variants, have been quantized with iMatrix and should show better perplexity than regular k quantizations.
  • iMatrix quantization can be applied to all k quantizations, not just the IQ ones (see the sketch after this list).
  • The 1-bit quant produces garbage, but everything else, including 2xxs, is surprisingly coherent.

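A minimal, hypothetical example of applying the iMatrix during k quantization with llama.cpp's quantize tool (imatrix.dat is a placeholder name for the file generated from wiki.train.raw):

# Produce a Q4_K_M quant guided by the importance matrix.
# The output name dolphin4km.gguf mirrors the naming used in this repo.
./quantize --imatrix imatrix.dat dolphinf16.gguf dolphin4km.gguf Q4_K_M
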
Perplexity values

All measurements below were produced with llama.cpp's perplexity tool on wiki.test.raw, offloading 34 layers to the GPU; the bracketed values are running per-chunk estimates:

./perplexity -m dolphin2m.gguf -f wiki.test.raw -ngl 34

dolphinf16.gguf perplexity - [1]4.3052,[2]4.8421,[3]5.7401,[4]6.6554,[5]6.6552,[6]6.6580,[7]6.9198,[8]7.0918,[9]7.2503,[10]7.5712,[11]7.8367,[12]7.8476,
Final estimate: PPL = 7.8476 +/- 0.35984    THIS IS BASELINE 

dolphin1bit.gguf perplexity - [1]59477.7292,[2]50746.4580,[3]53932.3131,[4]55797.8433,[5]45995.5032,[6]46595.4234,[7]45130.6779,[8]40769.8593,[9]41322.7842,[10]50644.7393,[11]50676.5808,[12]51939.5094,
Final estimate: PPL = 51939.5094 +/- 1339.29301     1BIT GIVES GARBAGE OUTPUT

dolphin2xxs.gguf perplexity - [1]5.4651,[2]6.7941,[3]7.8700,[4]8.7155,[5]8.3566,[6]8.3316,[7]8.6121,[8]8.7565,[9]8.9041,[10]9.3572,[11]9.6426,[12]9.5626,
Final estimate: PPL = 9.5626 +/- 0.43895    9.5 vs 7.8 at f16, means lossy but coherent

dolphin2s.gguf perplexity - [1]5.0014,[2]5.9477,[3]6.8424,[4]7.6348,[5]7.4755,[6]7.4667,[7]7.7625,[8]7.8807,[9]8.0374,[10]8.4086,[11]8.6475,[12]8.6427,
Final estimate: PPL = 8.6427 +/- 0.39501

dolphin2m.gguf perplexity - [1]4.5874,[2]5.3203,[3]6.2334,[4]7.1444,[5]7.1188,[6]7.1422,[7]7.4717,[8]7.6180,[9]7.7948,[10]8.1319,[11]8.3747,[12]8.4095,
Final estimate: PPL = 8.4095 +/- 0.38329

dolphin2k.gguf perplexity - [1]4.6331,[2]5.2648,[3]6.0493,[4]7.0165,[5]6.9300,[6]6.9177,[7]7.2362,[8]7.4417,[9]7.6292,[10]7.9640,[11]8.2121,[12]8.1930,
Final estimate: PPL = 8.1930 +/- 0.37241

dolphin2ks.gguf perplexity - [1]4.7995,[2]5.6653,[3]6.4331,[4]7.3841,[5]7.2724,[6]7.3161,[7]7.6567,[8]7.8423,[9]8.0129,[10]8.4033,[11]8.6636,[12]8.6391,
Final estimate: PPL = 8.6391 +/- 0.39315

dolphin3s.gguf perplexity - [1]4.3574,[2]4.9936,[3]5.8814,[4]6.8093,[5]6.8086,[6]6.7949,[7]7.0638,[8]7.2204,[9]7.3844,[10]7.6895,[11]7.9489,[12]7.9527,
Final estimate: PPL = 7.9527 +/- 0.36202

dolphin3xs.gguf perplexity - [1]4.3161,[2]4.9579,[3]5.8647,[4]6.8064,[5]6.7614,[6]6.7501,[7]7.0133,[8]7.2103,[9]7.3862,[10]7.7265,[11]7.9813,[12]7.9780,
Final estimate: PPL = 7.9780 +/- 0.36655

dolphin3xxs.gguf perplexity - [1]4.5418,[2]5.0902,[3]6.0117,[4]6.9852,[5]6.9329,[6]6.9165,[7]7.1853,[8]7.3359,[9]7.4923,[10]7.8122,[11]8.0696,[12]8.0592,
Final estimate: PPL = 8.0592 +/- 0.36502

dolphin3m.gguf perplexity  - [1]4.3203,[2]4.9566,[3]5.8151,[4]6.7619,[5]6.7801,[6]6.7762,[7]7.0351,[8]7.2054,[9]7.3766,[10]7.6896,[11]7.9580,[12]7.9660,
Final estimate: PPL = 7.9660 +/- 0.36234

dolphin4km.gguf perplexity - [1]4.3331,[2]4.9129,[3]5.7915,[4]6.7030,[5]6.6921,[6]6.6978,[7]6.9570,[8]7.1284,[9]7.2854,[10]7.6098,[11]7.8696,[12]7.8767,
Final estimate: PPL = 7.8767 +/- 0.35875

dolphin4nl.gguf perplexity - [1]4.2682,[2]4.8494,[3]5.7530,[4]6.6890,[5]6.6672,[6]6.6637,[7]6.9332,[8]7.1126,[9]7.2821,[10]7.5998,[11]7.8733,[12]7.8875,
Final estimate: PPL = 7.8875 +/- 0.36227

dolphin4xs.gguf perplexity - [1]4.2986,[2]4.8610,[3]5.7658,[4]6.6906,[5]6.6621,[6]6.6608,[7]6.9321,[8]7.1140,[9]7.2892,[10]7.6085,[11]7.8806,[12]7.8921,
Final estimate: PPL = 7.8921 +/- 0.36258

dolphin5ks.gguf perplexity - [1]4.2557,[2]4.8249,[3]5.7413,[4]6.6671,[5]6.6611,[6]6.6686,[7]6.9389,[8]7.1079,[9]7.2707,[10]7.5962,[11]7.8529,[12]7.8627,
Final estimate: PPL = 7.8627 +/- 0.36124

dolphin5km.gguf perplexity - [1]4.3191,[2]4.8597,[3]5.7844,[4]6.7120,[5]6.6994,[6]6.6964,[7]6.9569,[8]7.1215,[9]7.2792,[10]7.6109,[11]7.8682,[12]7.8794,
Final estimate: PPL = 7.8794 +/- 0.36185

dolphin6k.gguf perplexity - [1]4.3264,[2]4.8531,[3]5.7574,[4]6.6741,[5]6.6707,[6]6.6795,[7]6.9362,[8]7.1076,[9]7.2678,[10]7.5864,[11]7.8496,[12]7.8628,
Final estimate: PPL = 7.8628 +/- 0.36075

dolphin8bit.gguf perplexity - [1]4.3063,[2]4.8463,[3]5.7347,[4]6.6499,[5]6.6471,[6]6.6531,[7]6.9160,[8]7.0899,[9]7.2509,[10]7.5705,[11]7.8357,[12]7.8466,
Final estimate: PPL = 7.8466 +/- 0.35948

As we can see, the 2-bit xxs quant produced with this method is surprisingly coherent.
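
To sanity-check one of the small quants yourself, an llama.cpp invocation along these lines should work; the -c and -ngl values are illustrative, adjust them for your hardware.

# Interactive run of the 2-bit xxs quant.
# -c sets the context length, -ngl the number of GPU-offloaded layers.
./main -m dolphin2xxs.gguf -ngl 34 -c 16384 -p "Hello, who are you?"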
