Text Generation · GGUF · English · code
CISCai committed
Commit 9848e27
Parent: aceab81

Requantized everything with new pre-tokenizer

OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40
- size 1530209440
+ oid sha256:a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873
+ size 1530080384
OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dc8b562221cc31b9b4db87e63d9a09b99481e90bf53b93e6d365bfc540cd95e2
- size 2361484448
+ oid sha256:7662673acdded25afa5846d17ea6391d8e0d8c57bd434e5ffa27a9761488140c
+ size 2361355392
OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e2734866607bf4bf33d004bdc164af5f3581c9b19bba39b9c27a35e805039066
- size 2198299808
+ oid sha256:66e7d5bbe7eadc9166c5fd699082521e8a72b00ab12696912809583be567f321
+ size 2198170752
OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b91b0c74cfc8e8b40283311327836503dde18847da73ca9dd35d05e6165e52a9
- size 2036540576
+ oid sha256:0a13b4cb2a81348911d6d9ca9a587e9ff7b08478974cb5e30502792ca6ce2a46
+ size 2036411520
OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3f94cdb4cbcbf3591f204a438e3724a310f91fa62617dc0dba6ced5c02dc86bb
- size 1856578720
+ oid sha256:d0660175e93185ca92b8e01a2cecc043f643cc23b4357306918be58948ba3ec4
+ size 1856449664
OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5f6f9c0b8e5529de5f2914bd147e50d59b8e61475f2cd7e2d495e50b7a69b36a
- size 3116737696
+ oid sha256:4482aa902c96eeaebbabc76f81129bc2e67284182a2073487429ed150a58f6c5
+ size 3116608640
OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8640de98871c18b430f47d84bc7361015adf8d5cf46cac500d1575be43de6aab
- size 2950177952
+ oid sha256:e1c65eb086d34bafbe005a1d320d9709a0dd080af89edef574c074ef73d27be9
+ size 2950048896
OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:2980e66945bed767a280b13089ed9adabaf61f90a0b8102dca9ad564a60eb582
- size 2798396576
+ oid sha256:af3525ae7e915200a857a1d6012eb63eb5ebde0c0fdb077a047c57cbccc3812c
+ size 2798267520
OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:691218cc94838ef6e27d9f24796d8bcac2ad4c8d7e037059209584d2792fc2b7
- size 2587124896
+ oid sha256:22e420c184552cc7028b5547c436c138ca91415f75a229bc6f378232416ea980
+ size 2586995840
OpenCodeInterpreter-DS-6.7B.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ce70c09cc1ab5d6415dd24ff452cb5b4e1b6a1f7fd5853ded6cbfc8006cf9b9e
- size 3621315744
+ oid sha256:41bcc0e72a94b988446daab9daa6f8f6327065d5d51330ed91bc8d7b810c3c79
+ size 3621186688
OpenCodeInterpreter-DS-6.7B.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f8fd39c9b70789f1cf31e8b11ddc0ec39002f63f9b453aa277aaa8ec793ec9ab
- size 4562142
+ oid sha256:d799acf6c89364444c40223faadf8696a469ad80d6621e0327e2a84524f59c42
+ size 4562176
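Each LFS pointer above records the new SHA-256 (`oid`) and exact byte size of the requantized file, which is enough to verify a download locally. A minimal sketch, assuming the `huggingface_hub` package is installed; the expected values are copied from the updated IQ1_S pointer in this commit:

```python
import hashlib

from huggingface_hub import hf_hub_download

# Expected values, taken from the updated IQ1_S pointer above.
EXPECTED_SHA256 = "a09e844828446cf1c2f6fbfdda289b7c048f91f1bf64ed42486dc7ce74f00873"
EXPECTED_SIZE = 1530080384

path = hf_hub_download(
    repo_id="CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF",
    filename="OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf",
)

# Stream the file through SHA-256 while counting bytes.
digest = hashlib.sha256()
size = 0
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"size mismatch: got {size}"
assert digest.hexdigest() == EXPECTED_SHA256, "sha256 mismatch"
print("OK: file matches its LFS pointer")
```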
README.md CHANGED
@@ -23,9 +23,9 @@ quantized_by: CISC
 
  This repo contains State Of The Art quantized GGUF format model files for [OpenCodeInterpreter DS 6.7B](https://huggingface.co/m-a-p/OpenCodeInterpreter-DS-6.7B).
 
- Quantization was done with an importance matrix that was trained for ~1M tokens (2000 batches of 512 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
+ Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
- Even though the 1-bit quantized model file "works" it is **not recommended** for normal use ~~as it is extremely error-prone~~, I've requantized it with a 4K-context imatrix which seems to have improved it a little bit but it still defaults to infinite loops, you have been warned. 🧐
+ Everything has been reconverted and quantized with a new importance matrix using llama.cpp as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa) (April 29th 2024) to ensure correct pre-tokenization. The new GGUFs will work with older llama.cpp, but they may not generate correct prompt tokens; please use a recent build to ensure the best possible results!
 
  <!-- description end -->
 
@@ -59,6 +59,7 @@ They are also compatible with many third party UIs and libraries provided they a
  The new methods available are:
 
  * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+ * GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
  * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
  * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
  * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -68,6 +69,7 @@ The new methods available are:
  * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
  * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
  * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+ * GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
  Refer to the Provided Files table below to see what files use which methods, and how.
  </details>
@@ -78,7 +80,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
  | ---- | ---- | ---- | ---- | ---- | ----- |
- | [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - not recommended **at all** |
+ | [OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
  | [OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
@@ -91,8 +93,6 @@ Refer to the Provided Files table below to see what files use which methods, and
 
  Generated importance matrix file: [OpenCodeInterpreter-DS-6.7B.imatrix.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix.dat)
 
- Generated importance matrix file (4K context): [OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat)
-
  **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
  <!-- README_GGUF.md-provided-files end -->
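As context for the imatrix change in the description above (256 batches of 4096 tokens instead of 2000 batches of 512), here is a rough sketch of how such a matrix is generated with llama.cpp's `imatrix` tool, driven from Python. The binary path and the calibration file name are assumptions, not taken from this repo:

```python
import subprocess

IMATRIX_BIN = "./imatrix"                        # llama.cpp importance-matrix tool (path assumed)
MODEL = "OpenCodeInterpreter-DS-6.7B.fp16.gguf"  # unquantized conversion (name assumed)
CALIB = "codefeedback-answers.txt"               # hypothetical text dump of dataset answers

# 256 chunks of a 4096-token context ~= 1M tokens, matching the README.
subprocess.run(
    [
        IMATRIX_BIN,
        "-m", MODEL,
        "-f", CALIB,
        "-o", "OpenCodeInterpreter-DS-6.7B.imatrix.dat",
        "-c", "4096",       # tokens per chunk
        "--chunks", "256",  # number of chunks to process
    ],
    check=True,
)
```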
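The pre-tokenization fix that the commit message refers to is stored in GGUF metadata, so a downloaded file can be checked for it. A sketch using the `gguf` Python package; the field-decoding details are an assumption and may vary between package versions:

```python
from gguf import GGUFReader  # pip install gguf

reader = GGUFReader("OpenCodeInterpreter-DS-6.7B.IQ2_M.gguf")

# llama.cpp records the pre-tokenizer name under tokenizer.ggml.pre;
# files converted before the fix lack this field entirely.
field = reader.fields.get("tokenizer.ggml.pre")
if field is None:
    print("no pre-tokenizer recorded: reconvert with a recent llama.cpp")
else:
    name = bytes(field.parts[field.data[0]]).decode("utf-8")
    print("pre-tokenizer:", name)
```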
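As a sanity check on the bpw figures in the methods list: a pure 1.56 bpw encoding of the model's roughly 6.7B parameters would come to about 1.3 GB, slightly under the actual 1.53 GB IQ1_S file, since llama.cpp keeps some tensors (e.g. embeddings and output) at higher-precision types. A quick back-of-the-envelope, where the parameter count is an approximation:

```python
params = 6.74e9  # approximate parameter count of the DS 6.7B model (assumption)
bpw = 1.56       # effective bits per weight for GGML_TYPE_IQ1_S

estimate = params * bpw / 8                             # bits -> bytes
print(f"pure IQ1_S estimate: {estimate / 1e9:.2f} GB")  # ~1.31 GB
print("actual IQ1_S file:   1.53 GB (1530080384 bytes, per the LFS pointer)")
```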
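Finally, on the RAM note at the end of the diff: with llama-cpp-python, for instance, GPU offloading is a single constructor argument, trading RAM for VRAM layer by layer. A minimal sketch, assuming a local copy of one of the quants above:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf",
    n_ctx=4096,       # the 4K context the RAM figures assume
    n_gpu_layers=-1,  # offload all layers to the GPU; 0 keeps everything in RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```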