Upload 3 files
Requantized IQ1_S with a 4K-context imatrix.
- .gitattributes +1 -0
- OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf +1 -1
- OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat +3 -0
- README.md +2 -1
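
For context on the commit message: a 4K-context importance matrix like the one uploaded here is typically generated with llama.cpp's `imatrix` tool. A minimal sketch, assuming an fp16 source GGUF and a calibration text file drawn from the CodeFeedback-Filtered-Instruction answers (both file names are illustrative, not part of this commit):

```sh
# Train an importance matrix with a 4096-token context window.
# --chunks 250 keeps the run at roughly 1M training tokens
# (250 chunks x 4096 tokens), mirroring the ~1M tokens the README cites
# for the original 512-token imatrix (2000 chunks x 512 tokens).
./imatrix -m OpenCodeInterpreter-DS-6.7B.fp16.gguf \
          -f codefeedback-answers.txt \
          -o OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat \
          -c 4096 --chunks 250
```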
.gitattributes CHANGED

```diff
@@ -43,3 +43,4 @@ OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
 OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
 OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
 OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
+OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat filter=lfs diff=lfs merge=lfs -text
```
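
The added rule is what routes the new imatrix file through Git LFS. It is exactly the line the standard tracking command would append (assuming the maintainer used the CLI rather than editing the file by hand):

```sh
# Append an LFS tracking rule for the new imatrix file to .gitattributes.
git lfs track "OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat"
```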
OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf CHANGED

```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40
 size 1530209440
```
OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat ADDED

```diff
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2512d59a4ac64213584464f6f079f9e6a604f6ef4a5efae591a9b814affad41b
+size 4562142
```
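
Both LFS pointers record the SHA-256 of the actual payload, so a download can be checked locally against the oid lines above:

```sh
# Compare against the oid lines in the LFS pointers.
sha256sum OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf
# expected: e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40
sha256sum OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat
# expected: 2512d59a4ac64213584464f6f079f9e6a604f6ef4a5efae591a9b814affad41b
```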
README.md CHANGED

```diff
@@ -25,7 +25,7 @@ This repo contains State Of The Art quantized GGUF format model files for [OpenC
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (2000 batches of 512 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.
 
-Even though the 1-bit quantized model file "works" it is **not recommended** for normal use as it is extremely error-prone
+Even though the 1-bit quantized model file "works" it is **not recommended** for normal use ~~as it is extremely error-prone~~, I've requantized it with a 4K-context imatrix which seems to have improved it a little bit but it still defaults to infinite loops, you have been warned. 🧐
 
 <!-- description end -->
 
@@ -88,6 +88,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 | [OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf) | IQ3_M | 3 | 3.0 GB| 5.0 GB | medium, balanced quality - recommended |
 
 Generated importance matrix file: [OpenCodeInterpreter-DS-6.7B.imatrix.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix.dat)
+Generated importance matrix file (4K context): [OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat)
 
 **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
 
```
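
The requantization this commit describes would follow the pattern below, using llama.cpp's `quantize` tool with the new imatrix; a sketch only, since the fp16 source GGUF path is an assumption. The second command illustrates the README's note about GPU offloading trading RAM for VRAM:

```sh
# Requantize to IQ1_S, guided by the 4K-context importance matrix.
./quantize --imatrix OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat \
           OpenCodeInterpreter-DS-6.7B.fp16.gguf \
           OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf IQ1_S

# Run with GPU offloading: -ngl with a large value offloads all layers,
# moving most of the memory footprint from system RAM into VRAM.
./main -m OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf -c 4096 -ngl 99 -p "Write a quicksort in Python."
```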