CISCai committed
Commit a24bb6a
1 Parent(s): d78cfeb

Upload 3 files

Requantized IQ1_S with a 4K-context imatrix.
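For context, this is roughly the llama.cpp workflow implied by the commit message: train an importance matrix with a 4096-token context window, then requantize the model to IQ1_S using that matrix. The sketch below is a minimal, hedged illustration; the tool names and flags follow llama.cpp's documented usage, and the source-model and calibration-file paths are assumptions, not files in this repo.

```python
# Hedged sketch of the requantization flow using llama.cpp's CLI tools.
# Binary names/flags may differ between llama.cpp versions.
import subprocess

FP16_MODEL = "OpenCodeInterpreter-DS-6.7B.fp16.gguf"   # assumed unquantized source
CALIB_TEXT = "codefeedback-answers.txt"                # assumed calibration text
IMATRIX_OUT = "OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat"
QUANT_OUT = "OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf"

# 1) Train the importance matrix with a 4K (4096-token) context window.
subprocess.run(
    ["./imatrix", "-m", FP16_MODEL, "-f", CALIB_TEXT, "-o", IMATRIX_OUT, "-c", "4096"],
    check=True,
)

# 2) Requantize to IQ1_S, guided by the importance matrix.
subprocess.run(
    ["./quantize", "--imatrix", IMATRIX_OUT, FP16_MODEL, QUANT_OUT, "IQ1_S"],
    check=True,
)
```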

.gitattributes CHANGED
@@ -43,3 +43,4 @@ OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf filter=lfs diff=lfs merge=lfs -text
  OpenCodeInterpreter-DS-6.7B.IQ3_S.gguf filter=lfs diff=lfs merge=lfs -text
  OpenCodeInterpreter-DS-6.7B.IQ3_XS.gguf filter=lfs diff=lfs merge=lfs -text
  OpenCodeInterpreter-DS-6.7B.IQ3_XXS.gguf filter=lfs diff=lfs merge=lfs -text
+ OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat filter=lfs diff=lfs merge=lfs -text
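The attribute line added above marks the new imatrix file as Git LFS-tracked. As a hedged aside, such a line is normally generated rather than hand-edited; a minimal sketch of the equivalent step (run from the repo root, file name taken from this commit):

```python
# Sketch: register the new file with Git LFS, which appends the matching
# "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes.
import subprocess

subprocess.run(
    ["git", "lfs", "track", "OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat"],
    check=True,
)
```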
OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:51a36ff354738faa987d164be83c9d18cf0c1f0d9c2a68bde9cee9ebce8ed903
+ oid sha256:e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40
  size 1530209440
OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2512d59a4ac64213584464f6f079f9e6a604f6ef4a5efae591a9b814affad41b
+ size 4562142
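Both hunks above are Git LFS pointer files (`version` / `oid sha256` / `size`), not the binaries themselves. As a hedged illustration, a local download can be checked against a pointer like so; the helper below is not part of the repo, but the hash and size are copied from the updated IQ1_S pointer in this commit.

```python
# Sketch: verify a downloaded artifact against its Git LFS pointer fields.
import hashlib
import os

def matches_lfs_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True if the file's size and SHA-256 match the LFS pointer."""
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256

# Values copied from the IQ1_S pointer file in this commit.
print(matches_lfs_pointer(
    "OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf",
    "e617a6d520032a2a782c6105aa153ae70a8f8b0ba38fdbda67f5c3d02f143f40",
    1530209440,
))
```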
README.md CHANGED
@@ -25,7 +25,7 @@ This repo contains State Of The Art quantized GGUF format model files for [OpenC

  Quantization was done with an importance matrix that was trained for ~1M tokens (2000 batches of 512 tokens) of answers from the [CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction) dataset.

- Even though the 1-bit quantized model file "works" it is **not recommended** for normal use as it is extremely error-prone and pretty much defaults to infinite loops, you have been warned. 🧐
+ Even though the 1-bit quantized model file "works" it is **not recommended** for normal use ~~as it is extremely error-prone~~, I've requantized it with a 4K-context imatrix which seems to have improved it a little bit but it still defaults to infinite loops, you have been warned. 🧐

  <!-- description end -->

@@ -88,6 +88,7 @@ Refer to the Provided Files table below to see what files use which methods, and
  | [OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.IQ3_M.gguf) | IQ3_M | 3 | 3.0 GB| 5.0 GB | medium, balanced quality - recommended |

  Generated importance matrix file: [OpenCodeInterpreter-DS-6.7B.imatrix.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix.dat)
+ Generated importance matrix file (4K context): [OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat](https://huggingface.co/CISCai/OpenCodeInterpreter-DS-6.7B-SOTA-GGUF/blob/main/OpenCodeInterpreter-DS-6.7B.imatrix-4096.dat)

  **Note**: the above RAM figures assume no GPU offloading with 4K context. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
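Finally, a minimal, hedged example of loading the requantized IQ1_S file with llama-cpp-python at the 4K context the RAM figures assume. The prompt and sampling settings are illustrative only; given the README's warning that the 1-bit quant still tends toward infinite loops, capping `max_tokens` is prudent.

```python
# Illustrative sketch, not a recommendation from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="OpenCodeInterpreter-DS-6.7B.IQ1_S.gguf",
    n_ctx=4096,       # matches the 4K context assumed by the RAM figures
    n_gpu_layers=0,   # >0 offloads layers to VRAM and reduces RAM usage
)

out = llm(
    "Write a Python function that reverses a string.",
    max_tokens=256,      # hard cap, since the 1-bit quant tends to loop
    repeat_penalty=1.2,  # illustrative mitigation for repetition
)
print(out["choices"][0]["text"])
```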